feature-engine / feature_engine

Feature engineering package with sklearn like functionality
https://feature-engine.trainindata.com/
BSD 3-Clause "New" or "Revised" License

Imputation and data generation methods request #302

Open papachristoumarios opened 3 years ago

papachristoumarios commented 3 years ago

Is your feature request related to a problem? Please describe.

Methods for data imputation and data generation.

Describe the solution you'd like

  1. It would be a good idea to have a data imputation method like the Amelia package in R (https://cran.r-project.org/web/packages/Amelia/index.html) together with existing imputation methods.
  2. Regarding artificial data generation methods, I believe that feature-engine will benefit from copula methods such as https://sdv.dev/Copulas/ (and its related works).
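
As a rough illustration of the copula idea in item 2, here is a minimal sketch of a Gaussian copula for two numeric columns. It uses only the Python standard library and does not use the Copulas or SDV API; every function name below is hypothetical and chosen for illustration. The idea is to estimate the dependence between the columns on a normal-score scale, draw correlated normal pairs, and map them back through each column's empirical distribution.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf / inverse cdf


def to_normal_scores(values):
    """Map each value to a standard-normal score based on its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) stays strictly inside (0, 1), so inv_cdf is defined
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def empirical_quantile(sorted_values, u):
    """Return the observed value at probability u of a sorted sample."""
    idx = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def sample_gaussian_copula(x, y, n_samples, seed=0):
    """Draw synthetic (x, y) pairs that keep the marginals and rank dependence.

    Hypothetical helper: a two-column sketch, not a general implementation.
    """
    rng = random.Random(seed)
    # Dependence is estimated on the normal-score scale, not on raw values.
    rho = pearson(to_normal_scores(x), to_normal_scores(y))
    resid_scale = max(0.0, 1.0 - rho * rho) ** 0.5
    xs, ys = sorted(x), sorted(y)
    synthetic = []
    for _ in range(n_samples):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + resid_scale * rng.gauss(0, 1)  # correlated with z1
        synthetic.append(
            (
                empirical_quantile(xs, STD_NORMAL.cdf(z1)),
                empirical_quantile(ys, STD_NORMAL.cdf(z2)),
            )
        )
    return synthetic
```

Because the marginals here are empirical, every synthetic value is one that was observed in the original column; only the pairing is new. A library such as Copulas would typically fit parametric marginals instead and handle more than two columns.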

T-rect commented 3 years ago

Hi, OK, I understand and will do so as soon as possible. Thanks for the reply.

On Mon, 30 Aug 2021 at 17:44, Soledad Galli @.***> wrote:

Hi @T-rect (https://github.com/T-rect), could you please delete your comment from this issue and create a new issue with your question? Then I can answer there separately.

Thank you


solegalli commented 3 years ago

Thank you @papachristoumarios

At the moment, we don't have methods for artificial data generation in the roadmap for Feature-engine.

Re: the imputation method, after a quick look, am I right in saying that these are methods for time series data?

At the moment, Feature-engine takes care of tabular (cross-sectional) data. We do have a module on time series on the roadmap. It is not the top priority, but it is on the horizon for consideration.

I would appreciate a bit more info on the suggested techniques. A short explanation for a lay audience, one that does not require reading the docs, would help us understand whether this is in or out of scope and where on the roadmap it should be placed.

Thank you