alteryx / Automated-Manual-Comparison

Automated vs Manual Feature Engineering Comparison. Implemented using Featuretools.
https://towardsdatascience.com/why-automated-feature-engineering-will-change-the-way-you-do-machine-learning-5c15bf188b96
BSD 3-Clause "New" or "Revised" License

Performance problems #1

Closed by Gitii 6 years ago

Gitii commented 6 years ago

Hi,

Thanks for your article. Automated feature engineering is very promising. I am running the Loan Repayment script right now to compare its output with my own engineered features, and I am very curious about the results.

What hardware do you recommend to compute the result in one day, as mentioned in the article? This is my current progress:

Elapsed: 18:50:30 | Remaining: 22358:53:57 | Progress: 0%| | Calculated: 3/3563 chunks

ft.py uses one job by default, and any value other than 1 crashes the script. I am using an r4.2xlarge AWS EC2 instance, but with one job the script cannot use more than one core. Even with all eight cores, it would still take weeks.
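For reference, the estimate can be reproduced from the progress line quoted above. This is only a rough sketch: it assumes every remaining chunk takes as long as the first three did, and the eight-core figure assumes perfect linear scaling, which is a best case rather than something observed.

```python
# Rough ETA arithmetic from the progress line quoted above.
# Assumption: every remaining chunk takes as long as the first three did.

def hms_to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

elapsed_s = hms_to_seconds("18:50:30")
chunks_done, chunks_total = 3, 3563

seconds_per_chunk = elapsed_s / chunks_done
remaining_s = seconds_per_chunk * (chunks_total - chunks_done)

remaining_days_1_core = remaining_s / 86400
# Hypothetical best case: perfect linear scaling across 8 cores.
remaining_days_8_cores = remaining_days_1_core / 8

print(f"per chunk: {seconds_per_chunk / 3600:.2f} h")
print(f"remaining, 1 job: {remaining_days_1_core:.0f} days")
print(f"remaining, 8 jobs (ideal): {remaining_days_8_cores:.0f} days")
```

At roughly six hours per chunk, that is more than 900 days with one job and still more than 100 days with ideal eight-way scaling.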

Can you recommend some specs to speed this up?

Best regards

WillKoehrsen commented 6 years ago

Using the Featuretools on Dask notebook, it took about 3 hours to generate the feature matrix on a MacBook laptop with 16 GB of RAM. I'd recommend taking the Dask route instead of running the entire computation at once.

The Dask implementation should run within a few hours on a personal laptop. It's a great example of the benefits of parallel computation!
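To illustrate the general idea of splitting the work instead of running it all at once, here is a minimal sketch of a partition-then-map pattern. It is not the notebook's code: it uses the standard library's thread pool as a stand-in for Dask, and `client_id`, `amount`, `partition_by_client`, `build_features`, and the toy rows are all hypothetical names and data made up for the example. It also assumes the work can be split into partitions that do not depend on each other.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def partition_by_client(rows, n_partitions):
    """Group rows so that all rows of one client land in the same partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["client_id"] % n_partitions].append(row)
    return list(partitions.values())

def build_features(partition):
    """Hypothetical stand-in for feature generation on a single partition."""
    totals = defaultdict(lambda: {"n_loans": 0, "total_amount": 0.0})
    for row in partition:
        stats = totals[row["client_id"]]
        stats["n_loans"] += 1
        stats["total_amount"] += row["amount"]
    return {client_id: dict(stats) for client_id, stats in totals.items()}

def feature_matrix(rows, n_partitions=4, max_workers=4):
    """Compute per-partition features concurrently, then combine them."""
    partitions = partition_by_client(rows, n_partitions)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(build_features, partitions)
    combined = {}
    for partial in results:
        # A client never spans two partitions, so keys do not collide.
        combined.update(partial)
    return combined

# Toy data, invented for this example only.
rows = [
    {"client_id": 1, "amount": 100.0},
    {"client_id": 2, "amount": 250.0},
    {"client_id": 1, "amount": 50.0},
    {"client_id": 3, "amount": 75.0},
]
matrix = feature_matrix(rows)
print(matrix)
```

Each partition is small enough to process on its own, so memory use stays bounded and the partitions can be handed to separate workers. For CPU-bound feature generation, threads would likely need to be replaced by processes or Dask workers to get a real speedup.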