Open luckyhug opened 2 years ago

Does auto-sklearn support incremental learning or partial_fit? My dataset is too big for RAM (about 230+ GB; I can store it in a list, but there is not enough memory to convert that list to an np array). Is there any advice or are there examples for dealing with a dataset like this?

Thank you very much!
Hi @luckyhug,
No, I don't think there is currently a way to use that much data effectively in auto-sklearn natively. My only suggestion would be to run auto-sklearn on a subsample of the data and use show_models()
to see which models and hyperparameters it selected, then use those configurations in the next step of your pipeline and handle the incremental learning and partial fitting in a custom manner.
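A rough sketch of that two-step workflow, in case it helps. The file names, time/memory limits, and the choice of SGDClassifier below are purely illustrative placeholders, not something auto-sklearn produces for you:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
import autosklearn.classification

# Step 1: run the search on a subsample that fits comfortably in RAM.
# "X_subsample.npy" / "y_subsample.npy" stand in for a random slice of
# the full 230+ GB dataset, small enough to load normally.
X_sub = np.load("X_subsample.npy")
y_sub = np.load("y_subsample.npy")

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=3600,
    memory_limit=8192,
)
automl.fit(X_sub, y_sub)

# Inspect which models / hyperparameters did well on the subsample.
print(automl.leaderboard())
print(automl.show_models())

# Step 2: incremental learning on the full data, handled outside auto-sklearn.
# If the search points towards e.g. a linear model, an estimator with
# partial_fit (here a plain scikit-learn SGDClassifier) can be trained chunk
# by chunk from memory-mapped arrays so the full data never sits in RAM at once.
X_full = np.load("X_full.npy", mmap_mode="r")   # placeholder paths
y_full = np.load("y_full.npy", mmap_mode="r")
classes = np.unique(y_sub)

clf = SGDClassifier()
chunk = 100_000
for start in range(0, X_full.shape[0], chunk):
    clf.partial_fit(X_full[start:start + chunk],
                    y_full[start:start + chunk],
                    classes=classes)
```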
Best, Eddie
@luckyhug did you consider downcasting numerical values, e.g. from float64 to float16, already? This could reduce your memory consumption by a factor of 4.
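Something along these lines, for example (the random array is just a stand-in for the real feature matrix; float16 only keeps roughly 3 significant digits, so it's worth checking your features tolerate the precision loss):

```python
import numpy as np
import pandas as pd

X = np.random.rand(1_000_000, 20)            # stand-in for the real feature matrix
print(X.nbytes / 1e6, "MB as float64")       # 8 bytes per value
X16 = X.astype(np.float16)
print(X16.nbytes / 1e6, "MB as float16")     # 2 bytes per value, ~4x smaller

# pandas can also downcast per column, though only as far as float32:
df = pd.DataFrame(X)
df_small = df.apply(pd.to_numeric, downcast="float")
print(df_small.dtypes.unique())
```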
@jonaslandsgesell good idea! We automatically do that already if the dataset is too large. We also automatically subsample the data if it is still too large to fit in memory, but this means that not all of the original data gets used.
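For reference, that automatic behaviour can also be steered from the estimator constructor; a rough sketch is below. The argument names (memory_limit in MB, dataset_compression with "precision"/"subsample" methods) are from memory of recent auto-sklearn releases, so please double-check them against the docs for the version you have installed:

```python
import autosklearn.classification

automl = autosklearn.classification.AutoSklearnClassifier(
    memory_limit=16384,
    dataset_compression={
        "memory_allocation": 0.2,               # share of memory_limit the data may occupy
        "methods": ["precision", "subsample"],  # downcast first, subsample only if still too big
    },
)
```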