A lot of the XBT code feels quite slow. I don't have a benchmark, so I don't know whether it actually is slow, but the parts that feel slow are often simpler than the parts that are fast (e.g. algorithm training is quicker than running iMeta), so some of the slow parts can plausibly be sped up. Things that should be looked at to improve performance:
loading data
running iMeta algorithm
calculating splits
outputting data
In terms of data, it might be that another format, e.g. parquet, would be quicker to read and write than CSV.
Some things are just inherently slow, but waiting times could be reduced by parallel processing. Areas which could be run in parallel include:
hyperparameter tuning - parallelism using joblib/dask
netcdf to csv conversion - this seems like a good task for dask-based parallelism
metric calculation - splitting by year for dask-based calculation
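Hyperparameter tuning is the most naturally parallel of these, since each parameter combination is independent. A minimal sketch of the pattern, with a made-up scoring function standing in for one training run; in the real code this would be joblib's `Parallel`/`delayed` (likely with a process backend, since training is CPU-bound) rather than the stdlib thread pool used here:

```python
# Sketch: evaluate a hyperparameter grid in parallel.
# evaluate() is a hypothetical stand-in for a single training/scoring run.
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def evaluate(params):
    """Toy score: best at max_depth=8, n_estimators=100."""
    max_depth, n_estimators = params
    return -abs(max_depth - 8) - abs(n_estimators - 100) / 50

# Hypothetical grid; the real tuning would use the classifier's parameters.
grid = list(product([4, 8, 16], [50, 100, 200]))

# Each combination is independent, so they map cleanly onto workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(evaluate, grid))

best_params, best_score = max(zip(grid, scores), key=lambda t: t[1])
print(best_params)
```

With joblib the loop body is unchanged: `Parallel(n_jobs=-1)(delayed(evaluate)(p) for p in grid)`, which also scales out to dask via joblib's dask backend.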
A starting point is to measure performance on different platforms for each section of the algorithm. Then compare performance for single-process versus parallel execution.
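Per-section measurement can start as a small timing harness before reaching for a full profiler. The stage names and the dummy workload below are placeholders for the actual XBT stages (data loading, iMeta, split calculation, output):

```python
# Sketch: time each pipeline section with time.perf_counter.
# dummy_stage() and the stage names are placeholders for the real XBT stages.
import time

def timed(label, fn, *args):
    """Run fn(*args), print and return its wall-clock time."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.4f}s")
    return result, elapsed

def dummy_stage(n):
    # Stand-in CPU work; replace with the real stage call.
    return sum(i * i for i in range(n))

timings = {}
for label, size in [("load", 10_000), ("imeta", 100_000), ("output", 50_000)]:
    _, timings[label] = timed(label, dummy_stage, size)
```

Recording these numbers per platform gives the baseline against which any parallel version can be compared; Python's `cProfile` module is the natural next step once the slowest section is identified.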