Open timothydmorton opened 4 years ago
@brendancol has removed the .compute() calls from the dataframe.
@philippjfr is working on performance issues. He found and fixed a major one yesterday.
Great! Let me know when I can test this on lsst-dev, and what package version(s) I need to do so. I have some time to work today, so happy to help jump on this.
I pinged @philippjfr earlier today. He has some in-progress optimizations, so he said tomorrow would be a better time to test on lsst-dev.
Yes, hoping we can improve performance by another factor of 2 at minimum by avoiding the repeated range calculations.
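To illustrate what avoiding repeated range calculations could look like, here is a minimal sketch, not the actual Panel/HoloViews change. It assumes the data sits in named in-memory columns; `RangeCache` and its `scans` counter are hypothetical names used only for illustration.

```python
class RangeCache:
    """Hypothetical helper: compute each column's (min, max) once and reuse it."""

    def __init__(self, columns):
        self._columns = columns   # mapping of column name -> sequence of numbers
        self._ranges = {}         # column name -> cached (min, max)
        self.scans = 0            # how many full scans were actually performed

    def range(self, name):
        # Only scan the data the first time a column's range is requested.
        if name not in self._ranges:
            self.scans += 1
            data = self._columns[name]
            self._ranges[name] = (min(data), max(data))
        return self._ranges[name]


cache = RangeCache({"psfMag": [21.3, 19.8, 24.1, 22.0]})
first = cache.range("psfMag")    # scans the column
second = cache.range("psfMag")   # served from the cache, no second scan
```

The cache would need to be invalidated whenever the underlying data changes, which this sketch does not handle.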
@timothydmorton Panel 0.9.4 is released and fixes the issue noted above. Brendan has removed the .compute() calls in a branch called persist-update, if you want to try things today.
When using the partitioned data, even on a relatively large dataset, the data loading with kartothek seems to be proceeding excellently. However, there are still other substantial lags that cause delays of tens of seconds when adding metrics. Is this coming from the visit summary computations? If so, we need to mitigate this; if not, we need to find what else is slowing us down.
We may need to slightly re-imagine the visit plot (more on this later), or maybe this would get fixed if we can increase the number of dask workers? I'm not convinced that more workers will help, because at present it seems like dask isn't actually doing anything while this waiting is happening.
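One way to narrow down where the wait comes from would be to time each stage separately. The following is a rough sketch using only the standard library; the stage names ("load", "visit_summary") and the stand-in work inside each block are assumptions for illustration, not the dashboard's real code.

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> accumulated wall-clock seconds


@contextmanager
def timed(stage):
    """Accumulate wall-clock time spent inside the with-block under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start


# Stand-in work; in practice these blocks would wrap the real calls.
with timed("load"):
    data = list(range(1000))

with timed("visit_summary"):
    total = sum(data)

# The stage with the largest accumulated time is the first place to look.
slowest = max(timings, key=timings.get)
```

If the slowest stage turns out to be one where dask is idle, adding workers is unlikely to help, which would match the observation above.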