TUDelftGeodesy / stmtools

Xarray extension for Space-Time Matrix
https://tudelftgeodesy.github.io/stmtools/
Apache License 2.0
6 stars 0 forks source link

57 add documentation #71

Closed vanlankveldthijs closed 4 months ago

vanlankveldthijs commented 5 months ago

To resolve issue #57

There are a few remaining issues to resolve.

1) The pull request has 2 example notebooks: one with and one without storing and reloading the ordered data before performing the timing tests. To be determined which is better. This one should be called demo_order_stm.ipynb and the other removed.

2) It seems like stmat.copy does not have the desired result: when I try to run STM operations on the cloned data, there seems to be some parts missing (I think it was the whole .stm part). Solved by loading the whole dataset twice, but that is a bit of an ugly fix.

3) It seems that when using small chunk sizes, storing to zarr fails. Solved by storing using 1000 size chunks and rechunking after loading. This is a bit ugly.

4) Possible issue with storing the ordered data to zarr when this needs to overwrite existing data. I think shutil.retree also sometimes gives issues (maybe when the is no directory).

5) Timing tests using the prerequisite small chunk size (500 in this case) become very slow. Don't know whether we can really fix this though. Currently running the final timing tests on both notebooks. To be committed when done.

rogerkuou commented 5 months ago

Thanks @vanlankveldthijs for the implementation. Following our discussion, the next steps can be:

  1. Demonstrate the performance enhancement from re-ordering in a separate md page.
  2. Elaborate on the factors affecting the re-ordering effect: 1) homogeneity of point distribution; 2) spatial spreadness of points. These two factors can influence the choice of chunksize and spatial closeness of points in the same chunk
  3. Only do performance comparision on subset, which should be enough
  4. Use %%timeit for a robust profiling
  5. Still keep an example notebook for the illustration of re-order operation (you can decide either include this in the existing notebooks or make a new one)

Attaching here two example notebooks I made for comparison: notebooks.zip

vanlankveldthijs commented 5 months ago

Consolidated demo notebooks into one.

Fixed stm.copy issue (should've been stm.copy())

Changed demo notebook to only test subset operation and to test this on large and small chunks (relative to dataset) to see impact.

vanlankveldthijs commented 5 months ago

Fixed pytests by checking for existence of time in chunksizes.

Please check whether implementation could be nicer (better python).

rogerkuou commented 4 months ago

Hi @vanlankveldthijs, thanks for the nice work and sorry for taking so long to review. I quite like the new notebooks. I think it is good for merging now.

As I have mentioned in the previous comment, the ruff workflow should have been fixed by Sarah in #66.

Two warnings in the notebook are documented in #74 and #73. They are not relevant for this PR and can be fixed later