Ouranosinc / xclim-testdata

NetCDF example files used for the testing suite of Xclim
MIT License
2 stars 2 forks source link

Add datasets with tasmax / tasmin #28

Open coxipi opened 8 months ago

coxipi commented 8 months ago

It's been a couple times I would have liked to have datasets (at least two simulations) with tasmax, tasmin variables. Something like datasets in sdba (which have tasmax, pr).

Should I add new datasets with tasmax/tasmin, or update existing datasets in sdba with a 3rd variable?

Zeitsperre commented 8 months ago

How breaking of a change would it be to the existing SDBA tests? Would you be willing to update all associated tests?

coxipi commented 8 months ago

As far as tests are concerned, I think only test_stack_variables would need to be changed.

Also, sdba{_advanced}.ipynb assume we only work with pr, tasmax, in this case I would propose to simply drop tasmin from the start and keep the same structure.

coxipi commented 8 months ago

It's not really related to tests, so maybe I should avoid complicating the structure. I was just using this to test methods for R&D projects, so it's maybe not a real need for xclim_testdata.

coxipi commented 8 months ago

I ended up downloading a few lon/lat points from datasets on the server. I would be happy to add tasmin datasets in the test datasets if we judge we should have this, but I'm not sure if this is necessary.

When checking if adding tasmin would pose problems with xclim, I realized that the datasets in xclim-testdata are intended for tests (duh), so my use case of trying methods for new bias corrections is not what we intend to cover with xclim-testdata (correct?). For instance, it would also have been useful for me to have access to radiation datasets, but I don't think we want this if we are to keep xclim_testdata minimal.

So, if you guys agree, feel free to close the issue, if not I can add the tasmin datasets.

aulemahal commented 8 months ago

Indeed, the point of this repo is mainly for testing. But I use it too for development as it gives an easy access to well-formatted data.

As for adding variables to the SDBA test datasets, I think it would make sense once we have multivariate methods to test. Currently, tasmin would be no use because in an univariate context you could just do tasmin = ds.tasmax.rename('tasmin'). But in a mutlivariate context, it would be interesting to have "real-life" data with all the noisy correlation and other perks.

Zeitsperre commented 8 months ago

I see this repo as something that serves both the unit tests and the performance tests of xclim (e.g. xclim-benchmark). If your changes to the testing data can lead to more rigorous testing in either or both of those aspects, I see no issues with you moving forward.

If you see the additional variables as only useful for benchmarks, you could always add a new folder here and add your modified test data into there. By default, xclim's testing code only downloads the files it needs to run its tests (set by a list of file locations in xclim/testing/helpers.py).

Unless you're adding a significant amount of data (>3 MiB would be significant), my vote is to go forwards with your idea.

coxipi commented 8 months ago

I think it would make sense once we have multivariate methods to test.

We do some multivariate tests in sdba.ipynb, but for now tasmax / pr is sufficient for our needs.

serves both the unit tests and the performance tests of xclim

This would be neither of those currently.

Unless you're adding a significant amount of data (>3 MiB would be significant)

Ok, yes I was thinking something the same size as the current sdba datasets. I'll go forward with this when I have a down time.