Add function to fetch and load Bedmap2

fatiando / rockhound

NOTICE: This library is no longer being developed. Use Ensaio instead (https://www.fatiando.org/ensaio). -- Download geophysical models/datasets and load them in Python

BSD 3-Clause "New" or "Revised" License

34 stars 15 forks source link

Add function to fetch and load Bedmap2 #11

Closed santisoler closed 5 years ago

santisoler commented 5 years ago

Bedmap2 is a suite of gridded products describing surface elevation, ice-thickness, the sea ﬂoor and subglacial bed elevation of the Antarctic south of 60°S. The Bedmap2 datasets are downloaded from the British Antarctic Survey and loaded as xarray.Dataset objects.

Reminders

[ ] Run make format and make check to make sure the code follows the style guide.
[ ] Add tests for new features or tests that would have caught the bug that you're fixing.
[ ] Add new public functions/methods/classes to doc/api/index.rst.
[ ] Write detailed docstrings for all functions/methods.
[ ] If adding new functionality, add an example to the docstring, gallery, and/or tutorials.

santisoler commented 5 years ago

@leouieda would you mind checking this code?

I have one minor doubt: Should import the DATASETS list in the __init__.py in order to make it accesible to users? So, if one want to load every dataset could do something like:

from rockhound import fetch_bedmap2, BEDMAP2_DATASETS

grid = fetch_bedmap2(BEDMAP2_DATASETS)

Note: loading every dataset could eat a lot of memory!

santisoler commented 5 years ago

@santisoler looks great! Just a few suggestions for making the code a bit cleaner.

I don't think we should expose the DATASETS variable. That's a very narrow use case. What we could do is allow datasets="all" to be a valid input.

I've been messing around with the datasets="all" argument with the idea of making it the default value, but due to the high memory requirement I dropped it. But leaving the datasets="all" as an option would be good. I'll add it.

But we can't load some of the grids into the same Dataset because they have different coordinates and shapes, right?

Thanks to xarray we could get grids with different shapes on the same Dataset with the xarray.merge() function. The coordinates are spanned to include the coordinates of every DataArray and the values are filled with nans if no data is available. Don't know if we should only merge DataArrays with the same shape or leave the code as it is now.

santisoler commented 5 years ago

@leouieda I'll apply the requested changes tomorrow. Thanks for the review!

leouieda commented 5 years ago

Don't know if we should only merge DataArrays with the same shape or leave the code as it is now.

That's a tough one. Maybe we can leave as is but include a warning the docstring that you'll get different things if you ask for only the uncertainty grids or for all of them.

leouieda commented 5 years ago

due to the high memory requirement I dropped it

This should definitely be a warning in the docstring. We don't want people crashing their computers. And it should definitely not be the default :smiley:

santisoler commented 5 years ago

@leouieda I think the code and its docstring look better now. I've noticed that the only the dataset thickness_uncertainty_5km is defined on a different set of points than the rest, so we don't have to bother with the lakemask_vostok one. That's why the warning only reference the former as source of problems.

Do you like the hard-coded metadata as it is?

leouieda commented 5 years ago

:+1: I really like the changes and the docstring! Good job!

What do you think about removing the all option? It feels like something that will be abused and cause system crashes often. Plus, we can't test it on the CIs.

santisoler commented 5 years ago

@leouieda Thanks for the reviews.

I think I've added and removed the all option a hundred times, what leads me to think we are not entirely sure if it worth the risk of crashing local systems and/or Travis (is it possible?). We definitely should remove it but leave the option to pass the datasets are strings (instead of putting a single dataset inside a list). In fact I don't imagine a case in which someone would need the entire set.

I'll remove it from the code, the docstring and the test functions. Then if no CI fails we could merge.

leouieda commented 5 years ago

:+1: thanks, Santi!

leouieda commented 5 years ago

We still need a gallery plot for this. Given the size of the grids, probably only plot a subsection, like we did for the ETOPO1 grid.

santisoler commented 5 years ago

Thanks for the reviews @leouieda! I can get into the gallery plot this week. I'll open a PR when ready.