ICESAT-2HackWeek / coastal_topobathy

Team project aimed at extracting sections of tracklines, and identifying bathymetry and coastal morphology - can be extended to other questions!
https://icesat-2hackweek.github.io/coastal_topobathy/

A few more clean-up re-organization items #11

Open eeholmes opened 2 years ago

eeholmes commented 2 years ago

Since contributors is the sandbox and DataDownload was/is a group sandbox, maybe we move it to contributors for now? I have some things in DataDownload to clean up still but not going to do that immediately.

The following are types of data that would be good to have in a shared space. None of these are huge.

iled commented 2 years ago

  • Move DataDownload into contributors?

Since contributors is the sandbox and DataDownload was/is a group sandbox, maybe we move it to contributors for now? I have some things in DataDownload to clean up still but not going to do that immediately.

I think so, too.

  • Where should we have shared data?

As suggested in #6, I think a ./data directory would be a fine place for that purpose.

The following are types of data that would be good to have in a shared space. None of these are huge.

  • Shared geojson files for different regions of interest.

The example given in #6 was exactly on ROIs.
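As a rough illustration of what a shared ROI file could look like, here is a minimal sketch using only the Python standard library. The function names, the `name` property, the file name `roi_example.geojson`, and the coordinates are all made up for illustration; none of them come from the repo.

```python
import json
import tempfile
from pathlib import Path


def make_roi(name, lon_min, lat_min, lon_max, lat_max):
    """Build a GeoJSON FeatureCollection holding one rectangular ROI."""
    ring = [
        [lon_min, lat_min],
        [lon_max, lat_min],
        [lon_max, lat_max],
        [lon_min, lat_max],
        [lon_min, lat_min],  # a GeoJSON ring closes on its first point
    ]
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {"name": name},
                "geometry": {"type": "Polygon", "coordinates": [ring]},
            }
        ],
    }


def save_roi(roi, path):
    """Write the ROI to disk, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w") as f:
        json.dump(roi, f, indent=2)


def roi_bounds(path):
    """Read an ROI file and return (lon_min, lat_min, lon_max, lat_max)."""
    with open(path) as f:
        collection = json.load(f)
    ring = collection["features"][0]["geometry"]["coordinates"][0]
    lons = [pt[0] for pt in ring]
    lats = [pt[1] for pt in ring]
    return (min(lons), min(lats), max(lons), max(lats))


# Demo in a temporary directory standing in for the repo root (hypothetical values).
with tempfile.TemporaryDirectory() as repo_root:
    roi_path = Path(repo_root) / "data" / "roi_example.geojson"
    save_roi(make_roi("example_coast", -80.0, 25.0, -79.5, 25.5), roi_path)
    print(roi_bounds(roi_path))
```

Anyone needing the same region could then read the file from ./data instead of redefining the bounding box in each notebook.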

  • ATL03 data (in some not-too-huge but easy-to-read format) for good test cases. Those working on photon classification algorithms just need a set of ATL03 data to work on, and it would be good to have a consistent set of test cases. I don't know what a good format is. I think it'll be a geopandas dataframe that we want to save, or at least a pandas dataframe.

I think it makes sense to have a common place for things that are likely to be reused multiple times, and subsets for test cases are one such example. Other pieces of data specific to a single example could reside in, say, ./examples/data.

I am also not sure about the best format. Perhaps something based on HDF? I don't know what's more typical in the ICESat world. Note that (geo)pandas dataframes are not file formats; they are data structures that only exist while code is running. I guess they could be dumped in binary form (e.g., to a pickle), but I think that would not be very portable.
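To make the data structure versus file format point concrete, here is a small sketch that writes a table of photons to a plain-text file and reads it back. It uses CSV from the standard library purely as a stand-in for "some portable format"; it is not a recommendation over an HDF-based one. The column names, file name, and values are illustrative assumptions, not real ATL03 data.

```python
import csv
import tempfile
from pathlib import Path

# Illustrative column names for a photon subset (an assumption for this sketch).
COLUMNS = ["lat_ph", "lon_ph", "h_ph"]


def write_photons(rows, path):
    """Save a list of photon records (dicts) as a CSV file with a header."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


def read_photons(path):
    """Load the CSV back into a list of dicts with float values."""
    with open(path, newline="") as f:
        return [
            {key: float(value) for key, value in row.items()}
            for row in csv.DictReader(f)
        ]


# Made-up values, only to show the round trip.
photons = [
    {"lat_ph": 25.01, "lon_ph": -79.9, "h_ph": -3.2},
    {"lat_ph": 25.02, "lon_ph": -79.9, "h_ph": -2.8},
]

with tempfile.TemporaryDirectory() as repo_root:
    test_file = Path(repo_root) / "data" / "atl03_test_subset.csv"
    write_photons(photons, test_file)
    print(read_photons(test_file) == photons)
```

The in-memory table only exists while the script runs; the file on disk is what would be shared, so whichever format is picked should be readable without depending on a particular Python or library version.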

iled commented 2 years ago

I can refactor DataDownload and the linked notebooks later this week or over the weekend.