NOAA-OWP / DMOD

Distributed Model on Demand infrastructure for OWP's Model as a Service

Automatically download hydrofabric datasets #534

Open robertbartel opened 6 months ago

robertbartel commented 6 months ago

Add capabilities for automatically downloading publicly available hydrofabric datasets.

aaraney commented 5 months ago

One thing to note: if we are pulling HFs from https://www.lynker-spatial.com/, we will need to include the license along with those datasets.

christophertubbs commented 5 months ago

How would you include the license if you're downloading it arbitrarily? (i.e., the user points the app at a URL and it just so happens to be from lynker)

christophertubbs commented 5 months ago

Also, how does this jibe with what I'm assuming would/could be hydrofabrics generated in/around dmod?

robertbartel commented 5 months ago

I suggest we add a NOTICE file in the root directory for the repo and start to put dependency license details here. I went ahead and created #559 for that, though feel free to either object or suggest alternatives if you have a better idea.

Also, I don't think it will hurt if things are done over many PRs or if the file is actually created as part of a PR for this; i.e., I opened #559 and put it as In progress, but we can just worry about creating the file for now and move #559 back to Todo until we are ready to tackle the rest.

aaraney commented 5 months ago

> How would you include the license if you're downloading it arbitrarily? (i.e., the user points the app at a URL and it just so happens to be from lynker)

My initial thought is that we should take a good-steward approach. I don't think we will be able to come up with a general, all-inclusive solution to this, but I think having functionality to store and display this kind of information is better than just declaring the problem unsolvable in general.

I've not kept my finger on the pulse of how this kind of metadata is being stored in scientific data repositories (e.g., Zenodo, Nature data, HydroShare), but I would recommend that we reach out to colleagues who work more frequently with datasets similar to the hydrofabric and/or use the aforementioned data repositories, for advice or general guidance. We might also look at how these data repositories capture license metadata, for reference.

christophertubbs commented 5 months ago

Would we try to store a notice per download or per source? i.e., the copyright HTML from lynker-spatial the first time it's downloaded, or every time something from it is downloaded? Also, would we want to tie datasets to licenses? The lynker one is tied to the ODbL.

Also, is it fair to assume that a user 'downloading a dataset' is just Spanish for moving it to the object store?

aaraney commented 5 months ago

> i.e., the copyright HTML from lynker-spatial the first time it's downloaded, or every time something from it is downloaded?

I don't know that we want either. I think we just want to display the licenses if they exist and provide a way to download them. I think we will need a new data format to support this, @robertbartel.

> Also, would we want to tie datasets to licenses?

Yes, but more specifically we want to tie a dataset item to a license. In the current state of datasets, I'm not sure how feasible this will be, so we may need to go with the broader license-to-dataset relationship you asked about, @christophertubbs.
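
To make the two granularities concrete, here is a rough sketch of what such a relationship could look like. Everything in it is an assumption for illustration: `License`, `DatasetLicenses`, and the item names are hypothetical and are not existing DMOD types. The idea is simply to prefer an item-level license when one is recorded and to fall back to the dataset-wide one otherwise.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class License:
    """Hypothetical license record; not an existing DMOD type."""
    name: str
    url: Optional[str] = None


@dataclass
class DatasetLicenses:
    """Hypothetical per-dataset license bookkeeping."""
    # Broad, dataset-wide license used when an item has no license of its own.
    dataset_license: Optional[License] = None
    # Item-level licenses, keyed by dataset item name.
    item_licenses: Dict[str, License] = field(default_factory=dict)

    def license_for(self, item_name: str) -> Optional[License]:
        # Prefer the item-specific license, else fall back to the dataset-wide one.
        return self.item_licenses.get(item_name, self.dataset_license)


# Illustrative values only: the item name and "Example-License" are made up.
licenses = DatasetLicenses(dataset_license=License(name="ODbL"))
licenses.item_licenses["item-a"] = License(name="Example-License")
```

With this shape, a dataset that only has the broad relationship still answers `license_for(...)` for every item, so item-level licenses could be added later without changing callers.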

aaraney commented 5 months ago

> Also, is it fair to assume that a user 'downloading a dataset' is just Spanish for moving it to the object store?

Pretty much. Creating the required dataset metadata and uploading that and the dataset to the object store.
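
As a minimal sketch of that flow, assuming a plain dict stands in for the object store: `ingest_dataset`, `fetch_url`, the metadata fields, and the key layout below are all hypothetical and do not reflect DMOD's actual dataset metadata format or storage API.

```python
import hashlib
import json
from typing import Callable, Dict, Optional
from urllib.request import urlopen


def fetch_url(url: str) -> bytes:
    """Default fetcher: read the whole response body into memory."""
    with urlopen(url) as response:
        return response.read()


def ingest_dataset(
    url: str,
    name: str,
    store: Dict[str, bytes],
    license_name: Optional[str] = None,
    fetch: Callable[[str], bytes] = fetch_url,
) -> dict:
    """Download ``url``, build minimal metadata, and put both in ``store``.

    ``store`` is a plain dict standing in for the object store (an assumption).
    """
    data = fetch(url)
    metadata = {
        "name": name,
        "source_url": url,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "license": license_name,  # None when no license is known
    }
    # Upload the dataset bytes and the metadata document side by side.
    store[f"{name}/data"] = data
    store[f"{name}/metadata.json"] = json.dumps(metadata).encode("utf-8")
    return metadata
```

Passing `fetch` in as a parameter keeps the network call swappable, so the same function could be exercised without hitting a real URL.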

christophertubbs commented 5 months ago

Do we have other download logic/services/etc floating about?

robertbartel commented 4 months ago

This may actually be superseded by #137. At minimum, I think it probably makes more sense to prioritize that issue.