ORAC-CC / orac

Optimal Retrieval of Aerosol and Cloud
GNU General Public License v3.0

Provide data package for running the regression test #101

Open pdebuyl opened 1 month ago

pdebuyl commented 1 month ago

It would be extremely nice if a sample of data to run ORAC were provided.

Getting the correct set of input files (emissivity, etc.) for a specific scan is not trivial, and a ready-made sample would allow users to focus on making sure that they have compiled and configured the software properly.

The data for the regression.py program, plus maybe a few examples (including a GEO one), would be nice. I am willing to provide a full configuration for SEVIRI.

It is possible to upload a few GB to a Zenodo entry, which could be pointed to in the ORAC documentation.
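As a rough sketch of how a user might check such a package after downloading it, assuming it ships with a manifest of MD5 checksums: the manifest layout and the function names `md5_of` and `verify_package` are hypothetical and not part of ORAC.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_package(directory, manifest):
    """Check files under `directory` against a {relative name: md5 hex} manifest.

    The manifest format is an assumption made for this sketch.
    Returns a dict mapping each problem file to "missing" or "mismatch";
    an empty dict means every listed file matched.
    """
    problems = {}
    for name, expected in manifest.items():
        path = Path(directory) / name
        if not path.is_file():
            problems[name] = "missing"
        elif md5_of(path) != expected.lower():
            problems[name] = "mismatch"
    return problems
```

An empty result from `verify_package` would tell the user that the input files are intact, so any remaining failure in regression.py points at the build or the configuration rather than at the data.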

adamcpovey commented 1 month ago

I agree that would be nice. Legally, we can't distribute someone else's data. I already have a folder that contains all of the files necessary to run regression.py so if someone else wants to work out the licences, I can package it.

pdebuyl commented 1 month ago

For the purpose of running tests of ORAC, you could probably get authorization to provide a copy of one dataset (I mean one granule, scan, etc.). @simonrp84 did this for his PyCoxMunk implementation and provides one SEVIRI file here: https://zenodo.org/records/7886737

A lot of the American (NASA/NOAA) data is public domain, for instance. As a EUMETSAT member (at work, I mean), I can contact them to ask for the authorization.

NOAA AVHRR is under the "Level-1 and Atmosphere Archive & Distribution System" DAAC. I can't find the direct download link, but the policy is here: https://modaps.modaps.eosdis.nasa.gov/services/faq/LAADS_Data-Use_Citation_Policies.pdf

The Atmospheric Science Data Center requests citation but does not limit distribution: https://asdc.larc.nasa.gov/citing-data

CAMEL is under the "land processes" DAAC: https://lpdaac.usgs.gov/products/cam5k30cfv003/ -- no restriction: https://lpdaac.usgs.gov/data/data-citation-and-policies/

simonrp84 commented 1 month ago

Under article 10.2 of the license agreement I believe that you and I are allowed to redistribute any original numerical SEVIRI data, Pierre. Adam and the others are only allowed to distribute the "core" SEVIRI data, which is the hourly L1.5 data.

The NASA ancillary data used by ORAC is, to my knowledge, freely redistributable, as is the ERA5 data.