carpentries / pointers

Open Datasets for (Data Science) Education
https://carpentries.github.io/pointers/

Adding "weather prediction dataset" #4

Closed florian-huber closed 1 year ago

florian-huber commented 1 year ago

Thank you for your interest in submitting a dataset to Pointers! Please respond to the prompts below to complete your submission, referring to the Inclusion Requirements & Guidance section of this repository's README as you do so. Check boxes by adding an 'x' between the square brackets at the start of each point (i.e. [x]), or submit the issue and check off the boxes afterwards.


  1. What is the name of the dataset record in Zenodo? Weather prediction dataset

  2. What is the URL of the dataset record in Zenodo? https://zenodo.org/record/7053722

  3. What is the format of the dataset? E.g. plain text files, database files, image files, etc. .csv files with tabular data

  4. Check the boxes to confirm that the dataset record includes:

    • [x] a README file describing the dataset
    • [ ] a license file or folder of license files
  5. If the dataset record includes supplementary information such as teaching materials or code, please briefly describe these.

florian-huber commented 1 year ago

Currently the Zenodo entry has a field "licence (for files)" pointing to this: https://creativecommons.org/licenses/by/4.0/legalcode. There is, however, no licence file among the uploaded files.

tobyhodges commented 1 year ago

Thanks so much for submitting this dataset, @florian-huber. I really enjoyed looking through the Zenodo record and the accompanying notebooks you provided. I particularly liked the metadata.txt file that is included with the dataset.

Below are some notes from reviewing the record. I have emphasised where there are changes I would like you to make before we accept the record to the Pointers collection.

Organisation

The data is well-organised in plain text files, and the size and complexity of the data are well balanced.

Documentation

Between the metadata.txt file and the Zenodo entry description, the data is generally very well documented.

Variables within the files are described (either self-documented in column names, or in metadata.txt), but I did not find a description of what information is contained in each of the data files. Please add a short description of each data file.

Additionally, metadata.txt includes an excellent list of variables and their units, but I was unclear on why these variables were listed with two-letter codes ("CC", "DD", etc.). Are they included because those two-letter codes are present in the original data from ECA&D?

I would also recommend including a README file in the Zenodo record, to make it easier for people to refer to the documentation when working with a local version of the dataset after downloading it.

And a minor point: in metadata.txt, you include a numbered reference in the description (emphasised in the blockquote below), but that does not seem to point to a reference anywhere in the file:

Data collection selection and processing

The initial meteorological data was retrieved from ECA&D [1] a project that makes available daily observations at meteorological stations throughout Europe and the Mediterranean.

License(s)

The dataset and accompanying material are all appropriately licensed. However, I would recommend including a license file in the record itself, so that anyone who downloads it and shares it with others will be providing the license information alongside the data.

Supplementary Materials

The links you provide to the GitHub repository and workshop talk (which seems to have a bug) are excellent and much appreciated. I really like how you have kept these three resources separate but clearly linked between them.

My only question, for @vantuyls, is whether we consider .ipynb files sufficiently "open"/easy-to-read to fulfill our requirements? (I would argue "yes.")

vantuyls commented 1 year ago

the above review looks great, thanks @tobyhodges. I agree that .ipynb files are just fine.

tobyhodges commented 1 year ago

Thanks @vantuyls.

@florian-huber based on Steve's response, these are the things we would like you to do before we accept the dataset into the Pointers Zenodo community:

  1. Add a short description of each data file to the documentation. This could be in metadata.txt, in the Zenodo record description, in a README file (see below), or any combination of these that you think makes most sense. My recommendation would be to add a README that includes this info, and also to include it in the Zenodo record description.
  2. Explain the origin and relevance of the two-letter codes for variables in metadata.txt.
  3. Add the missing reference to metadata.txt.
  4. (optional) add README and LICENSE files (or a LICENSES directory containing multiple license files) to the Zenodo record to facilitate secondary sharing and reuse of the data.
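
For anyone who wants to check a downloaded copy of a record against point 4, the following is a minimal sketch, not part of the Pointers tooling. It assumes the record has been unpacked into a local folder; the function name `check_record_files` and the file-name patterns it matches (names starting with README, LICENSE, LICENCE, or LICENSES) are illustrative assumptions.

```python
from pathlib import Path


def check_record_files(record_dir):
    """Report whether a local record folder has README and license entries.

    Hypothetical helper: the name patterns below are assumptions,
    not an official Pointers or Zenodo requirement.
    """
    root = Path(record_dir)
    # Compare names case-insensitively, e.g. "readme.md" or "License.txt".
    names = [entry.name.upper() for entry in root.iterdir()]
    has_readme = any(name.startswith("README") for name in names)
    # "LICENSE" also matches a "LICENSES" directory of license files.
    has_license = any(name.startswith(("LICENSE", "LICENCE")) for name in names)
    return {"readme": has_readme, "license": has_license}
```

Calling `check_record_files("path/to/record")` returns a dictionary with two booleans, so a missing license file shows up as `"license": False`.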

florian-huber commented 1 year ago

@vantuyls and @tobyhodges Sorry for taking so long! I have now updated the GitHub repo and the Zenodo entry according to your suggestions (except adding a license file to Zenodo).

Let me know if something else is missing or needs to be edited. New Zenodo link: https://doi.org/10.5281/zenodo.7525955

tobyhodges commented 1 year ago

Thank you @florian-huber for following up on our reviews. The changes look good to me and, if @vantuyls agrees, I will happily accept the Zenodo entry into the Pointers collection and close this issue.

vantuyls commented 1 year ago

@florian-huber thanks for the updates - they look great!

@tobyhodges i don't think i have permission on the Zenodo community to accept an upload. if you could do so and close issue, that'd be great.

tobyhodges commented 1 year ago

done ✅ congratulations @florian-huber and thanks for working with us on this!

florian-huber commented 1 year ago

Great! 🚀 Thanks a lot @tobyhodges and @vantuyls for taking care of this!