IPCC-WG1 / Atlas

Repository supporting the implementation of FAIR principles in the IPCC-WGI Atlas

ATLAS Review- a consolidated review of several minor elements #25

Closed aradhakrishnanGFDL closed 3 years ago

aradhakrishnanGFDL commented 3 years ago

Firstly, this is such a great effort. The portal has a nice look and feel, with a plethora of information one can access without even having to download the datasets or write a script. Great work. Some nitpicking, in the best interest of Atlas, can be found below. Thanks for the opportunity.

  1. R is great and the use of Jupyter notebooks is fantastic too. The scope could be expanded by including languages like Python. This would also fit under the “Reusability” and “Interoperability” aspects in FAIR. This also sets a future vision to containerize the scripts to promote reusability.

  2. The csvs in https://github.com/IPCC-WG1/Atlas/tree/master/ATLAS-inventory are great ways to establish some level of provenance. The version identifier is extremely important, nicely captured for CMIP6. For CMIP5 though, it’s not specified in the .csv. I believe version strings did exist for CMIP5 as well. The CMIP6 siconc inventory from ESGF also seems to be missing version identifiers https://github.com/IPCC-WG1/Atlas/tree/master/ESGF-inventory/CMIP6.

  3. Would be nice to use CMIP6 data DOIs in ATLAS emphasizing FAIR principles and giving credit to the data producers.

For the datasets used, it’s nice to see the observation datasets listed. It might help to specify versions for all of the datasets, provide a data DOI or at least a pointer to the data in the public domain.

  1. The use of Jupyter notebooks is promising. https://github.com/IPCC-WG1/Atlas/blob/master/notebooks/regional_delta_changes.ipynb How extensible and collaborative is the framework?

The notebook examples themselves would benefit from more markup documentation, especially in the sections where the data is ingested, to let users know how it can be expanded. Also, what does one need to do in order to reproduce the notebook figures? This question should be addressed in simple words to promote reusability.

  1. Metaclip sounds like a great utility. But I am unable to understand how it's connected to Atlas to see the provenance information in the scripts or figures. Some additional info regarding this would be great for users to leverage Metaclip.
  2. Design and Usability: After making selections on the atlas, the user could be told how the maps get updated, which in this case happens simply by hovering outside, with no submit action expected. This is cool, but one can feel a bit lost until the updated map appears.

It would be nice to have tooltips when hovering over different locations on the atlas, listing the available products such as time series, etc. These types of actions could also be documented in the README or user documentation links.

  1. The maps downloaded from the atlas portal do not carry enough metadata. These maps could potentially be used in presentations and papers, and it would be great to see the provenance info on them, unless I am missing it. Once again, the data DOIs or acknowledgements are essential to make sure users continue to track provenance.

  2. Possible typos in Help & Instructions and missing links. In particular, the SOD incldues Climatic Impact Drivers as defined in Chapter 12 SOD (in is missing) Climatic Impact Drivers - map legend, confidence. In direction of change. About page: "The inventory of model runs, scenarios and variables used in the SOD is available at the IPCC WG1 Atlas repository." But the link https://github.com/IPCC-WG1/Atlas/tree/master/AtlasHub-inventory is not a valid reference. Perhaps you’re referring to https://github.com/IPCC-WG1/Atlas/tree/master/ATLAS-inventory

  3. I think the purpose of Atlas and the maps shown in Atlas could be documented more specifically. For example, if "key figures" are shown in Atlas, how are the key figures determined? Are they from the chapter figures? If yes, how does this effort compare to the ESMValTool effort?

  4. Consider renaming master branch to main branch. https://github.com/github/renaming

  5. In terms of findability and accessibility, one could consider inventorying the scripts used. Sorry if I overlooked this, the README is a great place to start.

  6. In terms of regridding, the regridding algorithms and source code could be referenced. Basically, at whatever granularity is possible, the tools that encompass Atlas should be archived, along with a DOI, especially since DOIs can support different versions should that be needed.

  7. Bad data retraction elements and pointers to errata pages (if any) for the datasets used should also be documented, should such a situation arise. I think this is a general problem that has no direct solution. Your csv listing is a great start. Some basic documentation or prompts for users to check data versions before running the notebooks, say a few years from now on data that later has a newer version, may be valuable. DOIs can also be very helpful here to trace back the steps.

  8. For future work, it would be great to see more generalized data cataloguing utilities that can readily help with analysis. Some Python-based utilities exist today. To help users work with big data, example notebooks using xarray, dask, etc., exploring on- and off-cloud capabilities, would be cool. Overall, this is a great initiative.

jesusff commented 3 years ago

Thank you for your thorough review. I'll try to address your comments grouping them by topics. I'll keep a task list in this first comment to make sure all are covered.

jesusff commented 3 years ago
  1. R is great and the use of Jupyter notebooks is fantastic too. The scope could be expanded by including languages like Python. This would also fit under the “Reusability” and “Interoperability” aspects in FAIR. This also sets a future vision to containerize the scripts to promote reusability.

  2. The use of Jupyter notebooks is promising. https://github.com/IPCC-WG1/Atlas/blob/master/notebooks/regional_delta_changes.ipynb How extensible and collaborative is the framework?

The notebook examples themselves would benefit from more markup documentation, especially in the sections where the data is ingested, to let users know how it can be expanded. Also, what does one need to do in order to reproduce the notebook figures? This question should be addressed in simple words to promote reusability.

  1. For future work, it would be great to see more generalized data cataloguing utilities that can readily help with analysis. Some Python-based utilities exist today. To help users work with big data, example notebooks using xarray, dask, etc., exploring on- and off-cloud capabilities, would be cool. Overall, this is a great initiative.

Notebooks have been thoroughly reviewed to expand the explanations and to point out alternative options the user could take. There is usually a Parameter settings section for this purpose. All figures in the notebooks can be reproduced by executing the available code. Data sources are either locally available in the repository, directly accessible through the network, or both. We focused on the R language, which is widely used in climate science, with only a few examples in Python. We agree that this alternative language would enhance reusability and interoperability, but we leave the extension of the currently available notebooks to other use cases and other programming languages to the collaboration of the community, using the standard tools provided by Git and GitHub.

jesusff commented 3 years ago
  1. The csvs in https://github.com/IPCC-WG1/Atlas/tree/master/ATLAS-inventory are great ways to establish some level of provenance. The version identifier is extremely important, nicely captured for CMIP6. For CMIP5 though, it’s not specified in the .csv. I believe version strings did exist for CMIP5 as well. The CMIP6 siconc inventory from ESGF also seems to be missing version identifiers https://github.com/IPCC-WG1/Atlas/tree/master/ESGF-inventory/CMIP6.

CMIP5 version strings have been added (8425b05). The ESGF inventory was an internal intermediate product, updated periodically, which has been removed for the final, frozen version of this repository.
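
For anyone who wants to check an inventory file for missing version strings themselves, a minimal sketch follows. The column name `version`, the sample rows and the `vYYYYMMDD` pattern are assumptions made for illustration; the real inventory CSVs may use a different layout.

```python
import csv
import io
import re

# Assumed format: "v" followed by an 8-digit date, e.g. v20190308.
VERSION_PATTERN = re.compile(r"^v\d{8}$")


def rows_missing_version(csv_text, version_column="version"):
    """Return (line_number, row) pairs whose version field is empty or malformed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if version_column not in (reader.fieldnames or []):
        raise KeyError(f"column {version_column!r} not found in header")
    problems = []
    # Line numbers assume one record per line; the header is line 1.
    for line_number, row in enumerate(reader, start=2):
        value = (row[version_column] or "").strip()
        if not VERSION_PATTERN.match(value):
            problems.append((line_number, row))
    return problems


# Hypothetical inventory content, not taken from the repository.
sample = """model,run,variable,version
MODEL-A,r1i1p1,tas,v20120101
MODEL-B,r1i1p1,tas,
MODEL-C,r1i1p1,pr,unknown
"""

for line_number, row in rows_missing_version(sample):
    print(f"line {line_number}: {row['model']} {row['variable']} has no valid version")
```

To run it on a real file, read the file's text and pass it in, adjusting `version_column` and the pattern to whatever the inventory actually uses.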

jesusff commented 3 years ago
  1. Metaclip sounds like a great utility. But I am unable to understand how it's connected to Atlas to see the provenance information in the scripts or figures. Some additional info regarding this would be great for users to leverage Metaclip.

A substantial effort has been made to deliver all the Interactive Atlas (IA) products with a well-documented provenance description. METACLIP emphasizes the delivery of ‘final products’ (understood as any piece of information that is stored in a file, such as a plot or a map) with a full semantic description of their origin and meaning. METACLIP is an Atlas-independent development that has been adopted for the IA products, with some adaptations of the tool to the specific Atlas requirements (a dedicated vocabulary and other internal developments). As a result, the associated METACLIP developments are not currently included in the IPCC WGI Atlas GitHub repository (although they are publicly available at https://github.com/metaclip). METACLIP representations are only associated with certain IA final products (e.g. delta maps), and not with scripts. The METACLIP provenance representation can be accessed directly through the METACLIP button on the Interactive Atlas main screen.

  1. The maps downloaded from the atlas portal do not carry enough metadata. These maps could potentially be used in presentations and papers, and it would be great to see the provenance info on them, unless I am missing it. Once again, the data DOIs or acknowledgements are essential to make sure users continue to track provenance.

PNG maps downloaded from the Interactive Atlas carry all provenance information using the METACLIP representation, including DOIs. It can be accessed e.g. by dropping the image in http://www.metaclip.org
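
This thread does not say how the METACLIP description is stored inside the PNG. Assuming it sits in standard PNG text chunks (tEXt, zTXt or iTXt), which is an assumption and not something confirmed here, a rough standard-library sketch for listing whatever text metadata a PNG carries could look like this. The helper `make_chunk` and the sample keyword are hypothetical and only build a tiny in-memory file for the demo.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_text_chunks(data):
    """Return {keyword: text} for the tEXt, zTXt and iTXt chunks of PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts = {}
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
        body = data[offset + 8:offset + 8 + length]
        offset += 12 + length  # 4 length + 4 type + body + 4 CRC
        if chunk_type == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            texts[keyword.decode("latin-1")] = text.decode("latin-1")
        elif chunk_type == b"zTXt":
            keyword, _, rest = body.partition(b"\x00")
            # rest[0] is the compression method byte; zlib data follows.
            texts[keyword.decode("latin-1")] = zlib.decompress(rest[1:]).decode("latin-1")
        elif chunk_type == b"iTXt":
            keyword, _, rest = body.partition(b"\x00")
            compressed = rest[0] == 1
            _language, _, rest = rest[2:].partition(b"\x00")
            _translated, _, text = rest.partition(b"\x00")
            if compressed:
                text = zlib.decompress(text)
            texts[keyword.decode("latin-1")] = text.decode("utf-8")
        elif chunk_type == b"IEND":
            break
    return texts


def make_chunk(chunk_type, body):
    """Hypothetical helper: serialize one PNG chunk with its CRC."""
    crc = zlib.crc32(chunk_type + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + chunk_type + body + struct.pack(">I", crc)


# Chunk container only (no pixel data), enough to exercise the parser.
demo = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    + make_chunk(b"tEXt", b"Provenance\x00placeholder description")
    + make_chunk(b"IEND", b"")
)
print(png_text_chunks(demo))
```

If a keyword occurs more than once, this sketch keeps only the last value. For a downloaded map, the file's bytes would be read in binary mode and passed to `png_text_chunks`; if nothing shows up, the provenance is stored some other way and the METACLIP site remains the route described above.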

jesusff commented 3 years ago
  1. Would be nice to use CMIP6 data DOIs in ATLAS emphasizing FAIR principles and giving credit to the data producers.

For the datasets used, it’s nice to see the observation datasets listed. It might help to specify versions for all of the datasets, provide a data DOI or at least a pointer to the data in the public domain.

Yes. Since the review, observations have been added, along with versions for all datasets and pointers to them (handles and ESGF search URLs), which are available for each variable and dataset.

jesusff commented 3 years ago
Design and Usability:
  1. After making selections on the atlas, the user could be told how the maps get updated, which in this case happens simply by hovering outside, with no submit action expected. This is cool, but one can feel a bit lost until the updated map appears.

It would be nice to have tooltips when hovering over different locations on the atlas, listing the available products such as time series, etc. These types of actions could also be documented in the README or user documentation links.

  1. Possible typos in Help & Instructions and missing links. In particular, the SOD incldues Climatic Impact Drivers as defined in Chapter 12 SOD (in is missing) Climatic Impact Drivers - map legend, confidence. In direction of change

All suggestions have been passed to the Interactive Atlas development team and have been implemented in the test version, which will be released soon.

jesusff commented 3 years ago
  1. In terms of findability and accessibility, one could consider inventorying the scripts used. Sorry if I overlooked this, the README is a great place to start.

To summarize all scripts in a given folder, they have been listed in the README, regardless of their depth in the directory tree (see e.g. https://github.com/IPCC-WG1/Atlas/tree/08fa3c3/datasets-interactive-atlas). For notebooks, a table summarizes their availability in different languages (https://github.com/IPCC-WG1/Atlas/tree/08fa3c3/datasets-interactive-atlas).
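
A listing like this could also be generated rather than maintained by hand. The following is only a sketch of one way to do it, not how the repository's READMEs were actually produced; the set of extensions, the bullet format and the demo file names are assumptions.

```python
import os
import tempfile

# Assumed set of file types that count as "scripts".
SCRIPT_EXTENSIONS = {".R", ".py", ".sh", ".ipynb"}


def list_scripts(root):
    """Return sorted relative paths of script files under root, at any depth."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories such as .git in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if os.path.splitext(name)[1] in SCRIPT_EXTENSIONS:
                relative = os.path.relpath(os.path.join(dirpath, name), root)
                found.append(relative.replace(os.sep, "/"))
    return sorted(found)


def as_readme_list(paths):
    """Format paths as a bullet list of relative links for a README."""
    return "\n".join(f"- [{path}]({path})" for path in paths)


# Demo on a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "bash-scripts", "nested"))
    for relative in ["run.sh", "bash-scripts/nested/regrid.sh", "notes.txt"]:
        with open(os.path.join(root, *relative.split("/")), "w") as handle:
            handle.write("")
    print(as_readme_list(list_scripts(root)))
```

Walking the whole tree is what makes the listing independent of directory depth; the non-script file in the demo is skipped.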

jesusff commented 3 years ago
  1. Consider renaming master branch to main branch. https://github.com/github/renaming

Thanks for the suggestion. The master branch has been deleted and, when final, a "main" branch will point to the final, frozen repository state and will be made the default branch.

jesusff commented 3 years ago
  1. Bad data retraction elements and pointers to errata pages (if any) for the datasets used should also be documented, should such a situation arise. I think this is a general problem that has no direct solution. Your csv listing is a great start. Some basic documentation or prompts for users to check data versions before running the notebooks, say a few years from now on data that later has a newer version, may be valuable. DOIs can also be very helpful here to trace back the steps.

We added a pointer to the ES-DOC Errata Service website, which collects known issues for all projects served through the ESGF. DOIs were also added to the data-sources folder.
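
As for prompting users to check data versions before running a notebook, one possible shape for such a check is sketched below. It assumes versions are written as `v` plus digits (e.g. `v20190308`) and that the user has obtained the currently published versions by some other means, for instance from an ESGF search; the dataset identifiers and both dictionaries are hypothetical.

```python
def parse_version(version):
    """Turn a string such as 'v20190308' into the integer 20190308."""
    if not (version.startswith("v") and version[1:].isdigit()):
        raise ValueError(f"unexpected version string: {version!r}")
    return int(version[1:])


def outdated_datasets(recorded, latest):
    """Return {dataset_id: (recorded, latest)} where a newer version is known."""
    stale = {}
    for dataset_id, used in recorded.items():
        newest = latest.get(dataset_id)
        if newest is not None and parse_version(newest) > parse_version(used):
            stale[dataset_id] = (used, newest)
    return stale


# Hypothetical example: versions recorded in an inventory vs. versions found today.
recorded = {"MODEL-A.tas": "v20190308", "MODEL-B.pr": "v20200101"}
latest = {"MODEL-A.tas": "v20210415", "MODEL-B.pr": "v20200101"}

for dataset_id, (used, newest) in outdated_datasets(recorded, latest).items():
    print(f"{dataset_id}: inventory has {used}, newer version {newest} exists; check the errata")
```

Datasets with no known newer version are left alone, so the check only produces a prompt when there is something to look up.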

jesusff commented 3 years ago
  1. In terms of regridding, the regridding algorithms and source code could be referenced. Basically, at whatever granularity is possible, the tools that encompass Atlas should be archived, along with a DOI, especially since DOIs can support different versions should that be needed.

The README of the regridding folder now includes pointers to the exact CDO version used in this process, along with the DOI of the documentation, to help track any problem that could arise in the future regarding the implementation of the algorithm. All tools and data in this repository will be archived at https://doi.org/10.5281/zenodo.3691645

jesusff commented 3 years ago
  1. I think the purpose of Atlas and the maps shown in Atlas could be documented more specifically. For example, if "key figures" are shown in Atlas, how are the key figures determined? Are they from the chapter figures? If yes, how does this effort compare to the ESMValTool effort?

The reproducibility/ folder (which will be available after the SPM approval) will include scripts to reproduce parts of the Atlas chapter figures, in particular the boxplot and scatterplot figures of regional climate change projections of temperature and precipitation for the AR6 WGI reference regions. The figures will be labeled in the README file to trace them back to the exact Atlas figures they reproduce.