ATLAS Review: a consolidated review of several minor elements

aradhakrishnanGFDL commented 3 years ago

Firstly, this is such a great effort. The portal has a nice look and feel, with a plethora of information one can access without even having to download the datasets or write a script. Great work. Some nitpicking, in the best interest of Atlas, can be found below. Thanks for the opportunity.

R is great and the use of Jupyter notebooks is fantastic too. The scope could be expanded by including languages like Python. This would also fit under the “Reusability” and “Interoperability” aspects in FAIR. This also sets a future vision to containerize the scripts to promote reusability.
The csvs in https://github.com/IPCC-WG1/Atlas/tree/master/ATLAS-inventory are great ways to establish some level of provenance. The version identifier is extremely important, nicely captured for CMIP6. For CMIP5 though, it’s not specified in the .csv. I believe version strings did exist for CMIP5 as well. The CMIP6 siconc inventory from ESGF also seems to be missing version identifiers https://github.com/IPCC-WG1/Atlas/tree/master/ESGF-inventory/CMIP6.
It would be nice to use CMIP6 data DOIs in ATLAS, both to emphasize FAIR principles and to give credit to the data producers.
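The missing version identifiers mentioned above could be flagged automatically. The following is only a minimal sketch: the column names (`model`, `run`, `version`) and the sample rows are hypothetical and do not reflect the actual schema of the ATLAS-inventory files.

```python
import csv
import io

# Hypothetical inventory excerpt. The column names and rows are assumptions
# made for illustration, not the real ATLAS-inventory layout.
SAMPLE = """model,run,version
ModelA,r1i1p1f1,v20190308
ModelB,r1i1p1f1,
ModelC,r1i1p1,
"""


def rows_missing_version(csv_text, column="version"):
    """Return the rows whose version column is absent or blank."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if not (row.get(column) or "").strip()]


missing = rows_missing_version(SAMPLE)
for row in missing:
    print(f"{row['model']} {row['run']}: no version identifier")
```

A check like this could run against each inventory file before it is committed, so that entries without a version string are caught early.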

For the datasets used, it’s nice to see the observation datasets listed. It might help to specify versions for all of the datasets and to provide a data DOI, or at least a pointer to the data in the public domain.

The use of Jupyter notebooks is promising. https://github.com/IPCC-WG1/Atlas/blob/master/notebooks/regional_delta_changes.ipynb How extensible and collaborative is the framework?

The notebook examples themselves would benefit from more markup documentation, especially in the sections where the data is ingested, to let users know how the examples can be expanded. Also, what does one need to do in order to reproduce the notebook figures? This question should be addressed in simple words to promote reusability.

Metaclip sounds like a great utility. But I am unable to understand how it's connected to Atlas to see the provenance information in the scripts or figures. Some additional info regarding this would be great for users to leverage Metaclip.
Design and Usability: After making selections on the atlas, the user could be told how the maps are updated, which in this case happens just by hovering outside the selection, with no submit action expected. This is cool, but the user can be a bit lost until they see the updated map.

It would be nice to have tooltips, shown while hovering over different locations on the atlas, that list the available products such as time series and other options. These types of actions could also be documented in the README or user documentation links.

The maps downloaded from the atlas portal do not carry enough metadata. These maps could potentially be used in presentations and papers, and it would be great to see the provenance info on the maps, unless I am missing it. Once again, the data DOIs or acknowledgements are essential to make sure users continue to track provenance.
Possible typos in Help & Instructions and missing links. In particular, the SOD incldues Climatic Impact Drivers as defined in Chapter 12 SOD (in is missing) Climatic Impact Drivers - map legend, confidence. In direction of change About page: The inventory of model runs, scenarios and variables used in the SOD is available at the IPCC WG1 Atlas repository . But, the link https://github.com/IPCC-WG1/Atlas/tree/master/AtlasHub-inventory is not a valid reference. Perhaps you’re referring to https://github.com/IPCC-WG1/Atlas/tree/master/ATLAS-inventory
I think the purpose of Atlas and the maps shown in Atlas could be documented more specifically. For example, if "key figures" are shown in Atlas, how are the key figures determined? Are they from the chapter figures? If yes, how does this effort compare to the ESMValTool effort?
Consider renaming master branch to main branch. https://github.com/github/renaming
In terms of findability and accessibility, one could consider inventorying the scripts used. Sorry if I overlooked this, the README is a great place to start.
In terms of regridding, the regridding algorithms and source code could be referenced. Basically, at whatever granularity is possible, the tools that encompass Atlas should be archived, along with a DOI, especially since DOIs can support different versions should that be needed.
Bad data retraction elements and pointers to errata pages (if any) for the datasets used should also be documented, should such a situation arise. I think this is a general problem that has no direct solution. Your csv listing is a great start. Some basic documentation or prompts for users to check data versions before running the notebooks, say a few years from now on data that has since received a newer version, may be valuable. DOIs can also be very helpful here to trace back the steps.
For future work, it would be great to see more generalized data cataloguing utilities that can readily help with analysis. Some Python-based utilities exist today. To help users work with big data, example notebooks for use with xarray, dask, etc., exploring on- and off-cloud capabilities, would be cool. Overall, this is a great initiative.
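As an illustration of the version-check prompt suggested above, a notebook could compare the version pinned in the inventory against a newer one before running. This is only a sketch: it assumes version strings follow the `vYYYYMMDD` pattern seen in the CMIP6 inventory, the function names are hypothetical, and how the latest version is looked up is left open.

```python
import re
from datetime import date

# Assumption: version strings look like "v20190308" (vYYYYMMDD).
_VERSION = re.compile(r"^v(\d{4})(\d{2})(\d{2})$")


def parse_version(version):
    """Convert a 'vYYYYMMDD' string to a date; raise ValueError otherwise."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def is_outdated(pinned, latest):
    """True when the pinned version predates the latest known version."""
    return parse_version(pinned) < parse_version(latest)


def version_warning(dataset, pinned, latest):
    """Return a warning message for an outdated dataset, or None if current."""
    if is_outdated(pinned, latest):
        return (
            f"{dataset}: inventory pins {pinned}, but {latest} is available; "
            "check the errata before reusing results."
        )
    return None
```

Printing the returned message at the top of a notebook would give users the prompt without blocking execution.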