alan-turing-institute / clim-recal

Open repository of methods for recalibrating & bias correcting UKCP18 climate projections data
https://alan-turing-institute.github.io/clim-recal/
MIT License

Developing Guidance & Documentation for clim-recal #42

Closed dingaaling closed 1 year ago

dingaaling commented 1 year ago

Plan:

aranas commented 1 year ago

Before we go into creating a nice-looking website using mkdocs, I would like to start by mapping out the individual elements of guidance & documentation more closely by reworking the README. I suggest that the README contains some guidance (with links to further info where it gets too deep) and that this guidance is clearly split by user group, e.g. non-climate scientists and more expert researchers, because they will have different goals when interacting with the project. Here is a draft; maybe we can discuss it together later today.

Structure

Resources for good README

aranas commented 1 year ago

The part about downloading data from Azure is only for internal info, right? Or will this be outward-facing info? https://github.com/alan-turing-institute/clim-recal#accessing-the-pre-downloadedpre-processed-data

gmingas commented 1 year ago

The part about downloading data from Azure is only for internal info, right? Or will this be outward-facing info? https://github.com/alan-turing-institute/clim-recal#accessing-the-pre-downloadedpre-processed-data

Yes, this is for internal info and could be moved to another readme file or somewhere else, possibly with a link to it from the main readme.

RuthBowyer commented 1 year ago

I think this could be useful for our partners if this is how we share the data with them (which I think is still tbc?). Just in case you were unaware, see issues 37 and 38.

dingaaling commented 1 year ago

The overview you've drafted looks great, @aranas!

Two additional example resources I'd add as README reference points for us are BIG-bench and EleutherAI's lm-evaluation-harness. These are two examples of creating standardised resources for evaluating and comparing LLMs on a range of tasks. I think BIG-bench is better documented atm and probably the better source of inspo for us, but the eval-harness is also going through a major refactor atm.

Beyond the README, another useful reference point is how they document tasks: a summary table for BIG-bench and a task table for the eval-harness. I recommend we add something similar as a priority so we (and any users!) have a standard map/naming we can use to refer to our different BC methods.

aranas commented 1 year ago

Another example of a benchmark-style repo, but closer to home: https://github.com/duncanwp/ClimateBench

dingaaling commented 1 year ago

@gmingas prioritisation feedback: Guidance (e.g. step by step) of how to use the pipeline via CLI or notebook

@RuthBowyer feedback: we're still trying to figure out who the users are

@aranas "narrative around the pipeline"

aranas commented 1 year ago

@RuthBowyer atm, according to the analysis flowchart, the Cropping_Rasters_to_three_cities.R script takes the resampled files and extracts data for three cities before passing them on to further preprocessing (splitting into test & validate and eventually applying bias correction). For me to include this in the pipeline walk-through, could you provide the relevant R specs, e.g. version and environment files specifying packages?

aranas commented 1 year ago

@RuthBowyer, I am wondering whether we should host the shapefiles in the GitHub repo to make this more accessible? They don't seem very big.

Otherwise, I will need the sources for these shapefiles: NUTS_Level_1_January_2018_FCB_in_the_United_Kingdom_2022_7279368953270783580 (used for defining regions and cutting) and London (also used for chopping up LCAT data).

aranas commented 1 year ago

For the analysis walk-through I will provide shell commands to execute the full pipeline end to end for one combination. I think it would suffice to illustrate this with one metric, one city, one run, rather than including the loops. Would you agree, or should the walk-through include the loops?

gmingas commented 1 year ago

For the analysis walk-through I will provide shell commands to execute the full pipeline end to end for one combination. I think it would suffice to illustrate this with one metric, one city, one run, rather than including the loops. Would you agree, or should the walk-through include the loops?

Totally agree, just one combination is enough. And the script with the loops will be available in the codebase too.
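To make the "one combination versus the loops" distinction concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the script name run_pipeline.py, its flags, and the placeholder city, metric and run values are hypothetical and are not taken from the clim-recal codebase.

```python
from itertools import product

# Hypothetical placeholder values; the real pipeline's options may differ.
CITIES = ["London", "city_b", "city_c"]
METRICS = ["metric_a", "metric_b"]
RUNS = ["run_1", "run_2"]


def build_command(city: str, metric: str, run: str) -> list[str]:
    """Assemble the argument list for a hypothetical pipeline entry point."""
    return [
        "python", "run_pipeline.py",  # hypothetical script name
        "--city", city,
        "--metric", metric,
        "--run", run,
    ]


def walkthrough_command() -> list[str]:
    """The single combination a walk-through would illustrate."""
    return build_command(CITIES[0], METRICS[0], RUNS[0])


def all_commands() -> list[list[str]]:
    """Every combination, i.e. what a looping script would iterate over."""
    return [build_command(c, m, r) for c, m, r in product(CITIES, METRICS, RUNS)]


if __name__ == "__main__":
    print(" ".join(walkthrough_command()))
    print(len(all_commands()), "combinations in the full loop")
```

The walk-through would document only the output of walkthrough_command(), while a separate looping script would cover the full product of cities, metrics and runs.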

gmingas commented 1 year ago

@RuthBowyer, I am wondering whether we should host the shapefiles in the GitHub repo to make this more accessible? They don't seem very big.

Otherwise, I will need the sources for these shapefiles: NUTS_Level_1_January_2018_FCB_in_the_United_Kingdom_2022_7279368953270783580 (used for defining regions and cutting) and London (also used for chopping up LCAT data).

I would support including them in the repo if there are no licensing issues.

RuthBowyer commented 1 year ago

Yep sounds good - all downloaded from OA sources but might need to double check the licenses on the sites

griff-rees commented 1 year ago

I've added ticket #42 for configuring how the documentation is rendered and maintained.

griff-rees commented 1 year ago

I've added some screenshots from using quarto in #42. It would be great to get a sense of whether people like that option (and sorry, I think I commented on #56 thinking it was this one, my bad).

aranas commented 1 year ago

Yep sounds good - all downloaded from OA sources but might need to double check the licenses on the sites

I will create a new issue for this @RuthBowyer

aranas commented 1 year ago

I think I have now written all the sections that I wanted to complete, so I will open PR #62 for review. Please feel free to comment / fix / close while I am on A/L this week.

Some open questions / comments from my side:

gmingas commented 1 year ago

As discussed today: @RuthBowyer, when you are back, could you please have a look at this and add the documentation parts relevant to the R pipeline, e.g. the list of R packages? The recommendation is to do this in a PR separate from #62.

gmingas commented 1 year ago

@aranas to talk to @griff-rees to decide on the easiest approach for merging the quarto and guidance branches (the quarto branch has already merged in the main branch, which was challenging)

griff-rees commented 1 year ago

@gmingas the merge was managed in https://github.com/alan-turing-institute/clim-recal/pull/72