Before we go into creating a nice-looking website using mkdocs, I would like to start by mapping out the individual guidance & documentation elements more closely by reworking the README. I suggest that the README contain some guidance (with links to further info where it gets too deep), clearly split between different user groups, e.g. non-climate scientists and more expert researchers, because they will have different goals when interacting with the project. Here is a draft; maybe we can discuss it together later today.
The part about downloading data from Azure is only internal info, right? Or will this be outward-facing info? https://github.com/alan-turing-institute/clim-recal#accessing-the-pre-downloadedpre-processed-data
Yes, this is internal info and could be moved to another README file or somewhere else, possibly with a link to it from the main README.
The overview you've drafted looks great, @aranas!
Two additional example resources I'd add as README reference points for us are BIG-bench and EleutherAI's lm-evaluation-harness. These are two examples of standardised resources for evaluating and comparing LLMs on a range of tasks. I think BIG-bench is better documented at the moment and probably the better source of inspiration for us, though the eval-harness is also going through a major refactor.
Beyond the README, another useful reference point is how they document tasks: a summary table for BIG-bench and a task table for the eval-harness. I recommend we add one as a priority so we (and any users!) have a standard map/naming we can use to refer to our different bias-correction (BC) methods.
Another example of a benchmark-style repo, closer to home: https://github.com/duncanwp/ClimateBench
@gmingas prioritisation feedback: step-by-step guidance on how to use the pipeline via the CLI or a notebook
@RuthBowyer feedback: we're still trying to figure out who the users are
@aranas "narrative around the pipeline"
@RuthBowyer, according to the analysis flowchart, the Cropping_Rasters_to_three_cities.R script currently takes the resampled files and extracts data for three cities before passing them on to further preprocessing (splitting into test & validate sets and eventually applying bias correction). For me to include this in the pipeline walk-through, could you provide the relevant R specs, e.g. the R version and environment files specifying packages?
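In the meantime, here is a minimal sketch of how the environment info could be captured; the use of renv and the output file name are my assumptions, not settled project choices:

```sh
# Minimal sketch, assuming renv is acceptable; the output file name is a placeholder.

# Record the R version and attached packages for the docs:
Rscript -e 'sessionInfo()' > r-session-info.txt

# Or pin package versions in a lockfile with renv:
Rscript -e 'renv::init()'
Rscript -e 'renv::snapshot()'
```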
@RuthBowyer, I am wondering: should we host the shapefiles in the GitHub repo to make this more accessible? They don't seem very big.
Otherwise, I will need the sources for these shapefiles: NUTS_Level_1_January_2018_FCB_in_the_United_Kingdom_2022_7279368953270783580 (used for defining regions and cropping) and London (also used for chopping up the LCAT data).
For the analysis walk-through, I will provide shell commands to execute the full pipeline end to end for one example. I think it would suffice to illustrate this with one metric, one city, and one run, rather than including the loops, as in the sketch below. Would you agree, or should the walk-through include the loops?
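Something like this is what I have in mind; every script name, flag, and path below is a placeholder rather than an actual clim-recal command, except Cropping_Rasters_to_three_cities.R, which is the real script from the flowchart:

```sh
# Hypothetical single-combination walk-through; all commands are placeholders
# except Cropping_Rasters_to_three_cities.R.
METRIC=tasmax   # one metric
CITY=Glasgow    # one city
RUN=05          # one run

python resample.py --metric "$METRIC" --run "$RUN" --output data/resampled/
Rscript Cropping_Rasters_to_three_cities.R          # crop resampled files to the three cities
python preprocess.py --city "$CITY"                 # split into test & validate sets
python apply_bias_correction.py --metric "$METRIC" --city "$CITY" --run "$RUN"
```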
Totally agree, just one combination is enough. And the script with the loops will be available in the codebase too.
> @RuthBowyer, I am wondering: should we host the shapefiles in the GitHub repo to make this more accessible? They don't seem very big.
> Otherwise, I will need the sources for these shapefiles: NUTS_Level_1_January_2018_FCB_in_the_United_Kingdom_2022_7279368953270783580 (used for defining regions and cropping) and London (also used for chopping up the LCAT data).
I would support including them in the repo if there are no licensing issues.
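If we do include them, something like the sketch below would work; the data/shapefiles/ directory layout is assumed, and Git LFS is just one option if the files turn out larger than expected:

```sh
# Illustrative only: the data/shapefiles/ path is an assumption.
git add data/shapefiles/           # small files can live directly in the repo

# If they grow, Git LFS is one option:
git lfs install
git lfs track "data/shapefiles/*"
git add .gitattributes
```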
Yep, sounds good - all downloaded from OA (open-access) sources, but we might need to double-check the licenses on the sites.
I've added ticket #42 for configuring how the documentation is rendered and maintained.
I've added some screenshots from using Quarto in #42. It would be great to get a sense of whether people like that option (and sorry, I think I commented on #56 thinking it was this one, my bad).
> Yep, sounds good - all downloaded from OA (open-access) sources, but we might need to double-check the licenses on the sites.
I will create a new issue for this @RuthBowyer
I think I have now written all the sections that I wanted to complete, so I will open PR #62 for review. Please feel free to comment / fix / close while I am on A/L this week.
Some open questions / comments from my side:
- As discussed today: @RuthBowyer, when you are back, could you please have a look at this and add the documentation parts relevant to the R pipeline, e.g. the R packages list? Recommendation is to do this in a PR separate from #62.
- @aranas to talk to @griff-rees to decide on the easiest approach for merging the quarto and guidance branches (the quarto branch has already merged the main branch, which was challenging).
@gmingas the merge was managed in https://github.com/alan-turing-institute/clim-recal/pull/72
Plan:
- extend_documentation