Before we go into creating a nice-looking website using mkdocs, I would like to start by mapping out the individual guidance & documentation elements more closely by reworking the README. I suggest that the README contain some guidance (with links to further info where it gets too deep), clearly split between different user groups, e.g. non-climate scientists and more expert researchers, because they will have different goals when interacting with the project. Here is a draft; maybe we can discuss it together later today.
The part about downloading data from Azure is only internal info, right? Or will this be outward-facing info? https://github.com/alan-turing-institute/clim-recal#accessing-the-pre-downloadedpre-processed-data
Yes, this is internal info and could be moved to another README file or somewhere else, possibly with a link to it from the main README.
The overview you've drafted looks great, @aranas!
Two additional example resources I'd add as README reference points for us are BIG-bench and EleutherAI's lm-evaluation-harness. These are two examples of standardised resources for evaluating and comparing LLMs on a range of tasks. I think BIG-bench is better documented at the moment and probably the better source of inspiration for us, though the eval-harness is also going through a major refactor.
Beyond the README, another useful reference point is how they document tasks: a summary table for BIG-bench and a task table for the eval-harness. I recommend we add one as a priority so we (and any users!) have a standard map/naming we can use to refer to our different bias-correction (BC) methods.
Another example of a benchmark-style repo, closer to home: https://github.com/duncanwp/ClimateBench
@gmingas prioritisation feedback: step-by-step guidance on how to use the pipeline via the CLI or a notebook
@RuthBowyer feedback: we're still trying to figure out who the users are
@aranas "narrative around the pipeline"
@RuthBowyer, according to the analysis flowchart, the Cropping_Rasters_to_three_cities.R script currently takes the resampled files and extracts data for three cities before passing them on to further preprocessing (splitting into test & validate sets and eventually applying bias correction). For me to include this in the pipeline walk-through, could you provide the relevant R specs, e.g. the R version and environment files specifying packages?
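In the meantime, here is a minimal sketch of how the environment info could be captured; the use of renv and the output file name are my assumptions, not settled project choices:

```sh
# Minimal sketch, assuming renv is acceptable; the output file name is a placeholder.

# Record the R version and attached packages for the docs:
Rscript -e 'sessionInfo()' > r-session-info.txt

# Or pin package versions in a lockfile with renv:
Rscript -e 'renv::init()'
Rscript -e 'renv::snapshot()'
```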
@RuthBowyer, I am wondering: should we host the shapefiles in the GitHub repo to make this more accessible? They don't seem very big.
Otherwise, I will need the sources for these shapefiles: NUTS_Level_1_January_2018_FCB_in_the_United_Kingdom_2022_7279368953270783580 (used for defining regions and cropping) and London (also used for chopping up the LCAT data).
For the analysis walk-through, I will provide shell commands to execute the full pipeline end to end for one example. I think it would suffice to illustrate this with one metric, one city, and one run, rather than including the loops, as in the sketch below. Would you agree, or should the walk-through include the loops?
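Something like this is what I have in mind; every script name, flag, and path below is a placeholder rather than an actual clim-recal command, except Cropping_Rasters_to_three_cities.R, which is the real script from the flowchart:

```sh
# Hypothetical single-combination walk-through; all commands are placeholders
# except Cropping_Rasters_to_three_cities.R.
METRIC=tasmax   # one metric
CITY=Glasgow    # one city
RUN=05          # one run

python resample.py --metric "$METRIC" --run "$RUN" --output data/resampled/
Rscript Cropping_Rasters_to_three_cities.R          # crop resampled files to the three cities
python preprocess.py --city "$CITY"                 # split into test & validate sets
python apply_bias_correction.py --metric "$METRIC" --city "$CITY" --run "$RUN"
```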
Totally agree, just one combination is enough. And the script with the loops will be available in the codebase too.
> @RuthBowyer, I am wondering: should we host the shapefiles in the GitHub repo to make this more accessible? They don't seem very big.
> Otherwise, I will need the sources for these shapefiles: NUTS_Level_1_January_2018_FCB_in_the_United_Kingdom_2022_7279368953270783580 (used for defining regions and cropping) and London (also used for chopping up the LCAT data).
I would support including them in the repo if there are no licensing issues.
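If we do include them, something like the sketch below would work; the data/shapefiles/ directory layout is assumed, and Git LFS is just one option if the files turn out larger than expected:

```sh
# Illustrative only: the data/shapefiles/ path is an assumption.
git add data/shapefiles/           # small files can live directly in the repo

# If they grow, Git LFS is one option:
git lfs install
git lfs track "data/shapefiles/*"
git add .gitattributes
```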
Yep, sounds good - all downloaded from OA (open-access) sources, but we might need to double-check the licenses on the sites.
I've added ticket #42 for configuring how the documentation is rendered and maintained.
I've added some screenshots from using Quarto in #42. It would be great to get a sense of whether people like that option (and sorry, I think I commented on #56 thinking it was this one, my bad).
> Yep, sounds good - all downloaded from OA (open-access) sources, but we might need to double-check the licenses on the sites.
I will create a new issue for this @RuthBowyer
I think I have now written all the sections that I wanted to complete, so I will open PR #62 for review. Please feel free to comment / fix / close while I am on A/L this week.
Some open questions / comments from my side:
- As discussed today: @RuthBowyer, when you are back, could you please have a look at this and add the documentation parts relevant to the R pipeline, e.g. the R packages list? Recommendation is to do this in a PR separate from #62.
- @aranas to talk to @griff-rees to decide on the easiest approach for merging the quarto and guidance branches (the quarto branch has already merged the main branch, which was challenging).
@gmingas the merge was managed in https://github.com/alan-turing-institute/clim-recal/pull/72
Plan:
- extend_documentation