Closed paolap closed 1 year ago
Just noticed this was closed with this pull request, but I meant the structure of the "Create a dataset" guidelines rather than of the overall book :-) Sorry for the confusion. I'm opening this again, as we haven't got an outline for that section yet.
Once the outline is created, @AviRamchurn to review/contribute from the BoM perspective.
(Claire's example, please add your own)
Steps to create data:
This is my go at this:
1) Planning: a DMP including basic info such as backup, input files, tools used, licenses of what is used and potential output. Ideally this should be part of project planning, but it might still be worth mentioning here.
2) Structuring the file
2 a) Use case 1: completely new file
2 b) Use case 2: modified existing file
3) Directory structure: depending on how many files you are going to produce, you also want to have some directory structure in place before the number of files becomes hard to track. It is always best to separate different experiments/analyses. Also, including provenance details in the file name itself reduces the risk of confusing different outputs.
4) Backup: set up a backup strategy (could be part of the DMP planning phase). Keep code under version control.
5) Documentation/provenance
I find it hard to separate what is, strictly speaking, creating new files from what is managing them, i.e. directories, backup, planning etc. We should probably mention both aspects, while making sure we don't repeat too much of what might be in other sections.
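To make step 3 a bit more concrete, here is a rough sketch of what "one directory per experiment, provenance in the file name" could look like. Everything in it is an assumption for illustration only: the function name `output_path`, the naming pattern, the `.nc` extension and the example values (`exp01`, `tas`, `era5`, `v1`) are hypothetical, not an agreed convention.

```python
from datetime import date
from pathlib import Path


def output_path(root, experiment, variable, source, version, created=None):
    """Build <root>/<experiment>/<variable>_<source>_<version>_<YYYYMMDD>.nc.

    Hypothetical naming pattern: one directory per experiment, with the
    input source, a version label and the creation date in the file name.
    """
    created = created or date.today()
    filename = f"{variable}_{source}_{version}_{created:%Y%m%d}.nc"
    return Path(root) / experiment / filename


# Example with made-up values; nothing is written to disk here.
path = output_path("data", "exp01", "tas", "era5", "v1", date(2021, 3, 4))
print(path.as_posix())  # data/exp01/tas_era5_v1_20210304.nc
```

The point is only that the path is built in one place, so every output of an experiment ends up in the same directory with the same provenance fields in its name. The actual fields would need to be decided in the guidelines.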
I'm closing this as this is done
We should work out an outline for these guidelines. We could start by listing the main steps of creating a dataset.