AlexsLemonade / refinebio-examples

Example workflows for refine.bio data
https://www.refine.bio

Remove subdirectory folders in each section #171

Closed cansavvy closed 4 years ago

cansavvy commented 4 years ago

From #163

I think we should move forward with this idea, and here is how I think we should do it: we need a notebook organization method that does not use folders.

Notebook naming conventions:

I have two ideas that can be used together or we just choose one over the other.

1) Number the notebooks so that those of the same module are kept together. Because we are making our notebooks self-contained, the order is not 100% relevant, but it still matters. The notebooks could be ordered so that those covering the most foundational knowledge, and/or requiring the least background knowledge, come first.

2) Each notebook has a prefix if it comes in a group of notebooks, so all notebooks in a group would start with differential_expression_, pathway_analysis_, and so on. Currently most modules contain one notebook and some contain two, except pathway analysis, which has 6. I do find this quite ugly, however, and it will make notebook names quite long.
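
To compare the two ideas, here is a minimal Python sketch of how a flat directory listing would sort under each scheme. The file names are hypothetical and made up for illustration; they are not the repository's actual notebooks.

```python
# Hypothetical notebook names, for illustration only.
numbered = [
    "03_pathway_analysis_ora.Rmd",
    "01_differential_expression.Rmd",
    "02_pathway_analysis_gsea.Rmd",
]
prefixed = [
    "pathway_analysis_ora.Rmd",
    "differential_expression.Rmd",
    "pathway_analysis_gsea.Rmd",
]

# A file browser typically lists names lexicographically, which sorted() mimics.
numbered_listing = sorted(numbered)   # order is whatever the numbers dictate
prefixed_listing = sorted(prefixed)   # order is alphabetical by module prefix

for name in numbered_listing + prefixed_listing:
    print(name)
```

With idea 1 the numbers decide the order, so related notebooks stay together only if they are numbered consecutively. With idea 2 notebooks of the same module always group together, but the modules themselves appear alphabetically. If numbers are used, zero-padding them (01, 02, ...) matters, since "10_" sorts before "2_" lexicographically.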

Data organization

This is more straightforward, and it is a perk of the new folder structure: duplicate datasets (for modules that use the same dataset) will not need to be kept. (I'm not sure how much of an issue this currently is.)

Example: 02-microarray/data/experiment_accession/

Example: 02-microarray/data/

Intro to "module"s

Any pertinent background information in the current READMEs would go in new "intro_to_pathway_analysis.Rmd" type files. (Note that all installation and usage info can be dropped because it's all covered in the "getting started per analysis" section.) Old READMEs are deleted.

Plots and results folders

This might look like a bit of a mess unless there's some kind of symlink magic we can use to keep plots/results in their own folders. We don't want to change the code in the Rmd itself, but we would like the files to be organized.

cansavvy commented 4 years ago

I wanted to see what this might look like. It's actually not as bad as I thought if we stick with file name conventions. IDK. What do we think?

[Screenshot: Screen Shot 2020-08-13 at 2 07 05 PM]

cansavvy commented 4 years ago

The main question is: how will this strategy hold up as we continue to add content?

jashapiro commented 4 years ago

Are the numbers really needed? I can see a numeric prefix for the intro to keep it at the top, but if the others are organized by prefix, maybe that could be enough? So perhaps something like dimension-reduction_01_pca.Rmd and dimension-reduction_02_umap.Rmd for multipart notebooks. (Do we even want multipart notebooks?)
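
As a rough illustration of the prefix-based scheme, the Python sketch below parses names of the form module_NN_topic.Rmd and groups them by module. The pattern is an assumption inferred from the two example names above, and `group_by_module` is a hypothetical helper, not something that exists in the repository.

```python
import re
from collections import defaultdict

# Assumed pattern: <module>_<two-digit part>_<topic>.Rmd
# (inferred from the two example names; not an agreed convention).
NAME_PATTERN = re.compile(
    r"^(?P<module>[a-z-]+)_(?P<part>\d{2})_(?P<topic>[a-z0-9-]+)\.Rmd$"
)


def group_by_module(filenames):
    """Group notebook file names by module, in file-name order."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        match = NAME_PATTERN.match(name)
        if match is None:
            continue  # skip names that do not follow the assumed scheme
        groups[match["module"]].append((int(match["part"]), match["topic"]))
    return dict(groups)


notebooks = ["dimension-reduction_02_umap.Rmd", "dimension-reduction_01_pca.Rmd"]
grouped = group_by_module(notebooks)
print(grouped)
```

Because the module name comes first and the part number is zero-padded, a plain alphabetical listing already keeps the parts of a multipart notebook together and in order, without a number at the front of the file name.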

cansavvy commented 4 years ago

> Are the numbers really needed? I can see a numeric prefix for the intro to keep it at the top, but if the others are organized by prefix, maybe that could be enough? So perhaps something like dimension-reduction_01_pca.Rmd and dimension-reduction_02_umap.Rmd for multipart notebooks. (Do we even want multipart notebooks?)

Something I failed to mention: a perk of having the numbers in front and a single-level directory is that we could use bookdown (though I am unsure what functionality that would add for us at this time; I need to look into it more). I believe bookdown wants numbers in front.

cansavvy commented 4 years ago

Copy over the experiment accession folder like refine.bio has it when you aggregate by experiment.

cansavvy commented 4 years ago

> Are the numbers really needed? I can see a numeric prefix for the intro to keep it at the top, but if the others are organized by prefix, maybe that could be enough? So perhaps something like dimension-reduction_01_pca.Rmd and dimension-reduction_02_umap.Rmd for multipart notebooks. (Do we even want multipart notebooks?)

We'll use this notebook naming strategy.