Closed ambrosejcarr closed 4 years ago
Must be easily extensible to new organisms
👍
It should not be limited to gencode
👍 but will do later, since HCA doesn't need this.
Needs to include instructions on how to add new references to the build process
👍
I propose that it checks whether the md5 of the latest build matches that of the previous one, and skips the upload if so, to avoid file duplication
Good idea, but would prioritize lower than other suggestions since this is a cost optimization.
There should be a way to easily parametrise reference creation (e.g. STAR splice site flanking sequence length)
Is this satisfied by passing a very large number of default parameters to reference creation?
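For what it's worth, the md5 check proposed above could look roughly like the sketch below. It is only an illustration: `md5_of_file` and `needs_upload` are hypothetical helper names, not anything in skylab, and it assumes the md5 of the previously uploaded build is already available as a hex string (for example from object metadata in the bucket).

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_upload(new_build_path, previous_md5):
    """Return True when the new build differs from the previous upload.

    previous_md5 is assumed to be a hex digest string, or None when no
    earlier build exists (hypothetical convention for this sketch).
    """
    if previous_md5 is None:
        return True
    return md5_of_file(new_build_path) != previous_md5
```

The build step would call `needs_upload` before copying the reference and skip the copy when it returns `False`. As noted, this is a cost optimization only, so it can be added after the other requirements.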
I worry about the last bullet point on the static-inputs.json. It seems like it'd be really easy for the workflow that generates that to get out of date if we were to add an input or parameter to the workflow and not remember to update this thing that generates inputs for us. Or is a static-inputs json different from a regular inputs json?
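One way to guard against that kind of drift would be a small test that compares the keys of the generated static-inputs json against the inputs the workflow actually declares. The sketch below assumes the declared inputs are available as a JSON template, for example the output of `womtool inputs` on the WDL. The function name `input_drift` and the example input names are made up for illustration.

```python
def input_drift(declared_inputs, static_inputs):
    """Compare a workflow's declared inputs with a static inputs mapping.

    Both arguments are dicts keyed by fully qualified input name
    (e.g. the parsed contents of two JSON files). Returns a tuple
    (missing, stale): inputs the workflow declares but the static file
    lacks, and inputs the static file sets that no longer exist.
    """
    declared = set(declared_inputs)
    provided = set(static_inputs)
    return sorted(declared - provided), sorted(provided - declared)


# Hypothetical example: a new input was added to the workflow and an
# old one was renamed, but the static inputs file was not regenerated.
declared = {
    "BuildReference.fasta": "File",
    "BuildReference.gtf": "File",
    "BuildReference.organism": "String",
}
static = {
    "BuildReference.fasta": "gs://bucket/genome.fa",
    "BuildReference.annotation": "gs://bucket/genes.gtf",
}

missing, stale = input_drift(declared, static)
```

Here `missing` lists the inputs that need values and `stale` lists leftover keys. Optional workflow inputs would show up under `missing` too, so a real check would need to filter those out first.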
As I look through the existing reference data it's hard to intuit which references to use in what tasks, and what WDL to use to create references for new organisms. The use case I'm hoping to satisfy is that a user writing a new pipeline could easily understand which references are already generated in a satisfactory way for their pipeline, and which references need new generation tasks.
I believe we did this... Closing issue.
Reference construction
Currently the skylab repository creates a number of references that are used by Smart-seq2, Cell Ranger, and Optimus. These include:
In addition, there are reference creation options for different genome subsets:
In accessory workflows, there are additionally:
Design proposal: A single workflow that takes as parameters:
Additional requirements:
static-inputs.json file for each workflow run by skylab.