ctg-lund / Yggdrasil

the backbone of the CTG pipelines

BCL delivery process #14

Closed lokeshbio closed 4 months ago

chaetognatha commented 1 year ago

The samplesheet for rawdata deliveries will have the same name as every other samplesheet, but it will contain only the project ID: a single line with no commas or anything else. (There should also be flags to specify both the project ID and a rawdata delivery, in which case we shouldn't need a samplesheet at all!)
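A minimal sketch of how such a samplesheet could be recognised, assuming "one line" means exactly one non-empty line with no commas. The function names `is_rawdata_samplesheet` and `rawdata_project_id` are hypothetical and not part of Yggdrasil:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Yggdrasil).
# Assumption: a rawdata samplesheet has exactly one non-empty line,
# without commas, and that line is the project ID.

is_rawdata_samplesheet() {
    local sheet=$1
    local lines
    # Drop carriage returns and blank lines before counting.
    lines=$(tr -d '\r' < "$sheet" | grep -v '^[[:space:]]*$')
    # True only for a single, non-empty, comma-free line.
    [ -n "$lines" ] \
        && [ "$(printf '%s\n' "$lines" | wc -l)" -eq 1 ] \
        && [[ $lines != *,* ]]
}

rawdata_project_id() {
    # Print the first non-empty line with surrounding whitespace removed.
    tr -d '\r' < "$1" \
        | grep -v '^[[:space:]]*$' \
        | head -n 1 \
        | sed 's/^[[:space:]]*//; s/[[:space:]]*$//'
}
```

A regular samplesheet with a `[Yggdrasil_Projects]` section fails the check because it has several lines and contains commas, so both kinds of sheet can share the same file name.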

chaetognatha commented 1 year ago

Perhaps it will be useful to have these lines that I use to run md5sum over multiple processes on LSENS:

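module load parallel/20220722
raw=$1
# Let parallel collect the output so concurrent jobs cannot interleave lines in md5.txt
find "${raw}/Data" -type f -print0 | parallel -0 -j8 md5sum {} > "${raw}/md5.txt"
# Forgot that I also need to do this: replace the absolute upload prefix with "."
sed -i 's/\/projects\/.*\/upload\/\w\+/./g' "${raw}/md5.txt"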
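Because the `sed` step leaves paths relative to the delivery folder, the recipient should be able to check the transfer from inside that folder. A small sketch, assuming GNU `md5sum` is available and that `md5.txt` sits at the top of the delivered directory; `verify_delivery` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical recipient-side check (not part of Yggdrasil).
# Assumption: md5.txt lives in the delivery root and lists paths like ./Data/...

verify_delivery() {
    local delivery_dir=$1
    # Use a subshell so the caller's working directory is left unchanged.
    # --quiet prints only files that fail; the exit status is non-zero on any mismatch.
    ( cd "$delivery_dir" && md5sum --quiet -c md5.txt )
}
```

Calling `verify_delivery /path/to/delivery` returns success only if every listed file matches its checksum.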

lokeshbio commented 1 year ago

We should probably include the kind of delivery each project requires in the samplesheet! At the moment, the example samplesheet looks like this:

[Yggdrasil_Projects],,,
Project_ID,bcl,fastq,fastq_screen,fastq_screen_ref
2022_000,0,1,0,NA

If we keep adding more pipelines, it is not sustainable to keep adding columns! I would rather have these end deliveries set as keys like BCL, FASTQ, FASTQ_SCREEN, RNASEQ, METHYLSEQ and so on, all in a single column. Then we can have them as boolean parameters in Yggdrasil!

# In the samplesheet

[Yggdrasil_Projects],,,
Project_ID,Delivery
2022_000,FASTQ
2022_001,RNASEQ

# Then, in the Yggdrasil Nextflow script, we can set these parameters as

params.rnaseq = true // specifically for 2022_001
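To make the single-column idea concrete, here is a rough sketch of how a launcher script could look up the key for a project and turn it into a parameter flag. It assumes the two-column `Project_ID,Delivery` layout proposed above; `delivery_for_project` and `delivery_to_flag` are hypothetical names, and none of this is existing Yggdrasil code:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Yggdrasil).
# Assumption: the [Yggdrasil_Projects] section is followed by a header row
# and then one "Project_ID,Delivery" row per project.

delivery_for_project() {
    local sheet=$1 project=$2
    awk -F',' -v project="$project" '
        { sub(/\r$/, "") }   # tolerate Windows line endings
        # A line starting with "[" opens a new section.
        /^\[/ { in_section = ($1 == "[Yggdrasil_Projects]"); header_seen = 0; next }
        # The first row inside the section is the column header; skip it.
        in_section && !header_seen { header_seen = 1; next }
        in_section && $1 == project { print $2; exit }
    ' "$sheet"
}

delivery_to_flag() {
    # Lower-case the key and print it as a command-line parameter, e.g. RNASEQ -> --rnaseq true
    printf -- '--%s true\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}
```

With the example sheet, `delivery_to_flag "$(delivery_for_project samplesheet.csv 2022_001)"` prints `--rnaseq true`, which could be passed on the Nextflow command line to set `params.rnaseq`. Adding a new pipeline then only needs a new key rather than a new column.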

lokeshbio commented 1 year ago

BCL delivery process needs to be discussed

lokeshbio commented 1 year ago

Hi @chaetognatha, the samplesheet above is an example for a run that could contain different deliveries in the same run: one project with RNASEQ and the other with FASTQ. If we have a test run and a test samplesheet like this, then I can test running Yggdrasil all the way from raw data to the RNA-seq output!