broadinstitute / epi-SHARE-seq-pipeline

Epigenomics Program pipeline to analyze SHARE-seq data.
MIT License

Chromap read format #154

Closed mei-knudson closed 6 months ago

mei-knudson commented 6 months ago

Submitting this as a draft

- Removing correcting and trimming
- Removing "no_align" pipeline mode
- Adding get_read_format task prior to Chromap

- Not sure if we need to add any additional logic
- Not sure if it would make more sense to have the read format task input be fastq_barcode[0] instead of read2[0]
- Would appreciate feedback on task name, variable names, structure, or anything else

mei-knudson commented 6 months ago

> Why doesn't STARsolo need to have its parameters updated?

You mean for specifying where the barcodes are? For STARsolo, I used negative indexing to specify the barcode positions; since it indexes from the end of the read, the R2 read length doesn't matter. But it makes sense that we might want to pass in our own read_format, so I'll try to make the STARsolo task accommodate that.
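To illustrate why indexing from the end makes the read length irrelevant, here is a minimal Python sketch. It is not the pipeline's code: the function name, the 8-base barcode, and the offsets are all hypothetical and chosen only for the example.

```python
def barcode_from_end(read: str, start_from_end: int, length: int) -> str:
    """Return `length` bases starting `start_from_end` bases before the read's end."""
    if not 0 < length <= start_from_end <= len(read):
        raise ValueError("barcode window does not fit inside the read")
    begin = len(read) - start_from_end
    return read[begin:begin + length]


# Hypothetical layout: an 8-base barcode followed by 2 trailing bases.
barcode = "ACGTACGT"
short_read = "TTTT" + barcode + "GG"          # 14 bases
long_read = "TTTTTTTTTTTT" + barcode + "GG"   # 22 bases

# The same end-anchored offsets recover the barcode from both reads.
short_hit = barcode_from_end(short_read, start_from_end=10, length=8)
long_hit = barcode_from_end(long_read, start_from_end=10, length=8)
```

Offsets counted from the start of the read would need to change with every read length, which is the case the read format task handles for Chromap.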

mei-knudson commented 6 months ago

Made suggested changes (tee, get_chromap_read_format, fastq = fastq_barcode[0]).

I changed the logic in subwf_atac so that task_chromap_read_format is called if chemistry == 'shareseq' AND read_format is not defined; that way we can still pass in a custom read_format for SHARE. I set the default value of read_format to "bc:0:-1,r1:0:-1,r2:0:-1" in the chromap task (this would be used for 10X).
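The selection logic described above can be sketched in Python as follows. This is only an illustration of the decision order, not the WDL itself: `resolve_read_format` and `derive_from_fastq` are hypothetical names, and in the pipeline the default lives in the chromap task rather than in a shared helper.

```python
from typing import Callable, Optional

# Default quoted in the comment above; it would apply to 10X runs.
DEFAULT_READ_FORMAT = "bc:0:-1,r1:0:-1,r2:0:-1"


def resolve_read_format(
    chemistry: str,
    read_format: Optional[str],
    derive_from_fastq: Callable[[], str],
) -> str:
    """Pick the read format string that would be passed to Chromap.

    `derive_from_fastq` is a hypothetical stand-in for the read format task;
    it is invoked only for SHARE-seq runs without a user-supplied format.
    """
    if read_format is not None:
        return read_format          # a custom format always wins
    if chemistry == "shareseq":
        return derive_from_fastq()  # computed from the barcode FASTQ
    return DEFAULT_READ_FORMAT      # e.g. 10X


# Usage with a placeholder in place of the real task output.
custom = resolve_read_format("shareseq", "my:custom:format", lambda: "derived")
derived = resolve_read_format("shareseq", None, lambda: "derived")
fallback = resolve_read_format("10x", None, lambda: "derived")
```

Passing the derivation step in as a callable keeps the sketch independent of how the task actually inspects the FASTQ, which this thread does not spell out.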

I added a soloCBposition input for STARsolo and set the default value to be the string we use for specifying the SHARE barcodes. The default should be ok since we don't use soloCBposition when we run STARsolo on 10X.

Tested both Chromap and STARsolo; both ran successfully.