YeoLab / eCLIP

Which barcode-specific BAM files are used? #33

Open biofilos opened 2 years ago

biofilos commented 2 years ago (9/20/22)

Hello. First, thank you very much for the pipeline.

I am in the process of implementing your pipeline in WDL (aiming to run it on our Cromwell server via AWS, with infrastructure that requires WDL files). So far, I understand most steps of the pipeline. However, it is not clear to me how the different FASTQ files from the demultiplexing step are used.

As I understand it, after the demultiplexing step (running eclipdemux), I get a list of files, one pair per barcode, of the form .BC.r1.fq.gz and .BC.r2.fq.gz, where BC is each of the barcodes.
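A minimal sketch of grouping those demultiplexed outputs into R1/R2 pairs per barcode, assuming the full names follow `<prefix>.<BC>.r1.fq.gz` / `<prefix>.<BC>.r2.fq.gz` (the prefix is an assumption; the issue only shows the `.BC.r1.fq.gz` suffix). The function name and the barcode IDs `A01`, `B06` and `NIL` in the demo are hypothetical placeholders, not taken from eclipdemux itself.

```python
import re
from collections import defaultdict

# Assumed layout of demux output names: <prefix>.<BC>.r1.fq.gz and <prefix>.<BC>.r2.fq.gz
DEMUX_NAME = re.compile(r"^(?P<prefix>.+)\.(?P<barcode>[^.]+)\.(?P<mate>r[12])\.fq\.gz$")


def pair_demux_outputs(filenames):
    """Group demultiplexed FASTQ names into {barcode: (r1_path, r2_path)}."""
    mates = defaultdict(dict)
    for name in filenames:
        match = DEMUX_NAME.match(name)
        if match is None:
            continue  # not a per-barcode FASTQ (e.g. a metrics file), so skip it
        mates[match["barcode"]][match["mate"]] = name
    pairs = {}
    for barcode, files in mates.items():
        # Every barcode should have exactly one R1 and one R2 file.
        if set(files) != {"r1", "r2"}:
            raise ValueError(f"barcode {barcode!r} is missing a mate: {files}")
        pairs[barcode] = (files["r1"], files["r2"])
    return pairs


if __name__ == "__main__":
    # Hypothetical file names; A01, B06 and NIL are placeholder barcode IDs.
    demo = [
        "s1.A01.r1.fq.gz", "s1.A01.r2.fq.gz",
        "s1.B06.r1.fq.gz", "s1.B06.r2.fq.gz",
        "s1.NIL.r1.fq.gz", "s1.NIL.r2.fq.gz",
        "s1.metrics",
    ]
    for barcode, (r1, r2) in sorted(pair_demux_outputs(demo).items()):
        print(barcode, r1, r2)
```

Failing loudly on an unpaired mate is a deliberate choice here: a missing R2 for one barcode is easier to catch at this step than after mapping.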

From what I can gather in the SOP (https://raw.githubusercontent.com/YeoLab/eclip/master/documentation/eCLIP_analysisSOP_v2.2.docx), the remaining steps are run starting from each barcode-specific FASTQ (in the SOP, *CO1.r1.fq.gz).

My question is: should I merge these files at a particular point in the pipeline? Should I merge the files of all the barcodes, or only those using barcodeA and barcodeB?

Thank you

Juan Felipe Ortiz, Ph.D. GeDaC. Cancer Sciences Institute National University of Singapore

byee4 commented 1 year ago

Hi Juan,

Cool! I’m not very proficient with WDL/Cromwell, but I’d be curious to know how WDL works with AWS. I did hear that AWS was starting to support CWL, although I’m unsure in what capacity.

For paired-end eCLIP, you’re correct that the eclipdemux step will produce several files; at that point you will want only the files associated with the expected barcodes (and you should make sure that most of the reads do end up binned there). For ENCODE, we did not assign barcodes to size-matched input samples, so all input reads are effectively unassigned (the designation we use is ‘NIL’), though this is experiment-specific.
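The check described here (keep only the expected barcodes and confirm that most reads landed in them) could be sketched as follows. This is an illustration, not part of the eCLIP pipeline: the function names are hypothetical, read counts are taken from gzipped FASTQ files at 4 lines per record, and `min_fraction=0.5` is simply a literal reading of “most”, not a threshold from the SOP. For an unbarcoded size-matched input, the expected set would just be `{"NIL"}`.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ file (4 lines per record)."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def select_expected(read_counts, expected, min_fraction=0.5):
    """Keep only the expected barcodes and check that they hold most reads.

    read_counts: {barcode: read count}, e.g. from count_fastq_reads on each R1.
    expected: barcodes actually used in the library (hypothetical input).
    min_fraction: assumed threshold for "most of the reads", not an SOP value.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads found in any demultiplexed file")
    kept = {bc: n for bc, n in read_counts.items() if bc in expected}
    missing = set(expected) - set(kept)
    if missing:
        raise ValueError(f"expected barcodes not found: {sorted(missing)}")
    fraction = sum(kept.values()) / total
    if fraction < min_fraction:
        raise ValueError(f"only {fraction:.1%} of reads carry an expected barcode")
    return sorted(kept), fraction
```

For example, with hypothetical counts `{"A01": 450, "B06": 400, "NIL": 150}` and expected barcodes `{"A01", "B06"}`, the function keeps `A01` and `B06` and reports that 85% of reads were binned to them.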

You’re also correct that these files will be merged, specifically after PCR collapsing/deduplication. R2 of the merged BAM files is then used for peak calling with CLIPper. If the size-matched inputs lack inline barcodes, they may not need to be merged.
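The ordering described in this reply (process and deduplicate each barcode on its own, merge only afterwards, call peaks on R2 of the merged BAM) can be sketched as a small plan builder. Everything here is hypothetical: step names and file names are placeholders, per-barcode trimming, mapping and PCR deduplication are lumped into a single step, and the thread does not say how the size-matched input is used after deduplication, so the `call_peaks` flag simply stops the input’s plan there.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str      # hypothetical task label, not an actual eCLIP tool or WDL task name
    inputs: list
    output: str


def plan_sample(sample, barcodes, call_peaks=True):
    """Return the steps for one library in the order described in the thread."""
    if not barcodes:
        raise ValueError("at least one barcode is required")
    steps, dedup_bams = [], []
    # Per barcode: trim, map and PCR-deduplicate each barcode's FASTQ pair separately.
    for bc in barcodes:
        bam = f"{sample}.{bc}.dedup.bam"
        fastqs = [f"{sample}.{bc}.r1.fq.gz", f"{sample}.{bc}.r2.fq.gz"]
        steps.append(Step("trim_map_dedup", fastqs, bam))
        dedup_bams.append(bam)
    # Merge only after deduplication, and only when there is more than one BAM.
    if len(dedup_bams) > 1:
        merged = f"{sample}.merged.bam"
        steps.append(Step("merge", list(dedup_bams), merged))
    else:
        merged = dedup_bams[0]
    # Peak calling with CLIPper uses R2 of the merged BAM.
    if call_peaks:
        r2_bam = f"{sample}.r2.bam"
        steps.append(Step("extract_r2", [merged], r2_bam))
        steps.append(Step("clipper", [r2_bam], f"{sample}.peaks.bed"))
    return steps


if __name__ == "__main__":
    # Placeholder barcode IDs: a two-barcode IP library and an unbarcoded ('NIL') input.
    for step in plan_sample("ip", ["A01", "B06"]):
        print(step.name, step.inputs, "->", step.output)
    for step in plan_sample("input", ["NIL"], call_peaks=False):
        print(step.name, step.inputs, "->", step.output)
```

In WDL terms, the per-barcode loop corresponds to a `scatter` block, and the merge task would take the scattered deduplicated BAMs as an array input.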
