PapenfussLab / gridss

GRIDSS: the Genomic Rearrangement IDentification Software Suite

Need help for multiple input #626

Closed licenzart closed 1 year ago

licenzart commented 1 year ago

I have some questions about using GRIDSS with multiple inputs.

1) I have control and case BAM files, each in duplicate, and I need to compare the case BAM files against the control BAM files, e.g.:

A_1.bam A_2.bam C_1.bam C_2.bam

Should I group them using the --labels option, with the normal BAM files listed first as your guide says? Like this:

--labels C,A C_1.bam C_2.bam A_1.bam A_2.bam

Will this give two output VCF files / assembly.bam files, one each for the C and A labels? Will it compare A against C? And is it possible to have only one case and multiple controls?

2) Also, my data is not human (it is from chicken), so should I leave out the '-b' option? Or should I look for a chicken-specific equivalent?

Thanks!

d-cameron commented 1 year ago

Sorry about the delayed response.

Will they give two output VCF files / assembly.bam for the C and A labels?

One assembly.bam and one output VCF in all cases. What --labels controls is how the BAMs map to the VCF FORMAT (a.k.a. sample) columns.

If you want to know the presence/absence of SVs in each BAM, leave the labels as default and you'll get 4 FORMAT columns with a breakdown of the support for each SV by BAM. If you group the BAMs with labels, you'll have two columns. That is, the output format will be the same as if you'd done a samtools merge of the input BAMs that share each label (but using labels will correctly handle the different library fragment size distributions of the different BAMs).
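To make the difference between four columns and two concrete, here is a minimal Python sketch of what grouping does to the support counts for one SV. It is only an illustration: the counts are made up, the BAM-to-label mapping is written out by hand, and `group_support` is a hypothetical helper, not part of GRIDSS.

```python
from collections import defaultdict


def group_support(per_bam_support, bam_labels):
    """Sum per-BAM supporting-read counts into one count per label."""
    grouped = defaultdict(int)
    for bam, count in per_bam_support.items():
        grouped[bam_labels[bam]] += count
    return dict(grouped)


# Hypothetical support counts for a single SV, one entry per input BAM.
# This is the kind of per-BAM breakdown the default (ungrouped) output gives.
per_bam = {"C_1.bam": 0, "C_2.bam": 1, "A_1.bam": 12, "A_2.bam": 0}

# Hand-written mapping of each BAM to the label it would be grouped under.
labels = {"C_1.bam": "C", "C_2.bam": "C", "A_1.bam": "A", "A_2.bam": "A"}

# Grouped view: one value per label instead of one per BAM.
print(group_support(per_bam, labels))
```

In the grouped view you can no longer tell that all of the support for A came from A_1.bam alone.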

How you analyse your output is what defines how you should be grouping them. Do you care if an SV is in A_1 and not A_2? Similarly, will you treat an SV in C_1 and not C_2 the same as an SV that's in both? If you don't care about the per-replicate breakdown, then group them. If it's meaningful, then keep them separate.
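As a rough sketch of the kind of question that only the per-replicate breakdown can answer, the Python below classifies an SV by how many replicates support it. The counts, the `min_reads` threshold, and the `replicate_pattern` helper are all assumptions for illustration, not a recommended filter.

```python
def replicate_pattern(support, replicates, min_reads=1):
    """Return 'all', 'some' or 'none' for how many replicates support the SV."""
    hits = sum(1 for r in replicates if support.get(r, 0) >= min_reads)
    if hits == 0:
        return "none"
    return "all" if hits == len(replicates) else "some"


# Hypothetical per-BAM support for one SV (ungrouped output).
sv = {"C_1.bam": 0, "C_2.bam": 0, "A_1.bam": 12, "A_2.bam": 0}

cases = ["A_1.bam", "A_2.bam"]
controls = ["C_1.bam", "C_2.bam"]

print(replicate_pattern(sv, cases))     # support in only one case replicate
print(replicate_pattern(sv, controls))  # no support in either control
```

If "some" and "all" would lead you to the same conclusion in your analysis, grouping loses nothing; if they would not, keep the BAMs separate.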