cancerit / BRASS

Breakpoints via assembly - Identifies breaks and attempts to assemble rearrangements in whole genome sequencing data.
GNU Affero General Public License v3.0

Any way to ignore PL in bam? #112

Open anoronh4 opened 1 year ago

anoronh4 commented 1 year ago

We have a few paired samples that for some reason have different platforms in each bam of a given pair. As a result we got the following error:

BAMs have different sequencing platform: ILLUMINA, ILLUMINA-NOVASEQ-6000

They are actually the same platform, but the labelling differs. Based on the code in Implement.pm, I don't think the -pl command-line option will work. Is there any way to make BRASS run without rewriting the BAMs?
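For anyone wanting to spot this before launching a run, here is a minimal sketch of a pre-flight comparison. It is not BRASS's own check (that lives in Implement.pm and may differ in detail). The function name `platforms` and the sample header lines are made up for illustration, and the header text is assumed to come from `samtools view -H` on each BAM.

```python
def platforms(header_text):
    """Collect the distinct PL values found on the @RG lines of a SAM header."""
    found = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG\t"):
            continue  # only read-group lines carry the PL tag
        for field in line.split("\t")[1:]:
            if field.startswith("PL:"):
                found.add(field[3:])
    return found


# Hypothetical header fragments standing in for `samtools view -H` output.
tumour = "@RG\tID:t1\tSM:tumour\tPL:ILLUMINA\n"
normal = "@RG\tID:n1\tSM:normal\tPL:ILLUMINA-NOVASEQ-6000\n"

# Flag the pair if the two BAMs do not declare the same set of platforms.
mismatch = platforms(tumour) != platforms(normal)
print(sorted(platforms(tumour) | platforms(normal)), mismatch)
```

Running something like this across all pairs up front would show which BAMs carry inconsistent labels before any analysis time is spent.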

AndyMenzies commented 1 year ago

I think you will have to rewrite the bam headers. Luckily samtools makes that quite easy.

Use samtools view -H to get the text of the header, make your edit, and use samtools reheader to apply the edited header to the existing BAM file.

https://www.htslib.org/doc/samtools-view.html
https://www.htslib.org/doc/samtools-reheader.html
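The header edit itself can be scripted. Below is a minimal sketch, assuming the only change needed is to set the PL tag on every @RG line to a single agreed value. The function name `normalize_platform`, the default value `ILLUMINA`, and the sample header are illustrative choices, not anything defined by BRASS or samtools.

```python
def normalize_platform(header_text, platform="ILLUMINA"):
    """Return the SAM header text with the PL tag on every @RG line set to `platform`.

    @RG lines that have no PL tag, and all other header lines, are left unchanged.
    """
    out = []
    for line in header_text.splitlines():
        if line.startswith("@RG\t"):
            fields = line.split("\t")
            # Replace only the PL field; ID, SM, LB and the rest are kept as-is.
            fields = [
                "PL:" + platform if f.startswith("PL:") else f
                for f in fields
            ]
            line = "\t".join(fields)
        out.append(line)
    return "".join(l + "\n" for l in out)


# Hypothetical header standing in for `samtools view -H` output.
example = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:rg1\tSM:tumour\tPL:ILLUMINA-NOVASEQ-6000\n"
)
fixed = normalize_platform(example)
print(fixed, end="")
```

In practice the input would be the text saved from `samtools view -H in.bam > header.sam`, and the edited text would be written to a new file and passed to `samtools reheader header.fixed.sam in.bam > out.bam` (the file names are placeholders). Note that reheader writes a new BAM to its output, so this still produces a second copy of the file.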

anoronh4 commented 1 year ago

Thanks. I think an option to override this sort of check would be nice in a future release. Does the given platform affect the analysis results? If not, it seems this check could be left to the user.

AndyMenzies commented 1 year ago

That depends on the difference being seen. If you are talking about Illumina X10 vs. Illumina NovaSeq, then it probably won't make much difference. But if you are comparing Illumina to ONT data, it would have a significant impact. In our internal system we tend to populate Platform with the vendor (Illumina, ONT, PacBio), not a specific sequencing machine model.

anoronh4 commented 1 year ago

Yes, but BRASS doesn't behave differently whether I have ILLUMINA-WGS in both BAMs, ONT in both BAMs, or abcdefg in both, right? My point is that the user should already know and account for BAMs coming from different platforms. For us, rewriting very large BAMs is very inconvenient, both for storage and for re-analysis. We will have to repeat several modules of our workflow, not just BRASS, because this check cannot be circumvented. In any case, this is just a suggestion; I only wanted to clarify my thinking.