brentp / somalier

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"
MIT License

Comparing Results with Multiple Samples with Same Name #80

Closed brcopeland closed 2 years ago

brcopeland commented 2 years ago

I have a WGS pipeline, and when a sample has multiple read groups (one BAM per read group), I want to confirm that they all correspond to the same individual before merging them. I tried following the instructions here and found that somalier kept overwriting the same file, which I realized is because each BAM carries the same SM tag in the @RG line of its header. I worked around this by writing the somalier extract output to separate directories and renaming the resulting files. When I run somalier relate, however, every comparison in, for example, somalier.pairs.tsv again references the same sample name. If there were a relatedness problem, it would be difficult to tell which BAM(s) caused it.

I could, of course, reheader the BAMs to give them distinct SMs, but I would prefer not to do that just for this step. Do you have any suggestions for how to accomplish this?

brentp commented 2 years ago

Hi, you can use the --sample-prefix argument to somalier extract for this. Just give each file a unique --sample-prefix and then you'll be able to distinguish them in the output.
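
For anyone scripting this, here is a minimal sketch of one way to build a `somalier extract` command per BAM, each with its own `--sample-prefix`. The helper name `build_extract_commands`, the choice of the BAM file stem plus an underscore as the prefix, and the example file names are all assumptions made for illustration. The `-d`, `--sites`, and `-f` options are the ones shown in the somalier README. The sketch only builds the commands and does not run somalier.

```python
from pathlib import Path
import shlex


def build_extract_commands(bams, sites, fasta, out_dir):
    """Return one `somalier extract` argv list per BAM.

    Each command gets a distinct --sample-prefix derived from the BAM
    file name (an illustrative choice, not something somalier requires),
    so BAMs that share an SM tag can be told apart in the relate output.
    """
    commands = []
    for bam in bams:
        # Hypothetical naming scheme: "rg1.bam" -> prefix "rg1_"
        prefix = Path(bam).stem + "_"
        commands.append([
            "somalier", "extract",
            "-d", str(out_dir),
            "--sites", str(sites),
            "-f", str(fasta),
            "--sample-prefix", prefix,
            str(bam),
        ])
    return commands


if __name__ == "__main__":
    # Example inputs are placeholders; substitute real paths.
    for cmd in build_extract_commands(
        ["rg1.bam", "rg2.bam"], "sites.vcf.gz", "ref.fa", "extracted"
    ):
        # Print for inspection; pass each list to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```

Run as a script, this prints the commands so they can be checked before anything is executed. The prefixes must be unique across the BAMs being compared, so file stems only work if no two BAMs share a file name.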