brentp / somalier

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"
MIT License

bam files without `@RG` tag #115

Closed: mschubert closed this issue 1 year ago

mschubert commented 1 year ago

Thanks a lot for the tool. It's really easy to use, and on the first try it already found a sample swap in our setup!

I'm now trying to confirm sample matches between WES and RNA-seq data from an external sequencing provider. I ran into an issue with the RNA-seq BAM files they provided, where I get the following output from somalier:

`Error: unhandled exception: [somalier] no read-group in bam file [ValueError]`

The `@RG` line is indeed missing from these BAM file headers.

Is there a way to manually supply the sample ID (e.g. via a command-line argument)?
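
For readers who want to confirm the same condition before running somalier, here is a minimal sketch of a header check. The helper name `has_read_group` and the file name `rnaseq.bam` are made up for illustration; the commented-out line assumes `samtools` is installed, while the demonstration itself runs on a hand-written header.

```shell
# Succeeds (exit status 0) if the SAM header text on stdin has at least one @RG line.
has_read_group() {
    grep -q '^@RG'
}

# With samtools available, a real file could be checked like this
# (rnaseq.bam is a placeholder name):
#   samtools view -H rnaseq.bam | has_read_group || echo "no @RG line found"

# Self-contained demonstration on a made-up header that lacks @RG:
printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n' | has_read_group \
    || echo "no @RG line found"
```

A file that fails this check is the kind that triggers the `no read-group in bam file` error above.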

brentp commented 1 year ago

Hi Michael, always nice to hear that a tool is useful!

I added a way to do this via an environment variable. Will you give this binary a try (gunzip it, then chmod +x) and run it as:

`SOMALIER_SAMPLE_NAME=my_sample somalier_dev extract ...`

where my_sample is the name you want to use?

somalier_dev.gz

I'll get a release out soon.
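
Since the variable carries a single name, one way to apply it to several header-less BAM files is one `extract` call per file. The sketch below is a dry run under stated assumptions: the paths, the `sample_name_from_bam` helper, and the idea of taking the sample name from the file name are all hypothetical, and the binary is assumed to be a somalier build that includes the `SOMALIER_SAMPLE_NAME` support described above. The `-d`, `--sites`, and `-f` arguments follow the usual `somalier extract` usage, with placeholder values.

```shell
# Derive a sample name from a BAM path by dropping the directory and the .bam suffix.
sample_name_from_bam() {
    basename "$1" .bam
}

# Placeholder inputs (hypothetical paths).
bams="rnaseq/patient01.bam rnaseq/patient02.bam"

for bam in $bams; do
    name=$(sample_name_from_bam "$bam")
    # Dry run: print the command. Remove "echo" to actually execute it.
    echo SOMALIER_SAMPLE_NAME="$name" somalier extract -d extracted/ \
        --sites sites.vcf.gz -f reference.fa "$bam"
done
```

If the file names do not match the sample IDs used elsewhere, a lookup table from file to ID would be a safer source for the name than `basename`.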

mschubert commented 1 year ago

Wow, that was quick, thank you! 🎉

I can confirm that the binary works as expected and solves my issue.

cjfields commented 1 year ago

Just a note that I ran into the same issue and this worked wonderfully. I did see there is a `--sample-prefix` option; was this also meant for adding the sample name?

brentp commented 1 year ago

Glad to hear it works. `--sample-prefix` is for when you have multiple samples with the same ID, for example if the same sample had both RNA-Seq and DNA-Seq. The user can then specify a sample prefix so that they (and the hash table in somalier) can tell the two apart. A release with this change is on my TODO list.
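
To illustrate the distinction, here is a small sketch of how a prefix keeps two assays of the same sample apart. The `prefixed_id` helper only imitates the effect described above and is not somalier's own code; the ID `patient01`, the prefix `rna_`, the paths, and the space-separated form of the option value are assumptions. The `somalier` lines are printed as a dry run, not executed.

```shell
# Prepend a prefix to a sample ID (illustration of the effect only).
prefixed_id() {
    printf '%s%s\n' "$1" "$2"
}

sample="patient01"   # hypothetical ID shared by the DNA and RNA data

echo "DNA sample ID: $(prefixed_id "" "$sample")"
echo "RNA sample ID: $(prefixed_id "rna_" "$sample")"

# Dry run of the matching extract calls (placeholder paths and values):
echo somalier extract -d extracted/ --sites sites.vcf.gz -f reference.fa wes/patient01.bam
echo somalier extract --sample-prefix rna_ -d extracted/ --sites sites.vcf.gz \
    -f reference.fa rnaseq/patient01.bam
```

So `SOMALIER_SAMPLE_NAME` supplies a name when the header has none, while `--sample-prefix` keeps otherwise identical names distinct.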