Closed JackCurragh closed 1 year ago
Hi Jack,
Thanks for your interest in oxbow.
The InvalidReferenceSequenceName error is coming from our upstream dependency, noodles, which reads and parses the BAM file. It looks like you're right that the brackets in the reference name are causing the issue. The name parsing happens in noodles, which aims for compliance with the file format specs (SAM spec, page 3, section 1.2.1) and errors out on this kind of thing.
Unfortunately, we don't have a way of handling nonconformant data with oxbow at the moment.
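For anyone wanting to check their reference names up front, here is a minimal Python sketch of the rule in SAM spec section 1.2.1. It is not how noodles implements the check; the function name `is_valid_reference_name` and the bracketed name `tRNA(example)1` are hypothetical and only for illustration.

```python
import re

# Reference sequence name pattern as given in SAM spec section 1.2.1:
# printable ASCII except \ , " ` ' ( ) [ ] { } < >, and the name may not
# start with * or =.
RNAME_PATTERN = re.compile(
    r"[0-9A-Za-z!#$%&+./:;?@^_|~-][0-9A-Za-z!#$%&*+./:;=?@^_|~-]*"
)


def is_valid_reference_name(name: str) -> bool:
    """Return True if `name` matches the SAM spec pattern (hypothetical helper)."""
    return RNAME_PATTERN.fullmatch(name) is not None


# "tRNA(example)1" is a made-up name; the brackets make it nonconformant.
for candidate in ["chrI", "tRNA(example)1"]:
    print(candidate, is_valid_reference_name(candidate))
```

A name containing `(` or `)` fails this pattern, which is consistent with the error reported above.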
Ah yes, that makes a lot more sense than arrow. Not sure what I was thinking when I wrote that!
Thank you for your response. I will have to find a way to sanitise the BAMs beforehand I guess!
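One possible way to do the sanitising is sketched below in Python, assuming the header has been exported as text first. The helpers `sanitise_reference_name` and `sanitise_header`, the underscore replacement, and the example name `tRNA(example)1` are all hypothetical choices, not part of oxbow or noodles.

```python
import re

# Any character outside the set allowed by SAM spec section 1.2.1.
_DISALLOWED = re.compile(r"[^0-9A-Za-z!#$%&*+./:;=?@^_|~-]")


def sanitise_reference_name(name: str, replacement: str = "_") -> str:
    """Replace disallowed characters; a leading * or = is also replaced."""
    cleaned = _DISALLOWED.sub(replacement, name)
    if cleaned[:1] in ("*", "="):
        cleaned = replacement + cleaned[1:]
    return cleaned


def sanitise_header(header_text: str) -> str:
    """Rewrite the SN: field of every @SQ line in a SAM header."""
    out = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            fields = line.split("\t")
            fields = [
                "SN:" + sanitise_reference_name(f[3:]) if f.startswith("SN:") else f
                for f in fields
            ]
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


# Made-up header with one nonconformant name.
header = "@HD\tVN:1.6\n@SQ\tSN:tRNA(example)1\tLN:72\n@SQ\tSN:chrI\tLN:100\n"
print(sanitise_header(header), end="")
```

The header text could come from `samtools view -H`, and the rewritten header could be applied with `samtools reheader`, since BAM records refer to references by index rather than by name. Two different names can collapse to the same sanitised name, so it is worth checking for duplicates before reheadering.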
Hello,
I have been using oxbow quite a lot since that initial blog post and have just run into a new issue that I am hoping there may be a workaround for. It occurs when I try to read a BAM aligned to the SacCer genome with STAR, as the tRNA reference names contain brackets.
For example, in the Ensembl annotation they have:
I assume these brackets are the root of the issue. Is there any chance that this could be handled within oxbow? Or is it a limitation imposed by arrow?
Thanks in advance.