Open sridhar0605 opened 1 year ago
For anyone who runs into this issue: I can confirm that removing the HLA contigs solved it.
grep -v 'HLA-' input.sam > input_filtered.sam
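Note that the grep above drops any line containing HLA- anywhere, including in read names or optional tags. If that is a concern, a narrower filter could look only at the @SQ header lines and the reference-name column (column 3 in SAM). The following is a rough sketch, not something tested against ESPRESSO; the function name filter_hla is made up for illustration, and it assumes a tab-delimited SAM file whose HLA contig names start with HLA-.

```shell
# Hypothetical helper: drop HLA contigs from a SAM stream on stdin.
# Assumes contig names of interest begin with "HLA-".
filter_hla() {
  awk -F'\t' '
    # Header: drop @SQ lines that declare an HLA contig, keep the rest.
    /^@SQ/ { if ($0 ~ /\tSN:HLA-/) next; print; next }
    /^@/   { print; next }
    # Alignments: keep only records whose RNAME (column 3) is not an HLA contig.
    $3 !~ /^HLA-/ { print }
  '
}

# Usage (file names as in the command above):
# filter_hla < input.sam > input_filtered.sam
```

This does not touch the RNEXT column or SA tags, so supplementary alignments that point at an HLA contig would still mention it.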
Here's the line for that error: https://github.com/Xinglab/espresso/blob/v1.3.2/src/ESPRESSO_S.pl#L462
ESPRESSO tries to keep some information in a string with : as a separator. Specifically, it gives an ID to splice junctions like {chr}:{start}:{end}. Later it tries to parse that ID string, but that fails if the contig has : in the name.
In this case HLA-DRB1*03:01:01:02:12301:13089 is the splice junction ID and the parts are HLA-DRB1*03:01:01:02, 12301, and 13089. ESPRESSO ends up thinking the part up to the first : (HLA-DRB1*03) is the contig name.
Ideally ESPRESSO should be able to handle any contig name. I'll see if I can change this behavior
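To illustrate the parsing problem: since the start and end are always the last two fields, one way to recover the contig name is to split the ID from the right instead of from the left. This is only a shell sketch of the idea, not ESPRESSO's actual Perl code or the eventual fix; parse_junction_id is a hypothetical name.

```shell
# Hypothetical helper: split "{chr}:{start}:{end}" from the right,
# so that colons inside the contig name are preserved.
parse_junction_id() {
  local id="$1"
  local end="${id##*:}"      # text after the last colon
  local rest="${id%:*}"      # everything before the last colon
  local start="${rest##*:}"  # text after the (new) last colon
  local chr="${rest%:*}"     # whatever remains is the contig name
  printf '%s\t%s\t%s\n' "$chr" "$start" "$end"
}

parse_junction_id 'HLA-DRB1*03:01:01:02:12301:13089'
# prints (tab-separated): HLA-DRB1*03:01:01:02  12301  13089
```

Splitting on the first : instead would yield HLA-DRB1*03 as the contig, which matches the failure described above.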
Thanks. My hunch was that it might have something to do with string/regex expansion. I tried hacking the script with a Perl regex but failed.
Thanks for looking in to this.
Sid
Hi @EricKutschera,
I'm using an Ensembl GTF and FASTA with HLA contigs. I see the error below with the ESPRESSO_S.pl step. Any thoughts? FWIW, the test data in the repo works fine.
Thank you. Sid