SamBryce-Smith closed 3 years ago
This also happens sometimes (!!) with add_region_number, which internally uses pr.assign. The key error is for a chromosome (not a chromosome/strand pair), which is a little weird. Also, the error doesn't happen every time you run the script (sometimes it sails through). I have no idea what is going on.
The 'sometimes' is because occasionally a merged GTF will contain transcripts with an 'undefined strand' (i.e. not '+' or '-'). PyRanges will then read this GTF in as an 'unstranded' object. add_region_number and other functions only return values if the strand column is '+' or '-', so errors are raised when no output is produced for these funky chromosome tuples. I think the simplest way around this is to filter out the transcripts with an undefined strand.
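A minimal sketch of that filtering step, assuming it is acceptable to drop the offending records from the GTF text before PyRanges ever reads it. The helper name drop_undefined_strand is hypothetical (it is not part of PyRanges or this repo); the only thing it relies on is that strand is the 7th tab-separated column of a GTF record.

```python
from typing import Iterable, Iterator

# Only these strand values are treated as "defined"
VALID_STRANDS = {"+", "-"}


def drop_undefined_strand(gtf_lines: Iterable[str]) -> Iterator[str]:
    """Yield GTF lines, skipping records whose strand is not '+' or '-'.

    Hypothetical helper for illustration only.
    """
    for line in gtf_lines:
        # Pass comment/header lines and blank lines through unchanged
        if line.startswith("#") or not line.strip():
            yield line
            continue

        fields = line.rstrip("\n").split("\t")

        # Strand is the 7th GTF column (index 6). Records with fewer
        # than 7 columns are malformed and are dropped as well.
        if len(fields) >= 7 and fields[6] in VALID_STRANDS:
            yield line
```

The filtered lines could then be written to a new file and read in as usual (e.g. with pr.read_gtf), so the resulting object should only carry '+' and '-' strand values.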
This would likely apply to:
Getting introns from gr.
Possibly other cases too. I should find a way to remove these keys from a PyRanges object to prevent this, though.