LRGASP / lrgasp-submissions

Definition and validators for LRGASP submissions
MIT License

duplicate ids in different genes #23

Closed: diekhans closed this issue 4 years ago

diekhans commented 4 years ago

from mock submissions

One transcriptome was submitted with duplicated transcript IDs (the same transcript ID used in different gene IDs), which caused SQANTI3 to fail.

FJPardoPalacios commented 4 years ago

I tried to run SQANTI3 on Cindy's PacBio transcriptome (without expression data, because of the previously reported issues with the IDs), but I couldn't, because 2 transcript_ids were each used for several different isoforms (all of them mapping to chrEBV). The file's path in the Drive folder is Long-read RGASP/Submissions/Cindy_submitter_id/pacbio_mod_isoforms/5-pacbio-collapse.isoforms.gtf.

The transcript_ids that caused the problem were "m54284U_191110" and "m54284U_191111". None of the other transcript_ids were duplicated, and they were longer, e.g. "m54284U_191111_172039/99944726/ccs".

diekhans commented 4 years ago

GTF validation detects this problem. Thanks!
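For illustration, the check discussed in this thread (a transcript_id reused under more than one gene_id) can be sketched in a few lines of Python. This is a minimal sketch, not the actual lrgasp-submissions validator: the function name `find_transcripts_in_multiple_genes` is hypothetical, and it assumes a tab-separated GTF whose ninth column holds `key "value";` attribute pairs.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Matches key "value" pairs in the GTF attribute column (column 9).
# Assumes the common GTF convention of double-quoted attribute values.
_ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def find_transcripts_in_multiple_genes(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Hypothetical helper: return transcript_ids that appear under more than one gene_id."""
    genes_per_transcript: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed 9-column GTF record
        attrs = dict(_ATTR_RE.findall(fields[8]))
        tid = attrs.get("transcript_id")
        gid = attrs.get("gene_id")
        if tid is not None and gid is not None:
            genes_per_transcript[tid].add(gid)
    # Keep only transcript_ids that were seen with two or more distinct gene_ids.
    return {tid: gids for tid, gids in genes_per_transcript.items() if len(gids) > 1}
```

Usage would be something like `find_transcripts_in_multiple_genes(open(path))`; an empty result means no transcript_id is shared across genes. Several exon lines of the same transcript under the same gene are not reported, since only distinct gene_ids are counted.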