LRGASP / lrgasp-submissions

Definition and validators for LRGASP submissions
MIT License

submission with IDs that do not match GTF #22

Closed diekhans closed 3 years ago

diekhans commented 4 years ago

From the mock submissions:

One submission has IDs in the expression matrix that do not match the ones reported in the GTF file → its transcripts can be evaluated, but not their expression.

FJPardoPalacios commented 4 years ago

This is the case of both transcriptomes uploaded by Cindy.

The ONT one had pretty similar IDs, so I could fix it easily. For example, the isoform with transcript_id "0037c387-bdb2-4952-a592-8e475de7e5bc" in the GTF file appeared in the expression matrix as "0037c387-bdb2-4952-a592-8e475de7e5bc_chr7:28560000", which is the concatenation of the given transcript_id and gene_id.
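A fix of this kind could be sketched as below. This is only an illustration, not the script that was actually used: the helper name `build_id_lookup` is hypothetical, and the underscore separator is inferred from the single example above.

```python
def build_id_lookup(gtf_genes):
    """Map every accepted matrix ID to its GTF transcript_id.

    gtf_genes: dict of transcript_id -> gene_id, as read from the GTF.
    Accepts both the plain transcript_id and the concatenated form
    '<transcript_id>_<gene_id>' (separator assumed from the example).
    """
    lookup = {}
    for transcript_id, gene_id in gtf_genes.items():
        lookup[transcript_id] = transcript_id
        lookup[f"{transcript_id}_{gene_id}"] = transcript_id
    return lookup


# IDs taken from the example in this thread.
gtf_genes = {"0037c387-bdb2-4952-a592-8e475de7e5bc": "chr7:28560000"}
lookup = build_id_lookup(gtf_genes)

# The concatenated matrix ID resolves back to the GTF transcript_id.
fixed = lookup.get("0037c387-bdb2-4952-a592-8e475de7e5bc_chr7:28560000")

# An ID with no counterpart in the GTF cannot be resolved this way.
unresolved = lookup.get("00000f84-64f2-4aef-a027-a86468e48da2;16")
```

An exact lookup is used rather than splitting on "_" so that IDs which themselves contain underscores are not cut in the wrong place.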

However, for the transcriptome generated from PacBio data, the transcript_ids looked like CCS IDs (e.g. "m54284U_191111_172039/57606429/ccs"), but the IDs reported in the expression matrix do not match them at all (e.g. "00000f84-64f2-4aef-a027-a86468e48da2;16"). Probably the wrong file was uploaded by mistake...

diekhans commented 3 years ago

IDs are now validated.
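For readers who want to run a similar check on their own files, here is a minimal sketch of comparing the two ID sets. It is not the validator code from lrgasp-submissions; the function names, the tab-separated matrix layout with a header row and the ID in the first column, and the expression value in the sample data are all assumptions made for illustration.

```python
import csv
import re

# Pulls the transcript_id attribute out of a GTF attribute column.
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(gtf_lines):
    """Collect the transcript_id values found in GTF lines."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID_RE.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def matrix_ids(matrix_lines):
    """Collect first-column IDs from a TSV matrix (assumed layout)."""
    reader = csv.reader(matrix_lines, delimiter="\t")
    next(reader, None)  # skip the header row
    return {row[0] for row in reader if row}


def compare_ids(gtf_lines, matrix_lines):
    """Return (matrix IDs missing from the GTF, GTF IDs missing from the matrix)."""
    gtf_ids = gtf_transcript_ids(gtf_lines)
    expr_ids = matrix_ids(matrix_lines)
    return sorted(expr_ids - gtf_ids), sorted(gtf_ids - expr_ids)


# Made-up records built around the IDs quoted in this thread.
gtf = [
    'chr7\tsrc\ttranscript\t100\t200\t.\t+\t.\t'
    'gene_id "chr7:28560000"; transcript_id "0037c387-bdb2-4952-a592-8e475de7e5bc";\n'
]
matrix = [
    "ID\tsample1\n",
    "0037c387-bdb2-4952-a592-8e475de7e5bc_chr7:28560000\t12.5\n",
]
unknown, missing = compare_ids(gtf, matrix)
```

With these inputs `unknown` holds the concatenated matrix ID and `missing` holds the plain transcript_id, which is the mismatch reported for the ONT submission.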