ComparativeGenomicsToolkit / Comparative-Annotation-Toolkit

Apache License 2.0

CAT T2T CHM13 GFF3 have gene_name mismatches between gene and transcript records #306

Open diekhans opened 11 months ago

diekhans commented 11 months ago

I've attached an example, EPHA2, which has the name MSTRG.59 in the gene records. Also attached is a report on all of the gene name mismatches.

The GFF3 is at https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/annotation/chm13.draft_v2.0.gene_annotation.gff3

I suggest adding a sanity check, run before the file is written, that all of the records for a given gene have the same gene_name.

Attachments: MSTRG.59-EPHA2.gff3.gz, gene_annotation_probs.tsv.gz
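A minimal sketch of what such a check could look like, written as a standalone function and not as CAT's actual code. It assumes every record carries both a `gene_id` and a `gene_name` attribute in column 9; the `gene_id` key, the function names, and the toy records (IDs, coordinates, and the second gene) are hypothetical and only illustrate the MSTRG.59 / EPHA2 case described above.

```python
from collections import defaultdict
from urllib.parse import unquote


def parse_attributes(column9):
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def find_gene_name_mismatches(lines, gene_key="gene_id", name_key="gene_name"):
    """Return {gene id: sorted gene names} for genes with more than one name.

    gene_key is an assumption about how records are tied to a gene;
    adjust it if the file links records differently (e.g. via Parent).
    """
    names_by_gene = defaultdict(set)
    for line in lines:
        # Skip blank lines, directives, and comments.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue
        attrs = parse_attributes(columns[8])
        if gene_key in attrs and name_key in attrs:
            names_by_gene[attrs[gene_key]].add(attrs[name_key])
    return {
        gene: sorted(names)
        for gene, names in names_by_gene.items()
        if len(names) > 1
    }


# Toy records (made up for illustration, not taken from the real file).
example = [
    "##gff-version 3\n",
    "chr1\tCAT\tgene\t1\t100\t.\t+\t.\tID=G1;gene_id=G1;gene_name=MSTRG.59\n",
    "chr1\tCAT\ttranscript\t1\t100\t.\t+\t.\tID=T1;Parent=G1;gene_id=G1;gene_name=EPHA2\n",
    "chr1\tCAT\tgene\t200\t300\t.\t+\t.\tID=G2;gene_id=G2;gene_name=GENE_B\n",
    "chr1\tCAT\ttranscript\t200\t300\t.\t+\t.\tID=T2;Parent=G2;gene_id=G2;gene_name=GENE_B\n",
]
mismatches = find_gene_name_mismatches(example)
for gene, names in mismatches.items():
    print(f"{gene}: conflicting gene_name values {names}")
```

Run just before the GFF3 is written, a non-empty result could either abort the write or be logged as a warning; which is appropriate depends on how the pipeline wants to treat such records.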