SACGF / cdot

Transcript versions for HGVS libraries
MIT License

Processing of mitochondrial RNA genes broken for ENSEMBL #72

Closed: holtgrewe closed this issue 6 months ago

holtgrewe commented 6 months ago

For example, MT-TG is broken.

    "ENSG00000210154": {
      "biotype": [
        "Mt_tRNA",
        "ncRNA",
        "tRNA"
      ],
      "description": null,
      "gene_symbol": null,
      "url": "ftp://ftp.ensembl.org/pub/release-111/gff3/homo_sapiens/Homo_sapiens.GRCh38.111.gff3.gz"
    },

However, in the GFF3 file:

    MT      insdc   ncRNA_gene      8295    8364    .       +       .       ID=gene:ENSG00000210156;Name=MT-TK;biotype=Mt_tRNA;description=mitochondrially encoded tRNA-Lys (AAA/G) [Source:HGNC Symbol%3BAcc:HGNC:7489];gene_id=ENSG00000210156;logic_name=mt_genbank_import_homo_sapiens;version=1
    MT      insdc   tRNA    8295    8364    .       +       .       ID=transcript:ENST00000387421;Parent=gene:ENSG00000210156;Name=MT-TK-201;biotype=Mt_tRNA;tag=basic,Ensembl_canonical;transcript_id=ENST00000387421;transcript_support_level=NA;version=1

davmlaw commented 6 months ago

Hi, by broken, I'm assuming you specifically meant the symbol and description were null?

I have fixed this and made a new release:

https://github.com/SACGF/cdot/releases/tag/data_v0.2.24

holtgrewe commented 6 months ago

Sorry for not being specific, yes.

Thanks a bunch!
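
For anyone wanting to check other genes for the same symptom, here is a minimal Python sketch. It is not cdot's own code: the function names are hypothetical, and it assumes only that gene records look like the JSON snippet quoted above (a dict keyed by gene ID with a `gene_symbol` field) and that gene features in the Ensembl GFF3 carry `ID=gene:...`, `Name` and `description` attributes, as in the `ncRNA_gene` line quoted above.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Split the 9th GFF3 column (key=value pairs joined by ';') into a dict."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        # GFF3 percent-encodes reserved characters, e.g. ';' appears as %3B.
        attributes[key] = unquote(value)
    return attributes


def gene_symbols_from_gff3(lines):
    """Map gene ID -> (symbol, description) for features whose ID starts with 'gene:'."""
    genes = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a well-formed feature line
        attrs = parse_gff3_attributes(columns[8])
        feature_id = attrs.get("ID", "")
        if feature_id.startswith("gene:"):
            gene_id = attrs.get("gene_id", feature_id[len("gene:"):])
            genes[gene_id] = (attrs.get("Name"), attrs.get("description"))
    return genes


def genes_missing_symbol(gene_records):
    """Return the IDs of gene records whose 'gene_symbol' is null (None)."""
    return sorted(
        gene_id
        for gene_id, record in gene_records.items()
        if record.get("gene_symbol") is None
    )


# The ncRNA_gene line from this thread, rebuilt with real tab separators.
EXAMPLE_GFF3_LINE = "\t".join([
    "MT", "insdc", "ncRNA_gene", "8295", "8364", ".", "+", ".",
    "ID=gene:ENSG00000210156;Name=MT-TK;biotype=Mt_tRNA;"
    "description=mitochondrially encoded tRNA-Lys (AAA/G) "
    "[Source:HGNC Symbol%3BAcc:HGNC:7489];gene_id=ENSG00000210156;"
    "logic_name=mt_genbank_import_homo_sapiens;version=1",
])

# A gene record shaped like the JSON snippet from this thread.
EXAMPLE_GENE_RECORDS = {
    "ENSG00000210154": {
        "biotype": ["Mt_tRNA", "ncRNA", "tRNA"],
        "description": None,
        "gene_symbol": None,
    },
}

if __name__ == "__main__":
    print(gene_symbols_from_gff3([EXAMPLE_GFF3_LINE]))
    print(genes_missing_symbol(EXAMPLE_GENE_RECORDS))
```

Any gene ID that `genes_missing_symbol` reports but for which `gene_symbols_from_gff3` returns a name is a candidate for the problem described in this issue. To run this against real files you would still need to load the gene records from the cdot JSON yourself and pass the decompressed GFF3 lines in, for example via `gzip.open(path, "rt")`.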