WGLab / LIQA

Long-read Isoform Quantification and Analysis

Should not use gene name as key #30

Closed baraaorabi closed 3 months ago

baraaorabi commented 1 year ago

Ensembl has many distinct genes that share the same gene name, e.g. U2: http://useast.ensembl.org/Human/Search/Results?q=U2;site=ensembl;facet_species=Human;page=1
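To illustrate the collision, here is a minimal sketch that assumes GTF-style attribute strings. The IDs `GENE_A`, `GENE_B`, `TX_A1` and `TX_B1` are made-up placeholders, not real Ensembl records, and the hashes `%by_name` and `%by_id` are hypothetical names rather than anything in LIQA:

```perl
use strict;
use warnings;

# Two made-up GTF attribute strings: same gene_name, different gene_id.
my @attrs = (
    'gene_id "GENE_A"; transcript_id "TX_A1"; gene_name "U2";',
    'gene_id "GENE_B"; transcript_id "TX_B1"; gene_name "U2";',
);

my (%by_name, %by_id);
for my $info (@attrs) {
    # Pull each quoted attribute value out of the attribute column.
    my ($id)   = $info =~ /gene_id "([^"]+)"/;
    my ($name) = $info =~ /gene_name "([^"]+)"/;
    my ($tx)   = $info =~ /transcript_id "([^"]+)"/;

    push @{ $by_name{$name} }, $tx;   # both transcripts end up under "U2"
    push @{ $by_id{$id} },     $tx;   # each gene keeps its own transcript
}

printf "keys by gene_name: %d\n", scalar keys %by_name;
printf "keys by gene_id:   %d\n", scalar keys %by_id;
```

Keyed by name, the two genes collapse into a single entry holding both transcripts; keyed by ID, they stay separate.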

I think the gene ID should always be used as the key, so the following lines:

https://github.com/WGLab/LIQA/blob/8e098567a0d0d0d9e9318cf80e044441b51bd93a/liqa_src/PreProcess_gtf.pl#L36-L47

should be something like:

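    # Column 9 of the GTF record holds the attribute string.
    my $info = $a[8];

    # Take the first quoted value after "gene_id" and use it as the gene key.
    my @b = split("gene_id", $info);
    my @c = split("\"", $b[1]);
    $gene = $c[1];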
    my @d = split("transcript_id", $info);