dputhier / libgtftk

gtftk C Library and program
GNU General Public License v3.0

get_sequence and chromosome name #99

Open dputhier opened 2 years ago

dputhier commented 2 years ago

Hi @fafa13, in the get_sequence function, the returned chromosome names may contain additional information. In the following case, everything after the first space (dna_sm:chromo...) should be discarded, because this information won't be present in the chromosome names found in GTF files. Would it be possible to update this in the library? Best

    >1 dna_sm:chromosome chromosome:TAIR10:1:1:30427671:1 REF
    ccctaaaccctaaaccctaaaccctaaacctctgaatccttaatccctaaatccctaaat
    ctttaaatcctacatccatgaatccctaaatacctaattccctaaacccgaaaccGGTTT
    CTCTGGTTGAAAATCATTGTGTATATAATGATAATTTTATCGTTTTTATGTAATTGCTTA
    TTGTTGTGTGTAGATTTTTTAAAAATATCATTTGAGGTCAATACAAATCCTATTTCTTGT
    GGTTTTCTTTCCTTCACTTAGCTATGGATGGTTTATCTTCATTTGTTATATTGGATACAA
    GCTTTGCTACGATCTACATTTGGGAATGTGAGTCTCTTATTGTAACCTTAGGGTTGGTTT

Should be:

    >1
    ccctaaaccctaaaccctaaaccctaaacctctgaatccttaatccctaaatccctaaat
    ctttaaatcctacatccatgaatccctaaatacctaattccctaaacccgaaaccGGTTT
    CTCTGGTTGAAAATCATTGTGTATATAATGATAATTTTATCGTTTTTATGTAATTGCTTA
    TTGTTGTGTGTAGATTTTTTAAAAATATCATTTGAGGTCAATACAAATCCTATTTCTTGT
    GGTTTTCTTTCCTTCACTTAGCTATGGATGGTTTATCTTCATTTGTTATATTGGATACAA
    GCTTTGCTACGATCTACATTTGGGAATGTGAGTCTCTTATTGTAACCTTAGGGTTGGTTT
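
A minimal sketch of the requested behavior, in C like the rest of libgtftk: keep only the header text between the leading '>' and the first whitespace character. The helper name `fasta_header_to_chrom` is hypothetical and not part of the libgtftk API. It assumes the header arrives as a NUL-terminated string, and it treats tabs as separators as well, since it relies on `isspace()`.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of libgtftk): return a newly allocated copy
 * of the sequence ID from a FASTA header, i.e. the text after an optional
 * leading '>' and before the first whitespace character.
 * The caller must free() the result. Returns NULL on allocation failure. */
char *fasta_header_to_chrom(const char *header) {
    /* Skip the FASTA header marker if present. */
    if (*header == '>')
        header++;

    /* Count characters up to the first space, tab, newline, etc. */
    size_t len = 0;
    while (header[len] != '\0' && !isspace((unsigned char)header[len]))
        len++;

    char *name = malloc(len + 1);
    if (name == NULL)
        return NULL;
    memcpy(name, header, len);
    name[len] = '\0';
    return name;
}
```

For example, passing `">1 dna_sm:chromosome chromosome:TAIR10:1:1:30427671:1 REF"` yields `"1"`, which matches the chromosome name used in the GTF file. Returning a fresh copy, rather than writing a '\0' into the original header, avoids modifying a buffer the caller may still need.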

dputhier commented 2 years ago

See https://github.com/dputhier/pygtftk/issues/171