HajkD / LTRpred

De novo annotation of young retrotransposons
https://hajkd.github.io/LTRpred/
GNU General Public License v2.0

Fasta header truncated in ltrdigest_tabout.csv #23

Closed a7032018 closed 3 years ago

a7032018 commented 3 years ago

Hi,

I found that the FASTA header is truncated in ltrdigest_tabout.csv:


$ awk '{print $4}' ltrdigest_tabout.csv | head
acc|NEIGHBOR|GQ36565
acc|NEIGHBOR|GQ36825
acc|GENBANK|HQ704802
acc|GENBANK|HQ704802
acc|GENBANK|HQ704802
acc|NEIGHBOR|GU11155
acc|GENBANK|GU385355
acc|GENBANK|GU385356
acc|GENBANK|GU385357


The original FASTA headers look like this:


acc|GENBANK|HQ704802.1|Organic Lake phycodnavirus 1 genomic sequence.|Organic Lake phycodnavirus 1|ENV|25-JUL-2016
acc|NEIGHBOR|GQ365650.1|HIV-1 isolate 05.BR.NSP24 from Brazil, complete genome.|Human immunodeficiency virus 1|VRL|20-DEC-2009
acc|NEIGHBOR|GQ365651.1|HIV-1 isolate 01.BR.RGS45 from Brazil, complete genome.|Human immunodeficiency virus 1|VRL|20-DEC-2009
acc|NEIGHBOR|GQ365652.1|HIV-1 isolate 01.BR.RGS69 from Brazil, complete genome.|Human immunodeficiency virus 1|VRL|20-DEC-2009
acc|NEIGHBOR|GQ368252.1|Avian adeno-associated virus isolate YZ-1, complete genome.|Avian adeno-associated virus|VRL|05-JAN-2011
acc|NEIGHBOR|GU111555.1|HIV-1 isolate RBF168 from France, complete genome.|Human immunodeficiency virus 1|VRL|24-JUL-2016
acc|GENBANK|GU385355.1|Equine infectious anemia virus isolate FDDV-2 tat (s1) and gag protein (gag) genes, complete cds; pol polyprotein (pol) gene, partial cds; and S2 (s2), truncated envelope polyprotein (env), and Rev (s3) genes, complete cds.|Equine infectious anemia virus|VRL|25-JUL-2016
acc|GENBANK|GU385356.1|Equine infectious anemia virus isolate FDDV-15-4 tat (s1) and gag protein (gag) genes, complete cds; pol polyprotein (pol) gene, partial cds; and S2 (s2) and truncated envelope polyprotein (env) genes, complete cds.|Equine infectious anemia virus|VRL|25-JUL-2016
acc|GENBANK|GU385357.1|Equine infectious anemia virus isolate FDDV-7 tat (s1) and gag protein (gag) genes, complete cds; pol polyprotein (pol) gene, partial cds; and S2 (s2), truncated envelope polyprotein (env), and Rev (s3) genes, complete cds.|Equine infectious anemia virus|VRL|25-JUL-2016

Would it be possible to keep the whole FASTA header in ltrdigest_tabout.csv?
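
In case it helps show why the truncation is a problem, here is a minimal Python sketch that tries to map the truncated IDs back to the full headers by prefix matching. It assumes each truncated value is an exact prefix of its original header, which holds for the samples above (they all appear to be cut at 20 characters). The function name `match_truncated_ids` is just something I made up for illustration; it is not part of LTRpred.

```python
def match_truncated_ids(truncated_ids, full_headers):
    """Map each truncated ID to every full header that starts with it.

    Hypothetical helper, not part of LTRpred. Assumes a truncated ID is
    an exact prefix of the original FASTA header.
    """
    matches = {}
    for tid in truncated_ids:
        matches[tid] = [h for h in full_headers if h.startswith(tid)]
    return matches


# A few headers copied from the FASTA file shown above.
full_headers = [
    "acc|GENBANK|HQ704802.1|Organic Lake phycodnavirus 1 genomic sequence.|Organic Lake phycodnavirus 1|ENV|25-JUL-2016",
    "acc|NEIGHBOR|GQ365650.1|HIV-1 isolate 05.BR.NSP24 from Brazil, complete genome.|Human immunodeficiency virus 1|VRL|20-DEC-2009",
    "acc|NEIGHBOR|GQ365651.1|HIV-1 isolate 01.BR.RGS45 from Brazil, complete genome.|Human immunodeficiency virus 1|VRL|20-DEC-2009",
]

# Two truncated IDs as they appear in column 4 of ltrdigest_tabout.csv.
truncated_ids = ["acc|GENBANK|HQ704802", "acc|NEIGHBOR|GQ36565"]

result = match_truncated_ids(truncated_ids, full_headers)
for tid, hits in result.items():
    # More than one hit means the truncated ID is ambiguous.
    print(tid, len(hits))
```

With these inputs, `acc|GENBANK|HQ704802` maps to a single header, but `acc|NEIGHBOR|GQ36565` matches more than one sequence (GQ365650.1, GQ365651.1, and in the full file also GQ365652.1), so the truncated IDs cannot always be traced back to one record.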