dputhier / pygtftk

A python package and a set of shell commands to handle GTF files
GNU General Public License v3.0

ERROR-get_tx_seq : No genes were found on chromosomes defined in fasta file. #180

Open yaskermezli opened 1 year ago

yaskermezli commented 1 year ago

Hi,

I'm using the get_tx_seq function from gtftk and I'm getting this error.

Here is the header of my genome_fasta.fa file:

1 dna:chromosome chromosome:GRCm38:1:1:195471971:1 REF NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

To test that, I deleted everything in the headers after the chromosome name [dna:chromosome chromosome:GRCm38:1:1:195471971:1 REF], but I am still getting the same error.

Any help or suggestions?

Thank you

dputhier commented 1 year ago

Hi Yasmina,

First, there is an issue related to headers such as "dna:chromosome chromosome:GRCm38:1:1:195471971:1 REF" that has not been fixed yet; this is a known issue (#171). But yes, deleting every character after "1" should work. Maybe the problem is that (at least in your example) the header line does not start with ">".

Best
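Deleting every character after the chromosome name can be scripted rather than done by hand. The sketch below is plain Python, not part of pygtftk; it assumes the usual FASTA convention that the sequence name is the first whitespace-separated token after ">". The function name `strip_fasta_headers` is hypothetical.

```python
from typing import Iterable, Iterator


def strip_fasta_headers(lines: Iterable[str]) -> Iterator[str]:
    """Reduce every FASTA header to '>' plus its first token.

    '>1 dna:chromosome chromosome:GRCm38:1:1:195471971:1 REF' becomes '>1'.
    Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            # Assumption: the sequence name is everything up to the first whitespace.
            fields = line[1:].split()
            yield ">" + (fields[0] if fields else "") + "\n"
        else:
            yield line
```

For example, with `src` opened on the original FASTA file and `dst` opened for writing on a new file (both paths are up to you), `dst.writelines(strip_fasta_headers(src))` writes a copy with shortened headers.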

dputhier commented 1 year ago

Also, you forgot to paste the error message. Denis

yaskermezli commented 1 year ago

Oh yes, sorry,

1- My lines do start with > (I didn't paste it):

1 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

2- The error message is the same as in the subject:

|-- 16:50-ERROR-get_tx_seq : No genes were found on chromosomes defined in fasta file.

dputhier commented 1 year ago

Ah yes, this error is explicitly handled in the code. Maybe the problem is that one file uses a "chr" prefix that is not found in the other file? Denis

(Sent by email in reply to Yasmina Kermezli's message of Fri, 31 Mar 2023, 16:51.)
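One way to test the "chr" prefix hypothesis is to list the chromosome names used by each file and see which ones they share. This is a plain-Python sketch, not a pygtftk feature; it assumes the FASTA name is the first token after ">" and that the GTF seqname is the first tab-separated column. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def fasta_seqnames(lines: Iterable[str]) -> Set[str]:
    """Sequence names (first token after '>') found in FASTA lines."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_seqnames(lines: Iterable[str]) -> Set[str]:
    """Values of the first GTF column (seqname), skipping comments and blank lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_seqnames(gtf: Set[str], fasta: Set[str]) -> Dict[str, List[str]]:
    """Group chromosome names by which file(s) they appear in."""
    return {
        "shared": sorted(gtf & fasta),
        "only_in_gtf": sorted(gtf - fasta),
        "only_in_fasta": sorted(fasta - gtf),
    }
```

An empty "shared" list together with names like "chr1" on one side and "1" on the other would point to a naming mismatch between the two files.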

yaskermezli commented 1 year ago

Yes, I found it! The problem was some unlocalized genomic contigs (like >GL456211.1 ...). I deleted them and it works now.
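Instead of deleting the contigs by hand, the FASTA file can be filtered down to a chosen set of sequence names. This is a hypothetical plain-Python sketch, not part of pygtftk; it assumes the record name is the first token after ">". The `keep` set could be, for instance, the seqnames listed in the GTF file or a hand-written list of assembled chromosomes.

```python
from typing import Container, Iterable, Iterator


def filter_fasta(lines: Iterable[str], keep: Container[str]) -> Iterator[str]:
    """Yield only the FASTA records whose name is in `keep`.

    The name is the first whitespace-separated token after '>'.
    Any lines appearing before the first header are dropped.
    """
    keep_record = False
    for line in lines:
        if line.startswith(">"):
            # A new record starts: decide whether to keep it and its sequence lines.
            fields = line[1:].split()
            keep_record = bool(fields) and fields[0] in keep
        if keep_record:
            yield line
```

Because it works line by line on an iterator, the function can stream a whole genome file without loading it into memory.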

Thanks Denis!

dputhier commented 1 year ago

I will reopen this issue, as the tool should not complain about additional chromosomes in the fasta file (only about additional chromosomes in the gtf file).
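The behaviour described above amounts to a one-directional check. This is a hypothetical sketch, not pygtftk's code: `check_seqnames` is an invented name, and whether GTF-only chromosomes should trigger a warning or a hard error is an assumption (here it only warns).

```python
import warnings
from typing import Set


def check_seqnames(gtf: Set[str], fasta: Set[str]) -> Set[str]:
    """Warn about GTF chromosomes that have no sequence in the fasta file.

    Sequences present only in the fasta file (e.g. unlocalized contigs such
    as GL456211.1) are ignored: they simply carry no annotated transcripts.
    Returns the set of GTF chromosomes missing from the fasta file.
    """
    missing = gtf - fasta
    if missing:
        warnings.warn(
            "Chromosomes in the gtf file but not in the fasta file: "
            + ", ".join(sorted(missing))
        )
    return missing
```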