filip-husnik / pseudofinder

Detection of pseudogene candidates in bacterial and archaeal genomes.
GNU General Public License v3.0

ValueError: invalid literal for int() with base 10: '>324591' #14

Closed: sarah872 closed this issue 4 years ago

sarah872 commented 4 years ago

Hi, I am getting an error in the annotation step, although BlastP/X finished successfully.

2020-08-07 10:54:10 CDS extracted from:         TONE2019.1-Contigs.gbk
            Written to file:            TONE2019_cds.fasta.
2020-08-07 10:54:11 Proteome extracted from:        TONE2019.1-Contigs.gbk
            Written to file:            TONE2019_proteome.faa.
2020-08-07 10:54:12 Intergenic regions extracted from:  TONE2019.1-Contigs.gbk
            Written to file:            TONE2019_intergenic.fasta.
2020-08-07 10:54:12 blastp executed with 20 threads.
2020-08-10 21:53:07 blastx executed with 20 threads.
2020-08-12 04:45:12 Extracting information from blastp file.
Traceback (most recent call last):
  File "pseudo-finder/pseudofinder.py", line 30, in <module>
    annotate.main()
  File "/scratch/pseudogenes/pseudo-finder/modules/annotate.py", line 1096, in main
    orfs = parse_blast(fasta_file=file_dict['proteome_filename'], blast_file=file_dict['blastp_filename'], blast_format='blastp')
  File "/scratch/pseudogenes/pseudo-finder/modules/annotate.py", line 403, in parse_blast
    'end': int(fields_in_line[3]),
ValueError: invalid literal for int() with base 10: '>324591'
End time : 04:45:14 Wed Aug 12 04:45:14 CEST 2020
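
The last line of the traceback can be reproduced in isolation: Python's int() rejects any string that carries a non-numeric prefix. The '>' resembles the partial-location marker used in GenBank feature coordinates (for example `<1..>324591`), but whether that is where the value came from in this run is an assumption. A minimal sketch of a tolerant conversion follows; `parse_coordinate` is a hypothetical helper, not part of pseudofinder and not necessarily how the issue was fixed.

```python
def parse_coordinate(text):
    """Convert a coordinate such as '>324591' or '<1' to an int.

    Hypothetical helper for illustration only; not pseudofinder code.
    """
    # Drop surrounding whitespace, then any leading '<' or '>' markers.
    return int(text.strip().lstrip("<>"))


# The plain conversion fails with the same ValueError as in the traceback.
try:
    int(">324591")
except ValueError as err:
    print("int() failed:", err)

# The tolerant version returns the numeric part.
print(parse_coordinate(">324591"))
```

Stripping the marker discards the information that the feature is partial, so a real fix might prefer to record that flag separately.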

mitchso commented 4 years ago

Hi Sarah,

Can you pull the most recent updates and try again?

Thanks, Mitch

sarah872 commented 4 years ago

I ran into another error: specifying the log file doesn't work.

python3 pseudo-finder/pseudofinder.py reannotate -g mygenome.gbk -p blastP_output.tsv -x blastX_output.tsv -log log -op reannotate

Traceback (most recent call last):
  File "pseudo-finder/pseudofinder.py", line 32, in <module>
    reannotate.main()
  File "/scratch/pseudogenes/pseudo-finder/modules/reannotate.py", line 82, in main
    logged_args = parse_log(command_line_args.logfile)
  File "/scratch/pseudogenes/pseudo-finder/modules/reannotate.py", line 11, in parse_log
    with open(logfile, 'r') as log:
FileNotFoundError: [Errno 2] No such file or directory: 'log'

mitchso commented 4 years ago

Can you try with the test dataset and let me know if that works for you? A FileNotFoundError suggests to me that there might be a file path issue on your end, but I can't tell without more info.
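
One way to rule out a path problem before launching reannotate is to resolve the argument the same way Python will see it. This is a minimal sketch using only the standard library; `check_logfile` is a hypothetical helper, and 'log' is simply the value passed to -log in the command above.

```python
from pathlib import Path


def check_logfile(path_text):
    """Return the absolute path if it names an existing file, else None."""
    path = Path(path_text).expanduser().resolve()
    # A relative name such as 'log' is resolved against the current
    # working directory, not against the script or the genome file.
    return path if path.is_file() else None


if __name__ == "__main__":
    resolved = check_logfile("log")
    if resolved is None:
        print("No file named 'log' found in", Path.cwd())
    else:
        print("Log file found at", resolved)
```

If the check reports nothing found, either the name is wrong or the command is being run from a different directory than the one holding the log.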

sarah872 commented 4 years ago

Oh, so --logfile should point to the log file from the (first) annotate run? The thing is that annotate threw an error, so I don't get where that file should come from... Or do I have to do the blastp and blastx searches again, i.e. start all over again?

mitchso commented 4 years ago

Yes, the reannotate command needs access to the log from the previous run so that it can parse it for all the parameters used and files generated.

If the annotate command is throwing an error for you, it is likely that reannotate will also throw an error, so I am wondering whether you can run the test command and/or the annotate command without errors.

Try this: python3 pseudofinder.py test --database /PATH/TO/NR/nr

If computational time is an issue, you can create a very small custom database using BLAST's makeblastdb and use that for troubleshooting. If you are able to complete a run with the small database, then you can be confident that a longer run using a large database will also complete without errors.
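
A small troubleshooting database of this kind could be built roughly as follows. The sketch keeps the first few records of a protein FASTA file and assembles a call to makeblastdb, the BLAST+ database builder (`-in`, `-dbtype` and `-out` are standard options of that tool). The file names, the record count and both helper functions are hypothetical, and the command is only executed if the input file exists and makeblastdb is on the PATH.

```python
import os
import shutil
import subprocess


def take_first_records(fasta_in, fasta_out, n_records=100):
    """Copy the first n_records FASTA entries to fasta_out; return how many were written."""
    written = 0
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                if written == n_records:
                    break  # enough records collected
                written += 1
            if written:  # ignore anything before the first header
                dst.write(line)
    return written


def makeblastdb_command(fasta, db_name):
    """Build the argument list for creating a protein BLAST database."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "prot", "-out", db_name]


if __name__ == "__main__":
    # Hypothetical file names; substitute a real protein FASTA file.
    source, subset, db = "proteins.faa", "small.faa", "small_db"
    if os.path.exists(source) and shutil.which("makeblastdb"):
        take_first_records(source, subset, n_records=100)
        subprocess.run(makeblastdb_command(subset, db), check=True)
    else:
        print("Would run:", " ".join(makeblastdb_command(subset, db)))
```

The resulting database path could then presumably be given to --database in place of /PATH/TO/NR/nr for a quick trial run.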