fmalmeida / bacannot

Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation with interactive reports and shiny app.
https://bacannot.readthedocs.io/en/latest/
GNU General Public License v3.0
96 stars 9 forks

Error on SUMMARY step when missing annotation/id.txt key #111

Closed: mladen5000 closed this issue 8 months ago

mladen5000 commented 8 months ago

Describe the bug

A missing key in the annotation/id.txt file causes the bacannot2json code to break.

I think this can be resolved by using .get in the Python script, or by modifying the file to contain "rRNAs: 0". The same error also occurs occasionally with tmRNA.
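A minimal sketch of the suggested `.get` approach follows. It assumes the parsed stats end up in a plain dict named `general_results`, as the traceback below suggests. The values are made up, and the nested `bacannot_summary[sample]['general_annotation']` structure is flattened into a hypothetical `summary` dict for brevity. This is not the code that was actually shipped in falmeida-py.

```python
# Hypothetical stand-in for the stats parsed from annotation/id.txt.
# Made-up values; there is no rRNA or tmRNA entry, as may happen for a MAG.
general_results = {"CDS": 1200, "tRNA": 38}

summary = {}
# general_results["rRNA"] would raise KeyError: 'rRNA' here;
# dict.get returns the fallback value 0 when the key is missing.
summary["rrna"] = general_results.get("rRNA", 0)
summary["tmrna"] = general_results.get("tmRNA", 0)
print(summary)  # {'rrna': 0, 'tmrna': 0}
```

Reporting 0 matches the reporter's alternative workaround of writing "rRNAs: 0" into the file.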

Caused by:
  Process `BACANNOT:SUMMARY (MAG01)` terminated with an error exit status (1)

Command executed:

  mkdir -p results/MAG01/annotation
  ln -rs annotation/* results/MAG01/annotation
  sed -i 's/s:/:/g' results/MAG01/annotation/MAG01.txt
  falmeida-py bacannot2json -i results -o MAG01_summary.json

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File "/opt/conda/bin/falmeida-py", line 8, in <module>
      sys.exit(main())
    File "/opt/conda/lib/python3.9/site-packages/falmeida_py/__main__.py", line 212, in main
      bacannot2json(args['--input'], args['--output'], args['--print'])
    File "/opt/conda/lib/python3.9/site-packages/falmeida_py/bacannot2json.py", line 102, in bacannot2json
      general_stats( bacannot_summary )
    File "/opt/conda/lib/python3.9/site-packages/falmeida_py/general_stats_function.py", line 41, in general_stats
      bacannot_summary[sample]['general_annotation']['rrna']  = general_results['rRNA']
  KeyError: 'rRNA'

Work dir:
  /home/mladen/work/19/975e38ad1c54db928ded079d48ccdd

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
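The `sed -i 's/s:/:/g'` call in the executed command replaces every "s:" with ":". A line such as "rRNAs: 3" therefore becomes "rRNA: 3", which matches the 'rRNA' key that general_stats looks up. The sketch below mimics that step in Python to show why a MAG without an rRNA line ends in `KeyError: 'rRNA'`. It assumes the file holds simple "key: value" lines. `parse_stats` is a hypothetical helper, and the real parser in falmeida-py may work differently.

```python
def parse_stats(text: str) -> dict:
    """Hypothetical parser: apply the same substitution as
    sed 's/s:/:/g' (every 's:' becomes ':'), then read 'key: value' lines."""
    stats = {}
    for line in text.splitlines():
        line = line.replace("s:", ":")
        if ":" not in line:
            continue  # skip lines that are not key/value pairs
        key, value = line.split(":", 1)
        stats[key.strip()] = value.strip()
    return stats

# Hypothetical file contents with no rRNA line, as may happen for a MAG
mag_txt = "CDSs: 1200\ntRNAs: 38\n"
stats = parse_stats(mag_txt)
print(stats)  # {'CDS': '1200', 'tRNA': '38'}

try:
    stats["rRNA"]  # same direct lookup style as general_stats_function.py line 41
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'rRNA'
```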

fmalmeida commented 8 months ago

Hi @mladen5000,

Thanks for reporting this. I will give it a try as soon as I get back from vacation.

Although it is simple to solve, it is not something I can do right now.

Does this error happen when you use Prokka or Bakta as the generic annotation tool? Could you try running with the other one, just to check whether it happens with both annotators?

In the meantime, you could run the pipeline skipping this process, or ignoring this error, so that you can use the rest of the outputs until I fix the Python script.

I believe the .get solution you suggested might be the best option.

fmalmeida commented 8 months ago

Hello hello @mladen5000, I have added a possible fix in a new development branch. Can you give it a try? Since it is a MAG and not a complete genome, other keys might also be missing and cause the same failure.
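Several keys may be absent for a MAG. One way to cover all of them at once is to merge the parsed stats over a dictionary of defaults, so every expected key is present. The `EXPECTED_KEYS` tuple and the `with_defaults` helper below are hypothetical and shown only as a sketch. The fix in the development branch may handle this differently.

```python
# Hypothetical set of keys the summary expects; the real list may differ.
EXPECTED_KEYS = ("CDS", "rRNA", "tRNA", "tmRNA")

def with_defaults(parsed: dict, default: int = 0) -> dict:
    """Return a copy of parsed in which every expected key is present."""
    filled = {key: default for key in EXPECTED_KEYS}
    filled.update(parsed)  # values actually found in the file override the defaults
    return filled

stats = with_defaults({"CDS": 1200, "tRNA": 38})  # made-up values
print(stats)  # {'CDS': 1200, 'rRNA': 0, 'tRNA': 38, 'tmRNA': 0}
```

Compared with calling `.get` at every lookup, this keeps the defaults in one place. Any key that is genuinely present in the file is kept unchanged.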

It would be basically the same command; you would just need to add the -r parameter:

nextflow run fmalmeida/bacannot -r 111-error-on-summary-step-when-missing-annotationidtxt-key <all your other parameters>

mladen5000 commented 8 months ago

It works now, thanks!

fmalmeida commented 8 months ago

Awesome. I will wrap it up as a patch release and close the issue then.

Thanks.

fmalmeida commented 8 months ago

Hi @mladen5000, I have released a new version of the Python package that I use for the summary and updated the Docker image. Before I merge this fix into a new release, can you run it once more using -latest, so I am sure it runs?

nextflow \
    run fmalmeida/bacannot \
    -r 111-error-on-summary-step-when-missing-annotationidtxt-key \
    -latest \
    <all your other parameters>

mladen5000 commented 8 months ago

It ran to completion and the output looks good.

fmalmeida commented 8 months ago

Thanks for the confirmation. v3.3.1 has been released.