BCCDC-PHL / tbprofiler-nf

Nextflow Wrapper for TBProfiler

tbprofiler error #2

Closed sherrie9 closed 1 year ago

sherrie9 commented 1 year ago

This is not an error in tbprofiler-nf itself, but some samples cause tbprofiler to fail for unclear reasons. Could we add an option to ignore errors for those samples, so that the pipeline can still run on the rest of the samples?

dfornika commented 1 year ago

Relevant error message is:

Running command:
set -u pipefail; bcftools query -u -f '%CHROM\t%POS\t%REF\t%ALT\t%ANN\t[%AD]\n' ./4c8ae5e4-e0cf-4a93-a565-e7ae6ab808ee.delly.csq.vcf.gz

  Traceback (most recent call last):
    File "/home/dfornika/.conda/envs/tbprofiler-nf-a3a7045274879811c28a762d2c9a6c48/bin/tb-profiler", line 616, in <module>
      args.func(args)
    File "/home/dfornika/.conda/envs/tbprofiler-nf-a3a7045274879811c28a762d2c9a6c48/bin/tb-profiler", line 210, in main_profile
      results = tbp.reformat(results, conf, reporting_af=args.reporting_af, mutation_metadata=args.add_mutation_metadata)
    File "/home/dfornika/.conda/envs/tbprofiler-nf-a3a7045274879811c28a762d2c9a6c48/lib/python3.9/site-packages/tbprofiler/reformat.py", line 269, in reformat
      results["variants"] = select_csq(results["variants"])
    File "/home/dfornika/.conda/envs/tbprofiler-nf-a3a7045274879811c28a762d2c9a6c48/lib/python3.9/site-packages/tbprofiler/reformat.py", line 88, in select_csq
      csq = select_most_relevant_csq(d["consequences"])
    File "/home/dfornika/.conda/envs/tbprofiler-nf-a3a7045274879811c28a762d2c9a6c48/lib/python3.9/site-packages/tbprofiler/reformat.py", line 72, in select_most_relevant_csq
      ranked_csq.append([i for i,d in enumerate(rank) if d in csq["type"]][0])
  IndexError: list index out of range
  Cleaning up after failed run

  ################################# ERROR #######################################

  This run has failed. Please check all arguments and make sure all input files
  exist. If no solution is found, please open up an issue at
  https://github.com/jodyphelan/TBProfiler/issues/new and paste or attach the
  contents of the error log (SAMPLE_ID_REDACTED.errlog)

  ###############################################################################

It looks like the error is happening here:

https://github.com/jodyphelan/TBProfiler/blob/76c0b5d5579bef8de331401f9ad695f85f6819aa/tbprofiler/reformat.py#L72

...where there’s a list comprehension (an expression of the form [x for x in ...]) that is immediately indexed for its first element ([...][0]):

[i for i,d in enumerate(rank) if d in csq["type"]][0]

...which raises an IndexError because the list comprehension evaluates to an empty list (no entry in rank matches csq["type"]), so there is no first element.
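
To make the failure mode concrete, here is a minimal sketch that reproduces the same pattern outside of TBProfiler. The RANK list, the consequence types and the function names are all made up for illustration; this is not TBProfiler's actual code or its fix, only one way such an expression could be guarded.

```python
# Hypothetical ranking of consequence types, most severe first.
# This is an assumption for illustration, not TBProfiler's real list.
RANK = ["frameshift", "stop_gained", "missense", "synonymous"]


def rank_index_unsafe(csq):
    # Same shape as the failing expression: builds a list of matching
    # rank positions and indexes [0], which raises IndexError if empty.
    return [i for i, d in enumerate(RANK) if d in csq["type"]][0]


def rank_index_safe(csq, default=len(RANK)):
    # next() with a default returns a fallback rank (here: lowest priority)
    # instead of raising when no rank entry matches.
    return next((i for i, d in enumerate(RANK) if d in csq["type"]), default)


known = {"type": "missense"}
unknown = {"type": "some_unlisted_type"}

print(rank_index_unsafe(known))
try:
    rank_index_unsafe(unknown)
except IndexError as exc:
    print("IndexError:", exc)
print(rank_index_safe(unknown))
```

Whether falling back to a lowest-priority rank is the right behaviour for an unrecognised consequence type is a separate question; the sketch only shows why an unmatched type crashes the original expression.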

I’m looking at the latest version of TBProfiler, and it looks like the ‘reformat’ method has been rewritten:

https://github.com/jodyphelan/TBProfiler/blob/9c7ef32aecdc2e5c96e21b8dae4a91e5f1e694af/tbprofiler/reformat.py#L93-L111

...to use the ‘select_csq’ method from ‘pathogen-profiler’:

https://github.com/jodyphelan/pathogen-profiler/blob/f336b16bd5a3186d7532c5c3940d595a69e5dd09/pathogenprofiler/utils.py#L151-L182

...which calls this ‘select_most_relevant_csq’ method:

https://github.com/jodyphelan/pathogen-profiler/blob/f336b16bd5a3186d7532c5c3940d595a69e5dd09/pathogenprofiler/utils.py#L130-L141

...which seems to have been rewritten compared to the version in the TBProfiler release that this pipeline currently uses.

So I think it would be worth trying to update TBProfiler to see if that fixes the issue.