Closed josefdiaz closed 1 year ago
Hi @josefdiaz , Based on your error message, it seems that I am failing to collect the results of plasmidfinder. Maybe it is empty, or maybe it has more than one database for the given sample.
Would it be possible for you to create a tar file of this working directory and send it to me so I can fix the python script? I would suggest a command like this:
tar zcvfh error_dir_workdir.tar.gz /home/jfrancisco/work/e1/96ee86a42df4c84ef36c2bc0c22b82
Thanks.
Hi, I am sending you the tar file so you can fix it when you can. Thanks a lot. error_dir_workdir.tar.gz
Thanks. I will get back to you when I have some updates or some other questions/requests in order to solve it.
Hi @josefdiaz ,
Somewhere in your output directory, or in the directory where you launched the pipeline, you should have some nextflow report files, such as the execution report (HTML) or the trace file (txt), in which you can see the working directory of each task.
Can you try to find in these files the working directory where plasmidFinder was executed for sample_160, and send this working directory to me, or just the .command.* and log files from this task (plasmidfinder -- sample_160)?
I was already able to reproduce, find and fix the issue described. However, I need to properly understand what the tool executed and generated, so I know whether the fix I found is correct or not.
Thanks.
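For reference, a minimal sketch of how the trace file could be searched for a given task. It assumes the trace file is tab-separated and has the `name` and `hash` columns that Nextflow writes by default; the function name `find_task_hashes` is hypothetical and not part of the pipeline. The `hash` value is only the beginning of the task directory path under the work directory.

```python
import csv


def find_task_hashes(trace_path, process_keyword, sample_id):
    """Return (task name, work dir hash prefix) pairs for one process and sample.

    Assumes a tab-separated Nextflow trace file with 'name' and 'hash' columns.
    """
    matches = []
    with open(trace_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            name = row.get("name") or ""
            # Task names look like 'BACANNOT:PLASMIDFINDER (sample_160)'.
            if process_keyword.lower() in name.lower() and f"({sample_id})" in name:
                matches.append((name, row.get("hash") or ""))
    return matches
```

Calling `find_task_hashes("trace.txt", "plasmidfinder", "sample_160")` (the trace file name here is a placeholder) would list the matching tasks, and each hash prefix can then be completed to the full directory inside `work/`.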
Actually, this is not needed. I just checked the tool's webpage, and it does seem to make sense. Multiple databases are shown because the tool's gram-positive databases are split into more than one, hence the error.
I will push the changes to a new version of the Python script and build a new docker image, which will ship in the next release together with new features and this bugfix.
Once the image is ready to be used (probably tomorrow), I will post here the command you can use to try it, so we can be sure it is fixed before I make the release. (We will perform a run with only sample_160, since that was the one that triggered the error in this case.)
😉
Hi @josefdiaz ,
I have now pushed a docker image that tries to solve this, in a new branch (new pipeline release). Can you try running this version of the pipeline on your samples with -r 96-error-summary -latest -resume? It would be something like this:
# make sure to download the latest docker image even if already having one locally
# since the code is being fixed in this image for the next version, if already locally,
# nextflow will not pull the latest
docker pull fmalmeida/bacannot:v3.3_misc
# run the pipeline with resume
nextflow run fmalmeida/bacannot \
-r 96-error-summary \
-latest \
-resume \
--input bacannot_samplesheet.yaml \
--output Resultadosilvestres \
--bacannot_db ./bacannot_db_2023 \
--max_memory 100.GB \
-profile docker
Hi @fmalmeida , first of all, thank you very much for all the assistance you are providing. I have executed this version of the pipeline, and it demonstrates improved performance. However, it still encounters a failure during the summary step (66 of 73). Additionally, the sample_summary.json file does not contain the Mobile Genetic Element (MGE) summary; it appears with no information, even though the different tools detect them. Below, I provide the command used to launch the pipeline and the specific error message received. "Launching"
executor > local (2240)
[- ] process > BACANNOT:UNICYCLER -
[- ] process > BACANNOT:FLYE -
[46/d8b45b] process > BACANNOT:PROKKA (sample_2340) [100%] 73 of 73 ✔
[a2/7b2eaf] process > BACANNOT:MLST (sample_2340) [100%] 73 of 73 ✔
[a1/89c372] process > BACANNOT:BARRNAP (sample_2340) [100%] 73 of 73 ✔
[f5/a3a95c] process > BACANNOT:COMPUTE_GC (sample_2340) [100%] 73 of 73 ✔
[71/ef2d53] process > BACANNOT:KOFAMSCAN (sample_2340) [100%] 73 of 73 ✔
[fa/e473bc] process > BACANNOT:KEGG_DECODER (sample_2340) [100%] 73 of 73 ✔
[73/5303af] process > BACANNOT:PLASMIDFINDER (sample_2340) [100%] 73 of 73 ✔
[41/befa01] process > BACANNOT:PLATON (sample_2340) [100%] 73 of 73 ✔
[7d/6d3087] process > BACANNOT:MOBSUITE (sample_2340) [100%] 73 of 73 ✔
[78/f06a12] process > BACANNOT:ISLANDPATH (sample_2340) [100%] 73 of 73 ✔
[76/d605b3] process > BACANNOT:INTEGRON_FINDER (sample_2340) [100%] 73 of 73 ✔
[- ] process > BACANNOT:INTEGRON_FINDER_2GFF -
[fa/fcd77c] process > BACANNOT:VFDB (sample_2340) [100%] 73 of 73 ✔
[9f/11b75a] process > BACANNOT:VICTORS (sample_2340) [100%] 73 of 73 ✔
[5f/861335] process > BACANNOT:PHAST (sample_2340) [100%] 73 of 73 ✔
[20/3e3576] process > BACANNOT:PHIGARO (sample_2340) [100%] 73 of 73 ✔
[cf/6f7c80] process > BACANNOT:PHISPY (sample_2340) [100%] 73 of 73 ✔
[32/19f100] process > BACANNOT:ICEBERG (sample_2340) [100%] 73 of 73 ✔
[b9/70bfc1] process > BACANNOT:AMRFINDER (sample_2340) [100%] 73 of 73 ✔
[0b/159fd0] process > BACANNOT:CARD_RGI (sample_2340) [100%] 73 of 73 ✔
[e1/aa2279] process > BACANNOT:ARGMINER (sample_2340) [100%] 73 of 73 ✔
[- ] process > BACANNOT:RESFINDER -
[- ] process > BACANNOT:CALL_METHYLATION -
[b3/94b31a] process > BACANNOT:REFSEQ_MASHER (sample_2340) [100%] 73 of 73 ✔
[dc/5fd44a] process > BACANNOT:DIGIS (sample_2231) [100%] 73 of 73 ✔
[ca/f2e764] process > BACANNOT:ANTISMASH (sample_2340) [100%] 73 of 73 ✔
[29/c22dc4] process > BACANNOT:SEQUENCESERVER (sample_2340) [100%] 73 of 73 ✔
[ff/30466a] process > BACANNOT:MERGE_ANNOTATIONS (sample_2340) [100%] 73 of 73 ✔
[ee/4eda6f] process > BACANNOT:DRAW_GIS (sample_2340) [100%] 73 of 73 ✔
[17/7de969] process > BACANNOT:GFF2GBK (sample_320) [100%] 73 of 73 ✔
[aa/dc59c5] process > BACANNOT:CREATE_SQL (sample_2340) [100%] 73 of 73 ✔
[a9/2352f8] process > BACANNOT:JBROWSE (sample_356) [ 78%] 57 of 73
[- ] process > BACANNOT:REPORT [ 0%] 0 of 73
[a0/cc6cf8] process > BACANNOT:SUMMARY (sample_2204) [ 90%] 66 of 73, failed: 1
[- ] process > BACANNOT:MERGE_SUMMARIES -
[59/1d2d0e] process > BACANNOT:CIRCOS (sample_2340) [100%] 73 of 73 ✔
Execution cancelled -- Finishing pending tasks before exit
ERROR ~ Error executing process > 'BACANNOT:SUMMARY (sample_2114)'
Caused by:
Process `BACANNOT:SUMMARY (sample_2114)` terminated with an error exit status (1)
Command executed:
mkdir -p results/sample_2114/annotation
ln -rs annotation/* results/sample_2114/annotation
sed -i 's/s:/:/g' results/sample_2114/annotation/sample_2114.txt
falmeida-py bacannot2json -i results -o sample_2114_summary.json
Command exit status:
1
Command output:
(empty)
Command error:
Traceback (most recent call last):
File "/opt/conda/bin/falmeida-py", line 33, in <module>
sys.exit(load_entry_point('falmeida-py==1.2.1', 'console_scripts', 'falmeida-py')())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/falmeida_py/__main__.py", line 212, in main
bacannot2json(args['--input'], args['--output'], args['--print'])
File "/opt/conda/lib/python3.11/site-packages/falmeida_py/bacannot2json.py", line 111, in bacannot2json
plasmids_stats( bacannot_summary )
File "/opt/conda/lib/python3.11/site-packages/falmeida_py/plasmid_function.py", line 28, in plasmids_stats
results = pd.read_csv(
^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
return _read(filepath_or_buffer, kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 577, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1407, in __init__
self._engine = self._make_engine(f, self.engine)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/pandas/io/parsers/readers.py", line 1679, in _make_engine
return mapping[engine](f, **self.options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 93, in __init__
self._reader = parsers.TextReader(src, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pandas/_libs/parsers.pyx", line 557, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
Work dir:
/home/jfrancisco/work/5c/a68fe4233b293ad265ee6932bd0c42
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
-- Check '.nextflow.log' file for details
Files: error_dir_workdir21.tar.gz
Hi @josefdiaz ,
Interesting. I will take a look at the error&data provided, and I will get back to you early next week.
Just one note for you: you can see that even though you used -resume, the modules were not cached. This happened because we are now using a new branch with a new version of the code. Next time, when you run this same branch again with -resume, we will see the modules being cached.
Interestingly, I also saw from your message that the module INTEGRON_FINDER_2GFF was not executed when it should have been. So, I will take a look at the issue so you can try it out next week and, hopefully, have it run successfully.
Let's work to have it done 😄
Hi @josefdiaz ,
I have updated the Python code and the docker image again to try to fix this issue.
Also, a few other notes on the questions and 'blank' items:
- I tried a few other genomes and INTEGRON_FINDER_2GFF is working fine. It is simply not executed when the INTEGRON_FINDER module does not produce a gbk file, meaning no integron was detected.
- Currently, the MGEs dict key only contains the integron finder results. Since in your case the tool did not find anything (as we can see from INTEGRON_FINDER_2GFF not being executed), the dict is empty.
- Finally, I am still working on this dev branch so that information on prophages and ICEs is added to the summary in this MGE section. However, I have not yet managed to do so, so they are still not there (I am working on that).
- Then, I moved the dict key generation to the 'child functions' so that, if the files used to populate a key are not available, like the integron finder gffs, the dict key is not created. Thus, instead of being empty, it is not added.
Could you then try running this again, pulling the docker image once more to guarantee having the latest Python code?
# make sure to download the latest docker image even if already having one locally
# since the code is being fixed in this image for the next version, if already locally,
# nextflow will not pull the latest
docker pull fmalmeida/bacannot:v3.3_misc
# run the pipeline with resume
nextflow run fmalmeida/bacannot \
-r 96-error-summary \
-latest \
-resume \
--input bacannot_samplesheet.yaml \
--output Resultadosilvestres \
--bacannot_db ./bacannot_db_2023 \
--max_memory 100.GB \
-profile docker
Ps. Just a small note: if it works, and we manage to fix the issue, I will wrap up what was done into a quick bugfix patch release on the current master branch, producing v3.2.1, instead of already releasing this dev branch as v3.3. This is because the next v3.3 release, with the newly added features like MOB_SUITE and INTEGRON_FINDER, is planned for August/September, as there are still a few things to finish as described here #88 . So, depending on how 'early' you need to publish or share the results of your work, it may be worth re-analysing with the bugfix patch release v3.2.1, or even analysing again with the stable v3.3 release once it is out. I say this just so you can have a stable, fixed release and can guarantee reproducibility 😄
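To illustrate the last bullet, here is a minimal sketch of the "only create the key when there is data" idea. It is not the actual falmeida-py code: the function names `read_rows` and `add_section` are hypothetical, and it uses the standard-library `csv` module rather than pandas to stay self-contained. With pandas, the equivalent guard would be catching `pandas.errors.EmptyDataError`, the exception shown in the traceback above.

```python
import csv
import os


def read_rows(path, delimiter="\t"):
    """Return data rows as dicts, or [] if the file is missing, empty or header-only."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return []
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def add_section(summary, key, path):
    """Add `key` to the summary dict only when the results file holds data rows."""
    rows = read_rows(path)
    if rows:
        summary[key] = rows
    return summary
```

With this shape, a sample whose tool produced no results ends up without the key in its summary, instead of with an empty entry or a crash while parsing an empty file.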
Hi @fmalmeida , I have tried running that again pulling the docker image and it has gone well.
# run the pipeline
nextflow run fmalmeida/bacannot \
-r 96-error-summary \
-latest \
-resume \
--input bacannot_samplesheet.yaml \
--output Resultadosilvestres24jul \
--bacannot_db ./bacannot_db_2023 \
--max_memory 100.GB \
-profile docker
N E X T F L O W ~ version 23.04.1
Pulling fmalmeida/bacannot ...
Fast-forward
Launching `https://github.com/fmalmeida/bacannot` [infallible_austin] DSL2 - revision: f411535503 [96-error-summary]
------------------------------------------------------
fmalmeida/bacannot v3.3
------------------------------------------------------
Core Nextflow options
revision : 96-error-summary
runName : infallible_austin
containerEngine : docker
launchDir : /home/jfrancisco
workDir : /home/jfrancisco/work
projectDir : /home/jfrancisco/.nextflow/assets/fmalmeida/bacannot
userName : jfrancisco
profile : docker
configFiles : /home/jfrancisco/.nextflow/assets/fmalmeida/bacannot/nextflow.config
Input/output options
input : bacannot_samplesheet.yaml
output : Resultadosilvestres24jul
bacannot_db : ./bacannot_db_2023
Max job request options
max_memory : 100.GB
Generic options
unicycler_version: 0.5.0--py310h6cc9453_3
!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use fmalmeida/bacannot for your analysis please cite:
* The pipeline
https://doi.org/10.5281/zenodo.3627669
* The nf-core framework
https://doi.org/10.1038/s41587-020-0439-x
* Software dependencies
https://github.com/fmalmeida/bacannot/blob/master/CITATIONS.md
------------------------------------------------------
executor > local (2337)
[- ] process > BACANNOT:UNICYCLER -
[- ] process > BACANNOT:FLYE -
[32/8faee8] process > BACANNOT:PROKKA (sample_2340) [100%] 73 of 73 ✔
[e7/4d92f5] process > BACANNOT:MLST (sample_2340) [100%] 73 of 73 ✔
[9d/88df36] process > BACANNOT:BARRNAP (sample_2340) [100%] 73 of 73 ✔
[61/cff46e] process > BACANNOT:COMPUTE_GC (sample_2340) [100%] 73 of 73 ✔
[23/67cd60] process > BACANNOT:KOFAMSCAN (sample_2340) [100%] 73 of 73 ✔
[52/8a2871] process > BACANNOT:KEGG_DECODER (sample_2340) [100%] 73 of 73 ✔
[ed/0f81eb] process > BACANNOT:PLASMIDFINDER (sample_2340) [100%] 73 of 73 ✔
[78/ce10b1] process > BACANNOT:PLATON (sample_2340) [100%] 73 of 73 ✔
[50/3912a7] process > BACANNOT:MOBSUITE (sample_2340) [100%] 73 of 73 ✔
[10/f70c52] process > BACANNOT:ISLANDPATH (sample_2340) [100%] 73 of 73 ✔
[46/7608bf] process > BACANNOT:INTEGRON_FINDER (sample_2340) [100%] 73 of 73 ✔
[- ] process > BACANNOT:INTEGRON_FINDER_2GFF -
[d4/8c0439] process > BACANNOT:VFDB (sample_2340) [100%] 73 of 73 ✔
[e9/c2f23f] process > BACANNOT:VICTORS (sample_2340) [100%] 73 of 73 ✔
[0c/bd7efc] process > BACANNOT:PHAST (sample_2340) [100%] 73 of 73 ✔
[e2/fc7328] process > BACANNOT:PHIGARO (sample_2340) [100%] 73 of 73 ✔
[3b/7e7d96] process > BACANNOT:PHISPY (sample_2340) [100%] 73 of 73 ✔
[df/207fa5] process > BACANNOT:ICEBERG (sample_2340) [100%] 73 of 73 ✔
[87/5a54c9] process > BACANNOT:AMRFINDER (sample_2340) [100%] 73 of 73 ✔
[1c/a3c21a] process > BACANNOT:CARD_RGI (sample_2340) [100%] 73 of 73 ✔
[8f/8b5f4e] process > BACANNOT:ARGMINER (sample_2340) [100%] 73 of 73 ✔
[- ] process > BACANNOT:RESFINDER -
[- ] process > BACANNOT:CALL_METHYLATION -
[07/5af66a] process > BACANNOT:REFSEQ_MASHER (sample_2340) [100%] 73 of 73 ✔
[17/8fb549] process > BACANNOT:DIGIS (sample_2340) [100%] 73 of 73 ✔
[d0/246d83] process > BACANNOT:ANTISMASH (sample_2340) [100%] 73 of 73 ✔
[4d/b3fdea] process > BACANNOT:SEQUENCESERVER (sample_2340) [100%] 73 of 73 ✔
[cb/388b64] process > BACANNOT:MERGE_ANNOTATIONS (sample_2204) [100%] 73 of 73 ✔
[a6/cb7a55] process > BACANNOT:DRAW_GIS (sample_2204) [100%] 73 of 73 ✔
[79/9c0d9f] process > BACANNOT:GFF2GBK (sample_320) [100%] 73 of 73 ✔
[fe/dba607] process > BACANNOT:CREATE_SQL (sample_2204) [100%] 73 of 73 ✔
[70/89c441] process > BACANNOT:JBROWSE (sample_2214) [100%] 73 of 73 ✔
[d5/7d192f] process > BACANNOT:REPORT (sample_2204) [100%] 73 of 73 ✔
[e7/2a56bf] process > BACANNOT:SUMMARY (sample_2204) [100%] 73 of 73 ✔
[29/8039e0] process > BACANNOT:MERGE_SUMMARIES [100%] 1 of 1 ✔
[ae/7f9dce] process > BACANNOT:CIRCOS (sample_2204) [100%] 73 of 73 ✔
Completed at: 24-Jul-2023 19:53:03
Duration : 6h 5m 48s
CPU hours : 144.3
Succeeded : 2'337
Thanks for all your support.
Hi @josefdiaz ,
Thanks for sharing. As I discussed in my last comment, I will now wrap this ticket up by creating a bugfix release of the docker image for the current version, and I will close this ticket once the fix is released.
Thanks for reporting it.
I am closing this issue now, as the docker image was just updated, meaning that the problem should no longer happen with version v3.2 when using the latest docker images of the pipeline (updated today).
A simple docker pull fmalmeida/bacannot:v3.2_pyenv should fix v3.2 if you encounter the same issue described here.
Two other child issues were created out of this ticket:
v3.3
)
Hi Felipe, could you help with this issue I got when running bacannot? Everything went just fine until the pipeline reached the SUMMARY step. I tried twice with different genomes and it fails at the same point, SUMMARY.
Command line:
$ nextflow run fmalmeida/bacannot --input bacannot_samplesheet.yaml --output Resultadosilvestres --bacannot_db ./bacannot_db_2023 --max_memory 100.GB -profile docker
Launching
Thanks for all.