Issue in KOFam database download #78

Closed minhtrung1997 closed 1 year ago

minhtrung1997 commented 1 year ago

Describe the bug: when I run Nextflow to download the databases, the following error occurs:


Command error:
  tar: profiles/K25306.hmm: Cannot change ownership to uid 500, gid 500: Operation not permitted
  tar: profiles/K25336.hmm: Cannot change ownership to uid 500, gid 500: Operation not permitted
  tar: profiles/K25338.hmm: Cannot change ownership to uid 500, gid 500: Operation not permitted

I had to download it manually from the terminal, using your script but skipping these lines:

chmod a+rw profiles.tar.gz ko_list


chown -R root:\$(id -g) profiles

If you want the log file for debugging, please tell me and I'll send it to the email address you provide. Thanks
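For readers hitting the same tar error: one common way to avoid "Cannot change ownership" messages is to tell tar not to restore the owner recorded in the archive. The sketch below is an illustration only, not the change made in the pipeline; the temporary paths and the file name K00001.hmm are made up so that the example is self-contained.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a tiny stand-in archive (hypothetical file name) so the sketch runs on its own.
workdir="$(mktemp -d)"
mkdir -p "$workdir/src/profiles" "$workdir/out"
echo "placeholder profile" > "$workdir/src/profiles/K00001.hmm"
tar -czf "$workdir/profiles.tar.gz" -C "$workdir/src" profiles

# --no-same-owner: do not try to restore the uid/gid stored in the archive;
# extracted files end up owned by the user running tar.
tar --no-same-owner -xzf "$workdir/profiles.tar.gz" -C "$workdir/out"

ls -l "$workdir/out/profiles"
```

When tar runs as root (as it often does inside a container), it tries to restore the archived owner by default, which fails if the runtime does not allow changing ownership.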

fmalmeida commented 1 year ago

Hi @minhtrung1997,

thanks for using it and reporting the problem.

can you please share with me the exact command you ran so I can try to reproduce it?

also, if you can send me the logs it will be of great help as I need to debug it.

thanks 😁

fmalmeida commented 1 year ago

The logs will be of help as I could not reproduce the error:


> nextflow run fmalmeida/bacannot -latest -profile docker --get_dbs --output new_dbs
N E X T F L O W  ~  version 22.04.5
Pulling fmalmeida/bacannot ...
Launching `` [magical_plateau] DSL2 - revision: 14919aead7 [master]

  fmalmeida/bacannot v3.2
Core Nextflow options
  revision       : master
  runName        : magical_plateau
  containerEngine: docker
  launchDir      : /home/falmeida/Documents
  workDir        : /home/falmeida/Documents/work
  projectDir     : /home/falmeida/.nextflow/assets/fmalmeida/bacannot
  userName       : falmeida
  profile        : docker
  configFiles    : /home/falmeida/.nextflow/assets/fmalmeida/bacannot/nextflow.config

Download databases options
  get_dbs        : true

Input/output options
  output         : new_dbs

!! Only displaying parameters that differ from the pipeline defaults !!
If you use fmalmeida/bacannot for your analysis please cite:

* The pipeline

* The nf-core framework

* Software dependencies
executor >  local (15)
[8f/676ed7] process > CREATE_DBS:PROKKA_DB        [100%] 1 of 1 ✔
[5e/2b64de] process > CREATE_DBS:MLST_DB          [100%] 1 of 1 ✔
[32/4e0c39] process > CREATE_DBS:KOFAMSCAN_DB     [100%] 1 of 1 ✔
[5e/9c56dc] process > CREATE_DBS:CARD_DB          [100%] 1 of 1 ✔
[c4/73c546] process > CREATE_DBS:RESFINDER_DB     [100%] 1 of 1 ✔
[2d/f540f8] process > CREATE_DBS:AMRFINDER_DB     [100%] 1 of 1 ✔
[37/6fef99] process > CREATE_DBS:ARGMINER_DB      [100%] 1 of 1 ✔
[fa/e4b5d1] process > CREATE_DBS:PLATON_DB        [100%] 1 of 1 ✔
[fe/c47052] process > CREATE_DBS:PLASMIDFINDER_DB [100%] 1 of 1 ✔
[2f/759348] process > CREATE_DBS:PHIGARO_DB       [100%] 1 of 1 ✔
[06/b7f6c3] process > CREATE_DBS:PHAST_DB         [100%] 1 of 1 ✔
[ef/4b77d7] process > CREATE_DBS:VFDB_DB          [100%] 1 of 1 ✔
[f3/044bf7] process > CREATE_DBS:VICTORS_DB       [100%] 1 of 1 ✔
[e3/b05a37] process > CREATE_DBS:ICEBERG_DB       [100%] 1 of 1 ✔
[22/947111] process > CREATE_DBS:ANTISMASH_DB     [100%] 1 of 1 ✔

Completed at: 10-Jan-2023 12:06:13
Duration    : 2h 32m 40s
CPU hours   : 8.2
Succeeded   : 15

minhtrung1997 commented 1 year ago

Thanks. Unfortunately, I've lost the log of that run. Anyway, there is still one problem at the Resfinder step (when running the test on the ecoli sample) that may be connected to the one I reported previously:

Error executing process > 'BACANNOT:RESFINDER (ecoli_2)'

Caused by:
  Process `BACANNOT:RESFINDER (ecoli_2)` terminated with an error exit status (1)

Command executed:

  # activate env
  source activate resfinder

  # Make databases available
  # ln -rs Database-bacannot/resfinder_db/db_* $(dirname $(which

  # Run resfinder acquired resistance \
      --inputfasta ecoli_2.fna -db_res Database-bacannot/resfinder_db/db_resfinder \
      -o resfinder \
      --species "Escherichia coli" \
      --min_cov  0.9 \
      --threshold 0.9 \
      --acquired ;

  # Fix name of pheno table
  mv resfinder/pheno_table.txt resfinder/args_pheno_table.txt &> /dev/null ;

  # Run resfinder pointfinder resistance \
      --inputfasta ecoli_2.fna -db_point Database-bacannot/resfinder_db/db_pointfinder \
      -o resfinder \
      --species "Escherichia coli" \
      --min_cov  0.9 \
      --threshold 0.9 \
      --point ;

  # Fix name of pheno table
  mv resfinder/pheno_table.txt resfinder/mutation_pheno_table.txt &> /dev/null ;

  # Convert to GFF \
      -i resfinder/ResFinder_results_tab.txt > resfinder/results_tab.gff ;

Command exit status:

Command output:

Command error:
  Could not locate ResFinder database path: /opt/conda/envs/resfinder/bin/db_resfinder

Work dir:

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

I modified your code a bit because, as with KOfam above, I got an "Operation not permitted" error (log1.txt). Then, when I specified the database path and commented out the symlink, I got the error shown in log2.txt.

log1.txt log2.txt

fmalmeida commented 1 year ago

Okay, so the problem seems to appear when using Singularity. The only difficulty in debugging it is that I have very limited access to machines with Singularity, but I will try my best.

If we just remove the symlink, we would have to point the resfinder tool to another location, just as you did with -db_res, which would be the best option. However, for some reason, the resfinder tool still tries to find the database in its bin directory even though we point it elsewhere, and there is nothing we can do about that unless it is fixed in the tool itself.

The symlink was the option I found that at least allowed it to run.

I will try to find a machine where I can run it with Singularity so I can debug it.

To make sure, this error only appeared for Resfinder and KOfam, right?

minhtrung1997 commented 1 year ago

So far, yes :)). I haven't finished running the test yet.

minhtrung1997 commented 1 year ago

I don't know whether your image contains the resfinder command. I opened the image with:

singularity shell work/singularity/fmalmeida-bacannot-v3.2_misc.img

fmalmeida commented 1 year ago

I know it is there because it is just a conversion from the Docker image. I just don't know how NF is actually mounting it so the tools remain available. Maybe there is something more than just calling shell.

I will try to get access to a machine with Singularity by tomorrow so I can run it and debug. I will most probably be able to reproduce the error now that I know you are using the singularity profile. It is clearly related to that profile, because the docker profile is working fine.

I will get this access and start debugging it. Maybe I will need to change the image a little bit or add some optional snippets for when using singularity. But, we will get this fixed :)

minhtrung1997 commented 1 year ago

I look forward to hearing from you!!!

fmalmeida commented 1 year ago

Hi @minhtrung1997, just to let you know that I am still trying to find a machine where I can try the Singularity execution. Hopefully I will have an update by next week.

fmalmeida commented 1 year ago

Hi @minhtrung1997, I am happy to say that I found a machine and could reproduce the errors. The resfinder error only required a very small change in the available image.

I made the change and it is working. You can try deleting your Singularity cache and images and running again so that the pipeline fetches the new image.
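A minimal sketch of that cleanup step, for anyone following along. The default directory work/singularity and the file pattern are assumptions taken from the image path quoted earlier in this thread, and clear_bacannot_images is a hypothetical helper name; adjust both if your images are cached elsewhere.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Remove cached bacannot Singularity images so the next run pulls fresh ones.
# Default directory and file pattern are assumptions based on the path quoted
# in this thread; pass your own cache directory as the first argument if needed.
clear_bacannot_images() {
  local cache_dir="${1:-work/singularity}"
  if [ ! -d "$cache_dir" ]; then
    echo "No cache directory at $cache_dir, nothing to remove" >&2
    return 0
  fi
  # Print each matching image, then delete it; other files are left alone.
  find "$cache_dir" -maxdepth 1 -name 'fmalmeida-bacannot-*.img' -print -delete
}

clear_bacannot_images "${1:-work/singularity}"
```

After this, rerunning the pipeline with the singularity profile should download the images again. Singularity's own cache can additionally be cleared with `singularity cache clean`.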

However, I have not yet found time to debug the error with the database download. Since you already have the databases and can run the analysis, I will look into that more carefully over the course of this week :)

fmalmeida commented 1 year ago

Hi @minhtrung1997, I am struggling a bit to get this one fixed, but I am indeed looking at it. One thing that came to mind is that it may also make sense to provide the databases as a .tar.gz that can be downloaded from somewhere, for users who don't want to run the download step. Do you think that makes sense, and would it be a good alternative for such cases? At least for Singularity, until this fix is available?

minhtrung1997 commented 1 year ago

I really appreciate and am thankful for the pipeline and the idea, and also for the way your team supports issues. To me, downloading the databases is a one-time job, so providing them as a tar.gz is OK.

Anyway, I think it would be even better if MOB-suite and Integron Finder 2 could be incorporated into the collection, as those tools have really good reviews and are also recommended by our seniors. If your team could help with that, I look forward to hearing from you. Once again, thank you very much❀

fmalmeida commented 1 year ago

Great to hear that. So, I will work on providing this tar.gz, and I will also create a few issues to have this planned and tackled. I will tag you there so you can give any input you like. Integron finder 2.0 was already in the plans, see: Also, I don't see any reason why MOB-suite could not be added, so I will create a ticket for that as well.

The main problem here is that I don't usually have much time for these enhancements, so they may take a while. But if you feel comfortable enough to fork the repository and help add these tools, it would be very much appreciated and welcomed.

It can take a while because it needs to address these steps:

But I will open the issues so at least these enhancements can start to happen.