MaestSi / MetONTIIME

A Meta-barcoding pipeline for analysing ONT data in QIIME2 framework
GNU General Public License v3.0

ERROR: process > diversityAnalyses #64

Closed nbel15 closed 1 year ago

nbel15 commented 1 year ago

Hi, I am trying MetONTIIME for the analysis of full-length 16S ONT sequences. Everything was running smoothly until I reached the diversity analysis step, where I got the error below (please find the full log file in the attachment):

(1/1) Invalid value for '--i-table':
    /home/naima/software/MetONTIIME/output/collapseTables/table-collapsed-
    absfreq-level6.qza does not exist.

In the "collapseTables" folder I found only a single file, "...-level1.qza". I am wondering whether I should have obtained one file per taxonomic level. I would appreciate it if you could help me solve this issue.

nextflow.log

MaestSi commented 1 year ago

Hi, yes, as you said, the problem occurred in the collapseTables step: you should have obtained tables collapsed at levels 1 to 7 (I saw you are using the Silva database). At line 331 of metontiime2.nf there is the following command:

num_levels=\$(echo \$(cat ${params.dbTaxonomyTsv} | head -n2 | tail -n1 | cut -f2 | grep -n -o \";\" | wc -l) + 1 | bc)

I guess that, for some reason, it detected only one level. Could you please look at the .command.sh script in the work dir corresponding to the collapseTables process? I would like to see the exact command that was executed to infer the number of levels in the db. Can you also confirm that in the taxonomy.tsv file each level is separated by ';'?

Thanks, SM
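For readers following along, here is a minimal sketch of what that level-counting command does once the Nextflow escaping is removed. It counts the ';' separators in the Taxon column of the first data row and adds one. The sample file and its contents are made up for illustration, and shell arithmetic stands in for the `bc` call used in the pipeline.

```shell
# Hypothetical sample taxonomy file (tab-separated: Feature ID, Taxon).
tax_tsv=$(mktemp)
printf 'Feature ID\tTaxon\n' > "$tax_tsv"
printf 'seq1\td__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Lactobacillaceae; g__Lactobacillus; s__Lactobacillus_iners\n' >> "$tax_tsv"

# Take the first data row (line 2), keep the Taxon column, and count ';'.
separators=$(head -n2 "$tax_tsv" | tail -n1 | cut -f2 | grep -o ';' | wc -l)

# Levels = separators + 1 (shell arithmetic is used here instead of bc).
num_levels=$((separators + 1))
echo "num_levels=$num_levels"   # prints num_levels=7

rm -f "$tax_tsv"
```

Seven levels separated by six ';' characters give `num_levels=7`, which matches the "levels from 1 to 7" expected for a Silva-style taxonomy.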

nbel15 commented 1 year ago

Hi,

Thanks for your prompt reply.

The mentioned command num_levels=\$(echo \$(cat ${params.dbTaxonomyTsv}........ works fine. I ran the collapseTables script (.command.sh) separately, without any modification, using qiime2-2023.5, and I got the files for all the taxa levels without any issues. Somehow, the output folder and subfolders are created with root ownership, so I had to change their ownership to save the results of the collapseTables script into the collapseTables folder.

However, it seems that the same script with the same input files is not running properly within the pipeline. Please find below the content of the log file at this step:

cat: /path-to/SILVA_Qiime2/SILVA_138_99/taxonomy.tsv: No such file or directory
Saved FeatureTable[Frequency] to: /home/naima/software/MetONTIIME/output3/collapseTables/table-collapsed-absfreq-level1.qza

I also tried removing the diversity analysis process, and I got the same error message, but this time I found the complete set of output files for level 1 only. Below is the content of the log file:

cat: /path-to//SILVA_Qiime2/SILVA_138_99/taxonomy.tsv: No such file or directory
Saved FeatureTable[Frequency] to: /home/naima/software/MetONTIIME/output3/collapseTables/table-collapsed-absfreq-level1.qza
Saved Visualization to: /home/naima/software/MetONTIIME/output3/collapseTables/table-collapsed-absfreq-level1.qzv
Exported /home/naima/software/MetONTIIME/output3/collapseTables/table-collapsed-absfreq-level1.qza as BIOMV210DirFmt to directory /home/naima/software/MetONTIIME/output3/collapseTables/
Saved FeatureTable[RelativeFrequency] to: /home/naima/software/MetONTIIME/output3/collapseTables/table-collapsed-relfreq-level1.qza
Saved Visualization to: /home/naima/software/MetONTIIME/output3/collapseTables/table-collapsed-relfreq-level1.qzv
Exported /home/naima/software/MetONTIIME/output3/collapseTables/table-collapsed-relfreq-level1.qza as BIOMV210DirFmt to directory /home/naima/software/MetONTIIME/output3/collapseTables/

I would like to mention that the SILVA files are fine and they are in the correct paths and permissions.

Thank you for your help and assistance. N.B.
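One possible reading of the two logs above, offered as an assumption rather than a confirmed diagnosis: if `cat` cannot read taxonomy.tsv, the rest of the level-counting pipe still runs on empty input, finds zero ';' separators, and the count falls back to 1. That would explain why only level 1 is produced. The sketch below uses a deliberately nonexistent, hypothetical path and shell arithmetic in place of `bc`.

```shell
# Hypothetical path that does not exist, standing in for an unreadable taxonomy.tsv.
missing_tsv="/nonexistent/path/taxonomy.tsv"

# cat fails (its error is silenced here), but the shell pipe continues with
# empty input, so grep finds no ';' and wc -l reports 0.
separators=$(cat "$missing_tsv" 2>/dev/null | head -n2 | tail -n1 | cut -f2 | grep -o ';' | wc -l)

num_levels=$((separators + 1))
echo "num_levels=$num_levels"   # prints num_levels=1
```

Under this reading, the loop over levels would run exactly once, matching the single "level1" set of files in the logs.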

MaestSi commented 1 year ago

Hi, I just noticed there was a bug in the pipeline, causing the diversityAnalysis process to start before collapseTables completed. I just updated the metontiime2.nf file. Please download the new file and restart the pipeline (you can add the -resume flag to the command line). Let me know if this fixes the issue. SM

nbel15 commented 1 year ago

Hi,

The pipeline is still unable to iterate through the taxonomic levels. I tried to select only one level using the "--taxaOfInterest" option, but it seems to follow the same process.

Please find the log attached.

nextflow.log

MaestSi commented 1 year ago

Hi, can you please run: cat /home/naima/software/MetONTIIME/work/4f/cf3b828c7a88542818f0e8d08910d8/.command.sh and report here the output? Thanks, SM

nbel15 commented 1 year ago

Hi,

Please find the output in the attachment cat_collapse.txt

This script works fine when I run it separately from the pipeline:

cd path-to/work/4f/cf3b828c7a88542818f0e8d08910d8/
./.command.sh

The output is a set of files for each level from 1 to 7:

Saved FeatureTable[Frequency] to: /home/naima/software/MetONTIIME/output3/collapseTables/table-collapsed-absfreq-level1.qza
Saved Visualization to: /home/naima/software/MetONTIIME/output3/collapseTables/table-collapsed-absfreq-level1.qzv
Exported /home/naima/software/MetONTIIME/output3/collapseTables/table-collapsed-absfreq-level1.qza as BIOMV210DirFmt to directory /home/naima/software/MetONTIIME/output3/collapseTables/
Saved FeatureTable[RelativeFrequency] to: /home/naima/software/MetONTIIME/output3/collapseTables/table-collapsed-relfreq-level1.qza
Saved Visualization to: /home/naima/software/MetONTIIME/output3/collapseTables/table-collapsed-relfreq-level1.qzv
Exported /home/naima/software/MetONTIIME/output3/collapseTables/table-collapsed-relfreq-level1.qza as BIOMV210DirFmt to directory /home/naima/software/MetONTIIME/output3/collapseTables/
Saved FeatureTable[Frequency] to: /home/naima/software/MetONTIIME/output3/collapseTables/table-collapsed-absfreq-level2.qza
Saved Visualization to: /home/naima/software/MetONTIIME/output3/collapseTables/table-collapsed-absfreq-level2.qzv
Exported /home/naima/software/MetONTIIME/output3/collapseTables/table-collapsed-absfreq-level2.qza as BIOMV210DirFmt to directory /home/naima/software/MetONTIIME/output3/collapseTables/
Saved FeatureTable[RelativeFrequency] to: /home/naima/software/MetONTIIME/output3/collapseTables/table-collapsed-relfreq-level2.qza
Saved Visualization to: /home/naima/software/MetONTIIME/output3/collapseTables/table-collapsed-relfreq-level2.qzv
Exported /home/naima/software/MetONTIIME/output3/collapseTables/table-collapsed-relfreq-level2.qza as BIOMV210DirFmt to directory /home/naima/software/MetONTIIME/output3/collapseTables/
........ etc.

MaestSi commented 1 year ago

Yesterday I realised that the diversityAnalysis process was erroneously not set to wait for collapseTables to finish before starting. Did you download the updated version of the metontiime2.nf script? Moreover, what messages are printed by the pipeline?

cat /home/naima/software/MetONTIIME/work/4f/cf3b828c7a88542818f0e8d08910d8/.command.out

cat /home/naima/software/MetONTIIME/work/4f/cf3b828c7a88542818f0e8d08910d8/.command.err

Thanks, SM
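The files requested above live in each task's Nextflow work directory: `.command.sh` holds the script that was run, `.command.out` its standard output, and `.command.err` its standard error. As a convenience, a small helper such as the one below could print all three at once. The function name `show_task_logs` is hypothetical and not part of MetONTIIME or Nextflow.

```shell
# Print the script and captured output of one Nextflow task.
# The argument is the task's work directory (hypothetical helper name).
show_task_logs() {
    task_dir=$1
    for f in .command.sh .command.out .command.err; do
        echo "==> $f <=="
        if [ -s "$task_dir/$f" ]; then
            cat "$task_dir/$f"
        else
            echo "(missing or empty)"
        fi
    done
}

# Example with the work directory from this thread:
# show_task_logs /home/naima/software/MetONTIIME/work/4f/cf3b828c7a88542818f0e8d08910d8
```

Printing a placeholder for missing or empty files makes it obvious when a task produced no error output at all, which is itself useful information when debugging.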

nbel15 commented 1 year ago

Hi, I apologize for the delayed reply. I am using the new metontiime2.nf file and still getting the same issue. However, I managed to overcome this by using the NCBI 16S database instead of SILVA database.

Thanks a lot for your help and assistance. NB.

MaestSi commented 1 year ago

Good that it works with the other db. Did you use the TaxonomyTsv_from_fastaNCBI.R script for creating the taxonomy files? Could you please share the .out and .err logs for the Silva db? I am curious to understand why it didn’t work. Thanks, SM

nbel15 commented 1 year ago

Hi, please find below the .log, .err and .out files for the collapseTables step and the .err file of the diversity step, along with the general nextflow log.

collapseTables.command.err.txt
nextflow.log
collapseTables.command.txt
diversity.command.err.txt
collapseTables.command.txt.log

Regarding the taxonomy file, I tried to use the TaxonomyTsv_from_fastaNCBI.R script and ran into some difficulties. First, the function genbank2uid was unable to retrieve the complete set of taxids, so instead of using the accession numbers extracted from the fasta file I used the gi IDs. Second, the function classification, used to retrieve the taxonomy lineage, returned a Bad Request error (HTTP 400). It seems that it cannot handle the big list of IDs, or maybe my workstation was unable to sustain the long processing time required. Instead, I saved the taxid list and used TaxonKit to do the job.

MaestSi commented 1 year ago

Thanks for the detailed information. In the logs you sent me there is this error:

cat: /mnt/Ubuntu_HD2/Database/SILVA_Qiime2/SILVA_138_99/taxonomy.tsv: No such file or directory

Assuming the file exists, I am wondering whether you actually mounted the /mnt directory in the metontiime2.conf file (line 173) with:

containerOptions = '-v /home/:/home -v /mnt:/mnt'

However, this would be a possible explanation only if you set the importDb process to false and manually copied the qza files into the resultsDir/importDb directory.

SM
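To illustrate the mounting point: a host file is visible inside the container only if its path lies under one of the directories passed with `-v`. The sketch below checks a path against a list of mounted directories. It assumes each directory is mounted at the same path inside the container (as in `-v /mnt:/mnt`), and the helper name `is_under_mount` is hypothetical.

```shell
# Return 0 if the given path lies under one of the bind-mounted host
# directories, 1 otherwise. Both the helper name and the mount lists are
# illustrative assumptions, not part of MetONTIIME.
is_under_mount() {
    path=$1
    shift
    for mount in "$@"; do
        case "$path" in
            "$mount"|"${mount%/}"/*) return 0 ;;
        esac
    done
    return 1
}

db=/mnt/Ubuntu_HD2/Database/SILVA_Qiime2/SILVA_138_99/taxonomy.tsv

# With only /home mounted, the database path is not visible in the container.
is_under_mount "$db" /home && echo "visible" || echo "not visible"   # prints "not visible"

# With /home and /mnt mounted, it is.
is_under_mount "$db" /home /mnt && echo "visible" || echo "not visible"   # prints "visible"
```

The match is done on whole path components, so a directory such as /homework is not mistaken for something under /home.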

MaestSi commented 1 year ago

Dear @nbel15, thank you also for the feedback on the TaxonomyTsv_from_fastaNCBI.R script. I have just made some major edits, so that the script now processes the taxids in chunks and retries in case of failure. Moreover, I have made it possible to provide an ENTREZ_KEY, which will speed up data retrieval from NCBI. Let me know if you are able to test it. Best, SM
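The chunk-and-retry idea can be sketched generically in shell. This is not the code of the R script, only an illustration of the pattern: split a long list of IDs into smaller files and run a lookup command on each, retrying a few times before giving up. The helper names `retry` and `process_in_chunks` and the retry settings (3 attempts, 1 second apart) are assumptions chosen for the example.

```shell
# Run a command up to max_tries times, pausing between attempts.
# Hypothetical helper; the R script's actual retry logic may differ.
retry() {
    max_tries=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_tries" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Split a list of IDs (one per line) into chunks of chunk_size lines and
# pass each chunk file to a handler command, with retries.
process_in_chunks() {
    ids_file=$1
    chunk_size=$2
    shift 2
    chunk_dir=$(mktemp -d)
    split -l "$chunk_size" "$ids_file" "$chunk_dir/chunk_"
    status=0
    for chunk in "$chunk_dir"/chunk_*; do
        retry 3 1 "$@" "$chunk" || status=1
    done
    rm -rf "$chunk_dir"
    return "$status"
}
```

A call would look like `process_in_chunks taxids.txt 100 my_lookup`, where `my_lookup` is a placeholder for whatever command or function queries one chunk file. Smaller requests are less likely to hit an HTTP 400 like the one reported earlier in the thread, and a failed chunk only repeats its own work.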

MaestSi commented 1 year ago

Hi, are there any updates on this issue? Best, SM

nbel15 commented 1 year ago

Hi @MaestSi,

The path to the database is still not working. As you mentioned, I will try to change the path, but currently I am quite satisfied with the NCBI database. Regarding the TaxonomyTsv_from_fastaNCBI.R script, it is working fine now.

Thanks a lot for your help and assistance.

Best regards, NB

MaestSi commented 1 year ago

You are welcome. I'm going to close the issue. In case you have any other questions, feel free to reopen it or open a new one. Best, SM