MaestSi / MetONTIIME

A Meta-barcoding pipeline for analysing ONT data in QIIME2 framework
GNU General Public License v3.0

No such variable: sequences_db #65

Closed: joyleng closed this issue 1 year ago

joyleng commented 1 year ago

Hi

I want to analyse sequencing files generated on the MinION after sequencing the V1 and V2 regions of the 16S rRNA gene, but I am struggling to set up the reference database. I have uploaded the database files silva-138-99-seqs.qza and silva-138-99-tax.qza and put them into a folder called importDb within my resultsDir. I also set importDb = false in the metontiime2.conf file, as I have uploaded the .qza rather than the .fasta version of the SILVA database, but I am getting the error No such variable: sequences_db.

The sequences were basecalled before upload and are organised in one folder per barcode, so I also set concatenateFastq = true.

The command I used to try to run metontiime2 is:

nextflow -c metontiime2.conf run metontiime2.nf --workDir="/minion_run1/fastq_files" --resultsDir="/minion_run1/metontiime_output" -profile singularity

I have attached a text version of metontiime2.conf below in case something is missing there: metontiime2.txt

Any help would be much appreciated. I am sure it will be something obvious that I have missed in the .conf file, as I cannot find this issue anywhere else online!

Many thanks Joy

MaestSi commented 1 year ago

Hi, I think the issue is that you forgot to mount (i.e. make accessible to the Singularity container) the directory where your data are. To fix this, edit line 74 of the metontiime2.conf file from:

containerOptions = '--bind /home/:/home'

to:

containerOptions = '--bind /home/:/home --bind /minion_run1:/minion_run1'

Let me know if this fixes the issue. Best, SM

joyleng commented 1 year ago

Hi

Thanks for getting back to me so quickly.

That makes sense - I have recently started a new job and they don't allow Docker on their high-performance computing cluster, so I haven't used Singularity before.

I have changed the above but I am still getting the same error.

Do I also need to update singularity.cacheDir = "/path/to/singularity/cacheDir", which is the line before the one I have just changed?

Joy

MaestSi commented 1 year ago

Another issue I noticed in the config file is that for dbSequencesQza and dbTaxonomyQza parameters you specified the full path to the file, while only the basename of the file should be specified. No, you don't need to update singularity.cacheDir. SM

joyleng commented 1 year ago

Ok, I have changed those lines to just state the names of those two .qza files containing the SILVA database but I am still getting the same error.

I tried moving the folder I have specified as resultsDir in the .conf file into the MetONTIIME folder I am running this from, and changed resultsDir to match. As the .conf file says the database files need to be in the folder resultsDir/importDb, the database .qza files are now in the folder "/minion_run1_metontiime_output/importDb/". Unfortunately, I am still getting the same error.

MaestSi commented 1 year ago

You should look in the work directory corresponding to the process importDb. If you did not specify a different path, the work dir should be inside the resultsDir, and there you should be able to find the subfolder corresponding to the process importDb. If you can't find the specific directory from the log files, just go to the work dir and run something like:

for f in $(find . | grep "\\.command\\.sh"); do cat $f | grep "importDb" && echo $f; done

At this point, the directory corresponding to the process should be printed to screen, and you should be able to run:

cd /path/to/importDb/work/dir/
cat .command.sh 
cat .command.out
cat .command.err

In this way, we can better understand why the process is not working. SM
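The search for the task directory described above can also be wrapped in a small helper. This is only a sketch: find_task_dirs is a hypothetical name (not part of Nextflow or MetONTIIME), and it assumes, as the loop above does, that the process's .command.sh contains the string being searched for.

```shell
# find_task_dirs PATTERN [ROOT]
# Hypothetical helper: prints every directory under ROOT (default: current
# directory) whose .command.sh contains PATTERN as a fixed string.
find_task_dirs() {
  local pattern="$1" root="${2:-.}"
  # grep -l prints only the names of matching files; -F disables regex matching
  find "$root" -type f -name '.command.sh' -exec grep -lF -- "$pattern" {} + |
    while IFS= read -r script; do
      dirname "$script"
    done
}

# Example (run from the Nextflow work dir):
#   find_task_dirs importDb .
```

Each printed directory can then be inspected with the cat commands above.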

joyleng commented 1 year ago

OK, so I have tried to rearrange my file structure to avoid any confusion. Within my MetONTIIME folder (which contains metontiime2.conf and metontiime2.nf) I have made a results dir called "minion_run1_metontiime_output" and specified this in the .conf. Within this are my work dir (containing the folders of fastq files) and the importDb folder (containing the .qza database files).

I have attached the nextflow log file from when I then tried to run it again, in case that is of any use: .nextflow.log

I have importDb = false in the conf file, as I thought I didn't have to run this step because the database files are already in .qza form. I have attached the .conf in text form below, as I have made some changes since the original one I sent in my first message: metontiime2.txt

Thanks again Joy

MaestSi commented 1 year ago

I think the problem is that you are also trying to access the /pub63/joyl folder, so I think you should mount that folder as well:

containerOptions = '--bind /home/:/home --bind /minion_run1:/minion_run1 --bind /pub63/joyl:/pub63/joyl'

SM

joyleng commented 1 year ago

Thought I had it cracked for a second there as I got a different error, but I had just missed a character when adding the above. Still getting the same "No such variable: sequences_db" error unfortunately.

Have attached the latest nextflow.log file in case that helps. .nextflow.log

MaestSi commented 1 year ago

First, I can see that in the command line:

nextflow -c metontiime2.conf run metontiime2.nf --workDir=/minion_run1_metontiime_output/fastq_files--resultsDir=/minion_run1_metontiime_output -profile singularity

a space is missing between fastq_files and --resultsDir. Second, I doubt /minion_run1_metontiime_output is a valid absolute path (it would have to sit directly under the filesystem root), and it is not a relative path either, as the '.' at the beginning is missing. All paths should be absolute paths. Likewise, --bind /minion_run1:/minion_run1 is not a proper binding option if that is not an absolute path too. SM
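A quick pre-flight check along these lines can catch such paths before launching the pipeline. This is a minimal sketch; check_dir is a hypothetical helper, and it assumes that a usable path must both start with "/" and exist as a directory on the machine where Nextflow runs.

```shell
# check_dir PATH
# Hypothetical helper: returns 0 if PATH is absolute and an existing directory,
# 1 if it does not start with "/", 2 if it is absolute but does not exist.
check_dir() {
  local p="$1"
  case "$p" in
    /*) ;;                                           # starts with "/": absolute
    *)  echo "not an absolute path: $p" >&2; return 1 ;;
  esac
  [ -d "$p" ] || { echo "no such directory: $p" >&2; return 2; }
}

# Example:
#   check_dir /minion_run1_metontiime_output || echo "fix this path first"
```

A path that starts with "/" but does not exist fails here just as a relative one does, which matches the situation described above.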

joyleng commented 1 year ago

Hi

Thank you for your patience in helping me with this setup. I have limited experience with code-based programs and mostly use QIIME2 and R.

I have corrected the missing space. In terms of the paths mentioned above, am I right in thinking the full path is needed, so the command would look like the one below?

nextflow -c metontiime2.conf run metontiime2.nf --workDir="/pub63/joyl/MetONTIIME/minion_run1_metontiime_output/ fastq_files" --resultsDir="/pub63/joyl/MetONTIIME/minion_run1_metontiime_output" -profile singularity

The line --bind /minion_run1:/minion_run1 in the .conf file was the previous name of the results directory and left from when I added this during the initial changes to the .conf file. I have not used --bind before. Does it need to be the full path of the resultsDir?

Thanks again Joy

MaestSi commented 1 year ago

Hi, beware that in workDir="/pub63/joyl/MetONTIIME/minion_run1_metontiime_output/ fastq_files" there is an additional space before fastq_files, which should be removed. With the --bind option you have to make accessible to the container any folders it should be able to read from or write to. So containerOptions = '--bind /pub63/joyl:/pub63/joyl' is certainly needed.

> The line --bind /minion_run1:/minion_run1 in the .conf file was the previous name of the results directory and left from when I added this during the initial changes to the .conf file.

That can't be correct, since /minion_run1 is not an absolute path. SM
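To avoid typing the bind options by hand, the value for containerOptions can be generated from a list of directories. This is a sketch under stated assumptions: bind_opts is a hypothetical helper, each directory is mounted at the same path inside the container (as in the examples in this thread), and paths containing spaces are not handled.

```shell
# bind_opts DIR...
# Hypothetical helper: prints "--bind DIR:DIR" for each argument that is an
# absolute path to an existing directory; anything else is skipped with a warning.
bind_opts() {
  local opts="" d
  for d in "$@"; do
    if [ "${d#/}" != "$d" ] && [ -d "$d" ]; then
      opts="${opts:+$opts }--bind $d:$d"
    else
      echo "skipping (not an existing absolute directory): $d" >&2
    fi
  done
  printf '%s\n' "$opts"
}

# Example; the output can be pasted between the quotes of containerOptions:
#   bind_opts /home /pub63/joyl
```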

joyleng commented 1 year ago

Hi

I have kept line 76 in the .conf as: containerOptions = '--bind /home/:/home --bind /pub63/joyl:/pub63/joyl' and tried running it again (without the space).

I have attached a text version of the .conf file as it stands: metontiime2.txt

and the nextflow.log file from this latest run: .nextflow.log

Thanks again Joy

MaestSi commented 1 year ago

workDir, sampleMetadata and resultsDir should have the full path name in the conf file. This is not an issue for workDir and resultsDir, as you also specified a value for them on the command line, which overrides the value in the conf file. But sampleMetadata should have the full path, for example:

sampleMetadata=/pub63/joyl/MetONTIIME/minion_run1_metontiime_output/sample-metadata.tsv

Moreover, it seems you are trying to use the pbspro executor (line 76). If you do not have that job scheduler, you should switch to local, with:

executor = 'local'

SM

joyleng commented 1 year ago

Ah, Ok I didn't realise that the full path for the metadata file was needed. I have not uploaded a metadata.tsv as I wanted to see what the format was in the one produced during the analysis.

The executor that was previously specified was the default that came with the original file so I have changed that to local.

I have run it again with no luck. Here's the .conf and the log files: metontiime2.txt .nextflow.log

Thanks again Joy

MaestSi commented 1 year ago

I read some issues on GitHub, like this one. It seems that with the Nextflow version you are using (21.10.6), DSL2 is not the default syntax. So, following here, you should either add nextflow.enable.dsl=2 at the beginning of the script, or (my advice) update Nextflow. SM
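Whether an installed Nextflow predates the DSL2 default can be checked by comparing version strings. The sketch below assumes, based on the Nextflow documentation as I understand it, that DSL2 is the default from the 22.04 stable series onwards. version_ge is a hypothetical helper, and sort -V is a GNU coreutils option that may be missing on some systems.

```shell
# version_ge A B
# Hypothetical helper: succeeds if version A >= version B, using sort -V.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Assumed threshold: first stable release with DSL2 as the default syntax.
DSL2_DEFAULT_SINCE="22.04.0"

# The installed version is shown by: nextflow -version
# 21.10.6 is the version reported in this thread.
if version_ge "21.10.6" "$DSL2_DEFAULT_SINCE"; then
  echo "DSL2 is the default"
else
  echo "add nextflow.enable.dsl=2 to the script, or update Nextflow"
fi
```

For 21.10.6 the check takes the second branch, consistent with the advice above.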

joyleng commented 1 year ago

I have updated nextflow and that seems to have done the trick! It's showing the processes and showing when they are done. I will leave it for a bit so that it has time to go through all of the processes.

Thanks again for all your help Joy

MaestSi commented 1 year ago

Great! Add -bg to the command line if you want to run it in background. Let me know if it runs to the end! SM

MaestSi commented 1 year ago

Hi Joy, I hope the analyses completed successfully. I am going to close the issue; if you have further questions, feel free to reopen it. Best, SM