ysuzuki1978 commented 1 year ago

Thank you for all your kind support. Since I asked the question at the end of the year, I have tried and tried, but cannot get it to work. We are doing the analysis step using the outputted FASTQ. I am analysing with the following command, but I keep getting the error that it was killed at line 175. Perhaps vsearch is not working properly. I have tried reducing the number of CPU threads used to 20 (ryzen9 5950x MAX32 threads), but the same error occurs. Is there a solution to this problem?

working folder

(MetONTIIME_env) yakuri@yakuri-desktop:~/nanopore/16S_rRNA_20221018/no_sample/20221018_1928_MN38820_FAV42892_85448306/fast5_analysis/analysis$ ls -la total 6350488 drwxrwxr-x 3 yakuri yakuri 4096 Jan 23 08:26 . drwxrwxr-x 6 yakuri yakuri 4096 Dec 22 06:25 .. -rw-rw-r-- 1 yakuri yakuri 149171632 Dec 22 06:45 BC01.fasta -r--r--r-- 1 yakuri yakuri 112579279 Jan 22 17:07 BC01.fastq.gz -rw-rw-r-- 1 yakuri yakuri 120337677 Dec 22 06:51 BC02.fasta -r--r--r-- 1 yakuri yakuri 86698174 Jan 22 17:07 BC02.fastq.gz -rw-rw-r-- 1 yakuri yakuri 135948153 Dec 22 06:57 BC03.fasta -r--r--r-- 1 yakuri yakuri 102794769 Jan 22 17:07 BC03.fastq.gz -rw-rw-r-- 1 yakuri yakuri 136831041 Dec 22 07:05 BC04.fasta -r--r--r-- 1 yakuri yakuri 104728657 Jan 22 17:07 BC04.fastq.gz -rw-rw-r-- 1 yakuri yakuri 190593076 Dec 22 07:14 BC05.fasta -r--r--r-- 1 yakuri yakuri 144041279 Jan 22 17:07 BC05.fastq.gz -rw-rw-r-- 1 yakuri yakuri 144612251 Dec 22 07:21 BC06.fasta -r--r--r-- 1 yakuri yakuri 109281397 Jan 22 17:07 BC06.fastq.gz -rw-rw-r-- 1 yakuri yakuri 161515303 Dec 22 07:30 BC07.fasta -r--r--r-- 1 yakuri yakuri 123050123 Jan 22 17:07 BC07.fastq.gz -rw-rw-r-- 1 yakuri yakuri 192472983 Dec 22 07:38 BC08.fasta -r--r--r-- 1 yakuri yakuri 145029441 Jan 22 17:07 BC08.fastq.gz -rw-rw-r-- 1 yakuri yakuri 124484543 Dec 22 07:47 BC09.fasta -r--r--r-- 1 yakuri yakuri 93137488 Jan 22 17:07 BC09.fastq.gz -rw-rw-r-- 1 yakuri yakuri 247517588 Dec 22 07:58 BC10.fasta -r--r--r-- 1 yakuri yakuri 185358076 Jan 22 17:07 BC10.fastq.gz -rw-rw-r-- 1 yakuri yakuri 170776463 Dec 22 08:07 BC11.fasta -r--r--r-- 1 yakuri yakuri 130524580 Jan 22 17:07 BC11.fastq.gz -rw-rw-r-- 1 yakuri yakuri 225474039 Dec 22 08:17 BC12.fasta -r--r--r-- 1 yakuri yakuri 169291730 Jan 22 17:07 BC12.fastq.gz

# nohup ~/nanopore/MetONTIIME/MetONTIIME.sh -w ~/nanopore/16S_rRNA_20221018/no_sample/20221018_1928_MN38820_FAV42892_85448306/fast5_analysis/analysis -f ~/nanopore/16S_rRNA_20221018/no_sample/20221018_1928_MN38820_FAV42892_85448306/fast5_analysis/analysis/metadata.txt -s ~/nanopore/MetONTIIME/silva_132_99_16S_sequence.qza -t ~/nanopore/MetONTIIME/silva_132_99_16S_taxonomy.qza -n 30 -c Vsearch -m 3 -q 0.8 -i 0.85 &

nohup.out

Imported /home/yakuri/nanopore/16S_rRNA_20221018/no_sample/20221018_1928_MN38820_FAV42892_85448306/fast5_analysis/analysis/manifest.txt as SingleEndFastqManifestPhred33V2 to sequences.qza Saved FeatureTable[Frequency] to: table_tmp.qza Saved FeatureData[Sequence] to: rep-seqs_tmp.qza Saved FeatureTable[Frequency] to: table.qza Saved FeatureData[Sequence] to: rep-seqs.qza Saved Visualization to: demux_summary.qzv Saved Visualization to: table.qzv Saved Visualization to: rep-seqs.qzv /home/yakuri/nanopore/MetONTIIME/MetONTIIME.sh: line 175: 26623 Killed qiime feature-classifier classify-consensus-vsearch --i-query rep-seqs.qza --i-reference-reads $DB --i-reference-taxonomy $TAXONOMY --p-perc-identity $ID_THR --p-query-cov $QUERY_COV --p-maxaccepts 100 --p-maxrejects 100 --p-maxhits $MAX_ACCEPTS --p-strand 'both' --p-unassignable-label 'Unassigned' --p-threads $THREADS --o-classification taxonomy.qza --o-search-results search_results.qza There was an issue with loading the file taxonomy.qza as metadata:

Metadata file path doesn't exist, or the path points to something other than a file. Please check that the path exists, has read permissions, and points to a regular file (not a directory): taxonomy.qza

MaestSi commented 1 year ago

What is the maximum RAM memory available for your computer? Here are some possible solutions (from best to worst, in my opinion):

Require a higher amount of RAM memory (for example, if you are using a job scheduler, or a virtual machine)
Further reduce the number of threads (you may try with 4)
Downsample the reads, so that each sample has up to x (e.g. x = 10,000) reads. You may use the following command:
```
FASTQ_DEMULTIPLEXED_DIR="/path/to/dir"
FASTQ_DEMULTIPLEXED_SUBSAMPLED_DIR="/path/to/dir"
SAMPLING_DEPTH=10000
```

source activate MetONTIIME_env for f in $(find $FASTQ_DEMULTIPLEXED_DIR -name "*.fastq.gz"); do sn=$(echo $(basename $f)); seqtk sample $f $SAMPLING_DEPTH | gzip > $FASTQ_DEMULTIPLEXED_SUBSAMPLED_DIR/$sn; done


- Work with a smaller database (e.g. Silva clustered at 90% sequence identity, but I'm not sure this will work)

Best,
Simone

ysuzuki1978 commented 1 year ago

Thank you for your useful comments.

Memory.

The machine has 128 GB of memory. I think it is using anaconda virtual environment, should I specify the memory allocation value when I create the virtual environment? Or should I specify the memory usage when starting nohup?

CPU thread

I think the number of threads, 4, is quite small, but won't it take too long to analyze?

leadcount.

Maybe the lead counts are too high as you say. is it enough lead counts to analyze with data sampled from 10000 reads by seqtk? What is your general opinion?

MaestSi commented 1 year ago

Mi general opinion would be that 30k reads per sample are more than enough. Going above 100k is probably useless. So, depending on the amount of reads you are analysing now, you may choose a number comprised between 30k and 100k. SM

ysuzuki1978 commented 1 year ago

Dear Dr Simone

I will change SAMPLING_DEPTH to 30000 or 50000 and sample and analyze. Please let me know how much analysis time you have had to analyze 12 samples with 50000 reads in 4 threads.

Best regards, Suzuki

MaestSi commented 1 year ago

I would try with 50k reads per sample with 8 threads, and I would expect the process to take a couple of days. SM

ysuzuki1978 commented 1 year ago

I would like to confirm that "50k reads per sample with 8 threads" means the time required to analyze 12 samples with 50k reads per sample. If it takes a few days per sample, that's a lot of time, and it might be realistic to use about 100 threads on an HPC to do the analysis.

Suzuki

MaestSi commented 1 year ago

Yes, I meant a couple of days to analyse all the samples, with 50k reads each, should be enough, if using 8 threads. SM

ysuzuki1978 commented 1 year ago

I followed your advice and started the analysis with 12 samples of 50k reads and 8 threads. I will wait and hope to see the results in a few days.

ysuzuki1978 commented 1 year ago

I followed your advice and sampled to 50k and started the calculation with 8 threads, but no matter how many times I try, the vsearch process gets killed per 24 hours. I started vsearch by launching qiime directly without the MetONTIIME script, and it continues to work, but after 72 hours, the calculation is still not finished. Is there anything I can do to keep the MetONTIIME script running continuously without being killed?

nohup.out

Saved FeatureTable[Frequency] to: table_tmp.qza Saved FeatureData[Sequence] to: rep-seqs_tmp.qza Saved FeatureTable[Frequency] to: table.qza Saved FeatureData[Sequence] to: rep-seqs.qza Saved Visualization to: demux_summary.qzv Saved Visualization to: table.qzv Saved Visualization to: rep-seqs.qzv /home/yakuri/nanopore/MetONTIIME/MetONTIIME.sh: line 175: 16702 Killed qiime feature-classifier classify-consensus-vsearch --i-query rep-seqs.qza --i-reference-reads $DB --i-reference-taxonomy $TAXONOMY --p-perc-identity $ID_THR --p-query-cov $QUERY_COV --p-maxaccepts 100 --p-maxrejects 100 --p-maxhits $MAX_ACCEPTS --p-strand 'both' --p-unassignable-label 'Unassigned' --p-threads $THREADS --o-classification taxonomy.qza --o-search-results search_results.qza

MaestSi commented 1 year ago

I would just check:

the pipeline works with the test dataset available in the repository
the pipeline works with 1k reads per sample

These should be very quick tests and confirm the issue is due to RAM memory. SM

ysuzuki1978 commented 1 year ago

I sampled my prepared fastq to 1k and started the analysis, but still, when I proceed with MetONTIIME.sh, it stops in the middle of vsearch. qiime is started directly and tries. I will also try to download the latest sliva data again and try again.

I will also try the data uploaded to the repository, referring to the paper using MetONTIIME.

ysuzuki1978 commented 1 year ago

I tried many things, but when I ran with two samples, BC01 and BC02, set at 50k, the process was completed promptly. I will increase the number of samples to be processed at the same time little by little and see how far it can endure. The final analysis will be about 50 samples. Is it OK if I combine the output files and do the final analysis in qiime? Please let me know.

MaestSi commented 1 year ago

Yes, for sure it is ok if you merge the tables afterwards. Feel free to ask me if you find any issues with that step. Ciao, Simone

ysuzuki1978 commented 1 year ago

I have tried and tried, but with 100,000 reads, I could only analyze one sample; with two or more samples, the analysis stops through the same way. We take the method of analyzing and merging one sample at a time. I would like to analyze 12 samples at once for future use, is there any way to improve this? I know the specs are sufficient with ryzen9 5950 MEM 128GB, but is the program designed to analyze on HPC?

Suzuki

MaestSi commented 1 year ago

The only way I see for improving this is either downsampling the reads, working with a more clustered version of Silva db or working on an HPC infrastructure, sorry. Best, SM

ysuzuki1978 commented 1 year ago

I am using silva-138-99 DB, but please let me know if there is another appropriate DB.

Suzuki

MaestSi commented 1 year ago

You could try with BioProject 33175, which is a smaller bacterial db from NCBI for bacterial 16S gene. Please refer to the README in this repo for downloading and building the indexed db. A couple of years ago it was much smaller than Silva, I hope this is still the case. Probably, using a smaller db should reduce RAM memory usage, but I don't know if this will be enough. SM

ysuzuki1978 commented 1 year ago

We will continue to examine which is the best DB. Let me ask one more question.

This pipeline uses "nouhp" to run MetONTIIM.sh. Is there any chance that running the MetONTIIM.sh script directly in a virtual console such as byobu would result in better memory

suzuki

MaestSi commented 1 year ago

Hi, I'm not expert on this topic, but I don't think so. As far as I know, nohup is just a command for running the pipeline in background and redirecting stderr/stdout to nohup.out file, I don't think it has an impact on memory. By the way, you are not running MetONTIIME inside a Virtual Machine, right? SM

ysuzuki1978 commented 1 year ago

You say nohup does not affect the memory environment. Understood. Of course, I am not running it in a VM. At any rate, I'm getting close to being able to analyze it. Thank you very much for your detailed instruction.

MaestSi / MetONTIIME

Vsearch might not be working properly. #57

working folder

nohup.out

Memory.

CPU thread

leadcount.

nohup.out