MaestSi / MetONTIIME

A Meta-barcoding pipeline for analysing ONT data in QIIME2 framework
GNU General Public License v3.0

Conda 4.9.2 #27

Closed · Ksherriff closed this issue 3 years ago

Ksherriff commented 3 years ago

Hi,

I have been attempting to use this pipeline with conda 4.9.2, and I was hoping you had a workaround for some issues I have been coming across. With the newer version of conda, the command to activate the environment is:

conda activate MetONTIIME_env

instead of:

source activate MetONTIIME_env

This initially wasn't a problem, as I was able to download and set up all the programs and enter the environment; however, it does prevent the Launch_MinION_mobile_lab.sh file from running. If run as is, I get this error:

./Launch_MinION_mobile_lab.sh: line 23: activate: No such file or directory

If I change the script to use conda activate, the error becomes:

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'

On a different computer that was recently reformatted, we downloaded an older version of Miniconda3 closer to the version you used, and the script is currently running and appears to be working, so we have one of our computers set up for this. But is there any workaround for a newer version of conda?

Thanks!

MaestSi commented 3 years ago

Hi, both conda activate and source activate should work. I kept source activate because it looks like conda activate requires initialization with conda init, and that may not have been performed on some machines. You are not asked to activate any environment before running the pipeline, as the ./Launch_MinION_mobile_lab.sh script will try to activate the environment itself. However, in my experience, even after the activate: No such file or directory error, the pipeline continues running without any problems. Isn't this your experience? Simone
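For reference, a script can use conda activate without a prior conda init by sourcing conda's shell hook first. A minimal sketch, assuming a standard Miniconda layout (the conda.sh path depends on your install):

# Make 'conda activate' available inside a non-interactive script
CONDA_BASE=$(conda info --base)
source "${CONDA_BASE}/etc/profile.d/conda.sh"
conda activate MetONTIIME_env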

Ksherriff commented 3 years ago

After those initial errors come up the script does continue on and this is what would be running in terminal:

nohup: appending output to 'nohup.out'

I just tried that again and this time I am allowing it to run. I have htop open and the Guppy basecaller is currently running, but it is not using many threads, which is odd. The other computer we are running this on is currently using all threads maxed out, but the nohup seems to be done and nothing is actively running in that terminal. Is this what we should be seeing while it is running? One more question: the other computer has been running overnight at this point. We are using the SILVA database and the library is approximately 178 GB of fast5 files.

MaestSi commented 3 years ago

I just tried that again and this time I am allowing it to run. I have htop open and the Guppy basecaller is currently running, but it is not using many threads, which is odd.

That may depend on what Guppy is doing; you can only set the maximum number of threads.
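For context, CPU thread usage in Guppy is bounded by its caller settings rather than a single thread count. A sketch of a CPU basecalling call (the paths and config name are placeholders, not the pipeline's actual invocation):

guppy_basecaller \
    -i fast5_dir -s out_dir \
    -c dna_r9.4.1_450bps_fast.cfg \
    --num_callers 4 --cpu_threads_per_caller 2   # at most ~8 basecalling threads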

the nohup seems to be done and nothing is actively running in that terminal. Is this what we should be seeing while it is running?

When the run is over you should not see any processes running in htop, and at the end of the nohup.out file you should find a message telling you that the analysis is over.
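A quick way to check on a backgrounded run from the shell (a sketch):

tail -f nohup.out           # follow the pipeline log as it is written
ps aux | grep -i guppy      # check whether a basecalling process is still alive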

We are using the SILVA database and the library is approximately 178 GB of fast5 files.

That is a huge amount of data; I guess it will take a lot of time! By the way, I am now using conda v4.9.2 too. Simone

Ksherriff commented 3 years ago

We just checked the nohup files for both computers and found the analysis folder that was created. The computer that is still working away has been creating outputs, so it looks like it is running properly. For the computer that has the updated conda, these are the last lines in the nohup file:

Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'x' in selecting a method for function 'grep': cannot open the connection
Calls: grep -> readLines -> file -> .handleSimpleError -> h
In addition: Warning messages:
1: In file(con, "r") :
  'raw = FALSE' but '/var/lib/MinKNOW/data/MCB_WCB_Purity_LibA_10Feb21/no_sample/20210210_2107_MN24971_FAO99362_80050b71/fast5_pass_analysis/analysis/' is not a regular file
2: In file(con, "r") :
  cannot open file '/var/lib/MinKNOW/data/MCB_WCB_Purity_LibA_10Feb21/no_sample/20210210_2107_MN24971_FAO99362_80050b71/fast5_pass_analysis/analysis/': it is a directory
Execution halted

MaestSi commented 3 years ago

Could you report which files were created before the run stopped? Are there fastq.gz files in the analysis subfolder? Simone
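One quick check from the run folder (a sketch; the analysis path follows the error message above):

find fast5_pass_analysis/analysis -name "*.fastq.gz" | head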

Ksherriff commented 3 years ago

Folders within the fast5_analysis folder are:
- Analysis: logfile.txt
- Basecalling: empty
- Preprocessing: barcoding_summary.txt and read_processor_log-2021-02-11_17-13-35.log

MaestSi commented 3 years ago

It looks like you may have an issue with Guppy. Which version are you running, and on which system? Do you also have the following error in the nohup.out file?

[guppy/error] main: getrandom
[guppy/warning] main: An error has occurred. Aborting.

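A quick way to look for that message (a sketch):

grep -n "guppy" nohup.out | tail -n 20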

Ksherriff commented 3 years ago

I don't have those errors, but I am using an older version of Guppy, currently 2.1.3. I am going to update it and then try this again.

MaestSi commented 3 years ago

Ok, just wanted to inform you that I faced that error on Ubuntu 14 with Guppy v4.4.1, and on that system the most recent version I can get to work is Guppy v4.2.2. Simone

Ksherriff commented 3 years ago

I updated Guppy to 4.4.2 and it appears to be working. I just started it, and the computer is running a lot more threads this time. This is a smaller library than the one running on the other computer, so we shall see what happens with this one.

MaestSi commented 3 years ago

Ok, fingers crossed! Simone

Ksherriff commented 3 years ago

So it runs through Guppy successfully; that takes care of that issue. However, at the end of everything all of my reads have been filtered out. These reads have previously been run through EPI2ME, and the vast majority were around 1500 bp with an average quality score of 9.5, so they should be making it through the filtering steps. Any thoughts on that?
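One way to double-check read lengths and quality scores outside EPI2ME is NanoPlot, assuming it is installed (a sketch; the file name is a placeholder):

NanoPlot --fastq barcode01.fastq.gz --outdir nanoplot_qc
# NanoStats.txt in the output folder reports mean read length and mean quality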

Our other computer is still running through vsearch. It has been on that step since Thursday or Friday of last week. It is a very large library, but that does seem abnormally long. The last lines in the nohup file are these:

Imported /var/lib/minknow/data/MCB_WCB_Purity_LibB_10Feb21/no_sample/20210210_2148_MN29892_FAO99401_63231a43/fast5_pass_analysis/analysis/manifest.txt as SingleEndFastqManifestPhred33V2 to sequences.qza
Saved FeatureTable[Frequency] to: table_tmp.qza
Saved FeatureData[Sequence] to: rep-seqs_tmp.qza

As for files created, we only have the temp files within the analysis folder. I am wondering whether it is hung up on a step or whether it is going to take this long for libraries of this size. The average fastq.gz file size for each barcode is about 600 MB.

MaestSi commented 3 years ago

Did you set the barcodes correctly in the config file? Are all the reads in the preprocessing/unclassified folder, or were they demultiplexed? If reads are of good quality and length, they should not be filtered out. Regarding the other computer, I think it is not stuck; such a great amount of data is going to take a lot of time. I would advise subsampling a small set of reads if you need results quickly (a sketch follows below). Simone
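A minimal subsampling sketch, assuming seqtk is installed (the seed and read count are arbitrary):

seqtk sample -s 42 barcode01.fastq.gz 50000 | gzip > barcode01_sub.fastq.gz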

Ksherriff commented 3 years ago

Yup, it is currently set to use the RAB204 kit, which is the one we have been using. All the fastq files have been moved into the preprocessing folder and into their correct barcoded folders. This is probably just a sample problem on our end. I am going to re-run these fastq files through EPI2ME and see what the readout is, and maybe try this with a different sample. You can close this issue if you want. You have been a huge help! Thank you so much.

MaestSi commented 3 years ago

You're welcome! Simone