MaestSi / MetONTIIME

A Meta-barcoding pipeline for analysing ONT data in QIIME2 framework
GNU General Public License v3.0
78 stars 17 forks

Question #29

Closed KFeye closed 3 years ago

KFeye commented 3 years ago

Hey, how does one accomplish this?

"Then, after completing MetONTIIME installation, set the MINICONDA_DIR variable in config_MinION_mobile_lab.R to the full path to miniconda3 directory."

I see no instructions on how exactly to get that done.

Thanks

MaestSi commented 3 years ago

Hi, you can do it by opening the config_MinION_mobile_lab.R file with a text editor and writing: MINICONDA_DIR <- "<path>", where <path> is the output of the command which conda | sed 's/miniconda3.*$/miniconda3/' run in a terminal. The value for MINICONDA_DIR should also have been suggested by the install.sh script at the end of the installation. Let me know if it works! P.s.: the pipeline has only been tested on Ubuntu. Simone
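The sed command above only trims the path when it contains the string miniconda3. On a Homebrew Cask install like the one later in this thread, conda lives at /usr/local/Caskroom/miniconda/base/bin/conda, so the pattern does not match and the full path to the conda executable comes back unchanged. A minimal sketch of a more general alternative, assuming the executable sits in the bin/ or condabin/ subdirectory of the conda root (conda_root_from_bin is a hypothetical helper, not part of MetONTIIME):

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of MetONTIIME): derive the conda root
# directory from the full path of the conda executable.
conda_root_from_bin() {
  local conda_bin="$1"
  # Strip a trailing /bin/conda (e.g. .../miniconda3/bin/conda)
  conda_bin="${conda_bin%/bin/conda}"
  # Strip a trailing /condabin/conda (the directory `conda init` puts on PATH)
  conda_bin="${conda_bin%/condabin/conda}"
  printf '%s\n' "$conda_bin"
}
```

For example, conda_root_from_bin "$(which conda)" prints /usr/local/Caskroom/miniconda/base for that install. That root directory, not .../bin/conda, is what MINICONDA_DIR needs, because the config appends /envs/MetONTIIME_env/bin/... to it. If conda itself works, conda info --base should print the same root directly.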

KFeye commented 3 years ago

I think my conda installation is broken. I am about to do a full reboot and will try this, as I really like your pipeline. I'll update you shortly!


KFeye commented 3 years ago

This is still not working and I really need some help. QIIME will not run within the MetONTIIME_env.

Do I need to reset the metadata or any of the additional information (like what barcoding kit and so on) each time?

Thanks!


MaestSi commented 3 years ago

The first time you run the pipeline you must modify the config file to configure the pipeline. For subsequent runs you can modify it if you want to change some parameters. Please send me your config file and the errors you get, so that I can help you further. P.s.: if you prefer you can send me the files via e-mail. Simone
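Before sending the errors, it may help to check which of the executables used by the installer and the pipeline are actually reachable once the environment is active. A minimal sketch, assuming MetONTIIME_env has been activated first; missing_tools is a hypothetical helper, and the tool names passed to it below come from the errors and config paths in this thread (qiime, realpath, seqtk, pycoQC, NanoFilt), not from an official requirements list:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every command in the argument list that is not
# found on the current PATH. Returns 0 if all are found, 1 otherwise.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$tool"
      status=1
    fi
  done
  return "$status"
}
```

Usage: missing_tools qiime realpath seqtk pycoQC NanoFilt. Any line it prints names a command the shell cannot find, which is exactly the "command not found" situation reported below.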

KFeye commented 3 years ago

Thank you for your help! Honestly I've been trying to crack what is going on for a bit.

When I run QIIME in the MetONTIIME_env in python:

```
(MetONTIIME_env) Drs-iMac:SILVA dr.kristinafeyewaseen$ source activate MetONTIIME_env
(MetONTIIME_env) Drs-iMac:SILVA dr.kristinafeyewaseen$ ls
SILVA_132_QIIME_release Silva_132_release.zip __MACOSX
(MetONTIIME_env) Drs-iMac:SILVA dr.kristinafeyewaseen$ qiime tools import --type FeatureData[Taxonomy] --input-path SILVA_132_QIIME_release/taxonomy/16S_only/99/taxonomy_all_levels.txt --input-format HeaderlessTSVTaxonomyFormat --output-path taxonomy.qza
-bash: qiime: command not found
(MetONTIIME_env) Drs-iMac:SILVA dr.kristinafeyewaseen$ ls
SILVA_132_QIIME_release Silva_132_release.zip __MACOSX
(MetONTIIME_env) Drs-iMac:SILVA dr.kristinafeyewaseen$ qiime tools import --type FeatureData[Taxonomy] --input-path SILVA_132_QIIME_release/taxonomy/16S_only/99/taxonomy_all_levels.txt --input-format HeaderlessTSVTaxonomyFormat --output-path taxonomy.qza
-bash: qiime: command not found
(MetONTIIME_env) Drs-iMac:SILVA dr.kristinafeyewaseen$
```

When I run the shell:

```
(MetONTIIME_env) Drs-iMac:pythonprograms dr.kristinafeyewaseen$ ls
MetONTIIME SILVA
(MetONTIIME_env) Drs-iMac:pythonprograms dr.kristinafeyewaseen$ cd MetONTIIME
(MetONTIIME_env) Drs-iMac:MetONTIIME dr.kristinafeyewaseen$ ls
MetONTIIME
(MetONTIIME_env) Drs-iMac:MetONTIIME dr.kristinafeyewaseen$ cd MetONTIIME
(MetONTIIME_env) Drs-iMac:MetONTIIME dr.kristinafeyewaseen$ ls
Evaluate_diversity.sh MetONTIIME.sh install.sh
Evaluate_diversity_non_phylogenetic.sh MinION_mobile_lab.R nohup.out
Figures README.md subsample_fast5.sh
Import_database.sh Test_BC04_FLO-FLG001_SQK-RAB204.zip version.txt
Launch_MinION_mobile_lab.sh config_MinION_mobile_lab.R
(MetONTIIME_env) Drs-iMac:MetONTIIME dr.kristinafeyewaseen$ chmod 755*
usage: chmod [-fhv] [-R [-H | -L | -P]] [-a | +a | =a [i][# [ n]]] mode|entry file ...
       chmod [-fhv] [-R [-H | -L | -P]] [-E | -C | -N | -i | -I] file ...
(MetONTIIME_env) Drs-iMac:MetONTIIME dr.kristinafeyewaseen$ chmod 755* ./install.sh
chmod: Invalid file mode: 755*
(MetONTIIME_env) Drs-iMac:MetONTIIME dr.kristinafeyewaseen$ bash ./install.sh
./install.sh: line 21: realpath: command not found
Warning: 'bioconda' already in 'channels' list, moving to the top
Warning: 'conda-forge' already in 'channels' list, moving to the top
Warning: 'r' already in 'channels' list, moving to the top
Warning: 'anaconda' already in 'channels' list, moving to the top
--2021-03-16 10:40:07-- https://data.qiime2.org/distro/core/qiime2-2020.11-py36-linux-conda.yml
Resolving data.qiime2.org (data.qiime2.org)... 54.200.1.12
Connecting to data.qiime2.org (data.qiime2.org)|54.200.1.12|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://raw.githubusercontent.com/qiime2/environment-files/master/2020.11/release/qiime2-2020.11-py36-linux-conda.yml [following]
--2021-03-16 10:40:08-- https://raw.githubusercontent.com/qiime2/environment-files/master/2020.11/release/qiime2-2020.11-py36-linux-conda.yml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.111.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9474 (9.3K) [text/plain]
Saving to: ‘qiime2-2020.11-py36-linux-conda.yml’

qiime2-2020.11-py36-linux-conda 100%[====================================================>] 9.25K --.-KB/s in 0s

2021-03-16 10:40:08 (42.2 MB/s) - ‘qiime2-2020.11-py36-linux-conda.yml’ saved [9474/9474]

CondaValueError: prefix already exists: /usr/local/Caskroom/miniconda/base/envs/MetONTIIME_env
Collecting package metadata (current_repodata.json): done
Solving environment: done

All requested packages already installed.

Requirement already satisfied: pycoQC in /usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages (2.5.2)
Requirement already satisfied: scipy>=1.5 in /usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages (from pycoQC) (1.6.1)
Requirement already satisfied: plotly==4.1.0 in /usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages (from pycoQC) (4.1.0)
Requirement already satisfied: jinja2>=2.10 in /usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages (from pycoQC) (2.11.3)
Requirement already satisfied: pysam>=0.16 in /usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages (from pycoQC) (0.16.0.1)
Requirement already satisfied: h5py>=3.1 in /usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages (from pycoQC) (3.2.1)
Requirement already satisfied: tqdm>=4.54 in /usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages (from pycoQC) (4.59.0)
Requirement already satisfied: numpy>=1.19 in /usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages (from pycoQC) (1.20.1)
Requirement already satisfied: pandas>=1.1 in /usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages (from pycoQC) (1.2.3)
Requirement already satisfied: retrying>=1.3.3 in /usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages (from plotly==4.1.0->pycoQC) (1.3.3)
Requirement already satisfied: six in /usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages (from plotly==4.1.0->pycoQC) (1.15.0)
Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages (from jinja2>=2.10->pycoQC) (1.1.1)
Requirement already satisfied: pytz>=2017.3 in /usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages (from pandas>=1.1->pycoQC) (2021.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/Caskroom/miniconda/base/lib/python3.8/site-packages (from pandas>=1.1->pycoQC) (2.8.1)

Modify variables PIPELINE_DIR and MINICONDA_DIR in config_MinION_mobile_lab.R
PIPELINE_DIR <- ""
MINICONDA_DIR <- "/usr/local/Caskroom/miniconda/base/bin/conda"
(MetONTIIME_env) Drs-iMac:MetONTIIME dr.kristinafeyewaseen$ qiime tools import --type FeatureData[Taxonomy] --input-path SILVA_132_QIIME_release/taxonomy/16S_only/99/taxonomy_all_levels.txt --input-format HeaderlessTSVTaxonomyFormat --output-path taxonomy.qza
-bash: qiime: command not found
(MetONTIIME_env) Drs-iMac:MetONTIIME dr.kristinafeyewaseen$
(MetONTIIME_env) Drs-iMac:MetONTIIME dr.kristinafeyewaseen$ open config_minION_mobile_lab.R
```

And here is my config file with my changes:

I'm using the 96bc expansion pack.

```r
#
# Copyright 2019 Simone Maestri. All rights reserved.
# Simone Maestri @.***>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see http://www.gnu.org/licenses/.
#
####################################################################################################
#Note: rows starting with '#' are notes for the user, and are ignored by the software
#if do_subsampling_flag <- 1, subsampling of num_fast5_files fast5 files is performed; otherwise set do_subsampling_flag <- 0
do_subsampling_flag <- 0
#num_fast5_files is the number of fast5 files to be subsampled/analysed (if do_subsampling_flag <- 1)
num_fast5_files <- 25
#BC_int are the barcodes used in the experiment
#BC_int <- c("BC01", "BC02", "BC03", "BC04", "BC05", "BC06", "BC07", "BC08", "BC09", "BC10", "BC11", "BC12")
BC_int <- c("BC01", "BC02", "BC03", "BC04", "BC05", "BC06", "BC07", "BC08", "BC09", "BC10", "BC11", "BC12")
#barcode kits
#barcode_kits <- c("EXP-NBD103", "EXP-NBD114", "EXP-PBC001", "EXP-PBC096", "SQK-16S024", "SQK-LWB001", "SQK-RAB201", "SQK-RBK001", "SQK-RBK004", "SQK-RLB001")
barcode_kits <- c("SQK-RAB204")
#kit (1D/1D^2 reads/rapid 16S)
kit <- "SQK-RAB204"
#flowcell chemistry (R9.4/R9.5 chemistry)
flowcell <- "FLO-MIN106"
#fast_basecalling_flag <- 1 if you want to use the fast basecalling algorithm (FLO-MIN106 only); otherwise set fast_basecalling_flag <- 0 if you want to use the accurate but slow one
fast_basecalling_flag <- 1
#pair_strands_flag <- 1 if, in case a 1d2 kit and FLO-MIN107 flow-cell have been used, you want to perform 1d2 basecalling; otherwise set pair_strands_flag <- 0
pair_strands_flag <- 0
#require_two_barcodes_flag <- 1 if you want to keep only reads with a barcode (tag) at both ends of the read; otherwise set require_two_barcodes_flag <- 0
require_two_barcodes_flag <- 0
#save_space_flag <- 1 if you want temporary files to be automatically deleted; otherwise set save_space_flag <- 0
save_space_flag <- 0
#set the maximum number of threads to be used
num_threads <- 30
#set a mean amplicon length [bp]: for amplicon length I refer to the length of the biological sequence after adapters and primers trimming
amplicon_length <- 1400
#fixed_lenfil_flag <- 1 if you want to keep reads in the range [amplicon_length - lenfil_tol/2; amplicon_length + lenfil_tol/2]; otherwise set fixed_lenfil_flag <- 0 if you want to keep reads in the range [mean_length -2sd; mean_length + 2sd] where mean_length and sd are evaluated on a sample basis
fixed_lenfil_flag <- 1
#if fixed_lenfil_flag <- 1, lenfil_tol [bp] is the size of the window centered in amplicon_length for reads to be kept
lenfil_tol <- 300
#set primers length [bp]; if the kit you used already contains PCR primers as part of the adapters, you can set this value to 0
primers_length <- 25
#min read quality value
min_qual <- 7
#Choose taxonomic classifier between Blast and Vsearch
CLASSIFIER <- "Vsearch"
#MAX_ACCEPTS is the maximum number of hits for each query; if a value > 1 is used, a consensus taxonomy for the MAX_ACCEPTS hits is retrieved
MAX_ACCEPTS <- 3
#QUERY_COV is the minimum fraction of a query sequence that should be aligned to a sequence in the database
QUERY_COV <- 0.8
#ID_THR is the minimum alignment identity threshold
ID_THR <- 0.85
########################################################################################################
#PIPELINE DIR
PIPELINE_DIR <- "/usr/local/Caskroom/miniconda/base/bin:/usr/local/Caskroom/miniconda/base/envs/MetONTIIME_env/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/bin:/sbin:/usr/bin:/usr/sbin"
#MINICONDA DIR
MINICONDA_DIR <- "/usr/local/Caskroom/miniconda3"
#basecaller_dir
BASECALLER_DIR <- "/path/to/ont-guppy-cpu/bin/"
#NCBI-downloaded sequences (QIIME2 artifact)
DB <- "/path/to/PRJNA33175_Bacterial_sequences.qza"
#Taxonomy of NCBI-downloaded sequences (QIIME2 artifact)
TAXONOMY <- "/path/to/PRJNA33175_taxonomy.qza"
#sample-metadata file describing samples metadata; it is created automatically if it doesn't exist
SAMPLE_METADATA <- "/path/to/sample-metadata.tsv"
########## End of user editable region #################################################################
#load BioStrings package
suppressMessages(library(Biostrings))
#path to MetONTIIME.sh
MetONTIIME <- paste0(PIPELINE_DIR, "/MetONTIIME.sh")
#path to subsample fast5
subsample_fast5 <- paste0(PIPELINE_DIR, "/subsample_fast5.sh")
#SEQTK
SEQTK <- paste0(MINICONDA_DIR, "/envs/MetONTIIME_env/bin/seqtk")
#PYCOQC
PYCOQC <- paste0(MINICONDA_DIR, "/envs/MetONTIIME_env/bin/pycoQC")
#NANOFILT
NANOFILT <- paste0(MINICONDA_DIR, "/envs/MetONTIIME_env/bin/NanoFilt")
```
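Because the config builds paths with paste0(PIPELINE_DIR, "/MetONTIIME.sh"), PIPELINE_DIR is expected to be a single directory (the one containing MetONTIIME.sh), not a colon-separated PATH-style list. install.sh left it empty here, most likely because its realpath call failed on this Mac (./install.sh: line 21: realpath: command not found). A minimal sketch of resolving an absolute directory without realpath; abs_dir is a hypothetical helper, not part of MetONTIIME:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the absolute, symlink-resolved path of a directory
# without relying on realpath. The subshell leaves the caller's working
# directory unchanged; returns non-zero if the directory does not exist.
abs_dir() {
  # CDPATH is cleared so that cd never echoes a path of its own
  (CDPATH= cd -- "$1" 2>/dev/null && pwd -P)
}
```

For example, running abs_dir . from inside the folder that contains MetONTIIME.sh prints a value suitable for PIPELINE_DIR.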


MaestSi commented 3 years ago

Hi, you definitely need to set these paths:

```r
#basecaller_dir
BASECALLER_DIR <- "/path/to/ont-guppy-cpu/bin/"
#NCBI-downloaded sequences (QIIME2 artifact)
DB <- "/path/to/PRJNA33175_Bacterial_sequences.qza"
#Taxonomy of NCBI-downloaded sequences (QIIME2 artifact)
TAXONOMY <- "/path/to/PRJNA33175_taxonomy.qza"
#sample-metadata file describing samples metadata; it is created automatically if it doesn't exist
SAMPLE_METADATA <- "/path/to/sample-metadata.tsv"
```
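A quick way to list which of these settings still hold the template placeholder is to search the config for /path/to/. A minimal sketch, assuming the placeholders keep that prefix; list_unset_paths is a hypothetical helper, and it skips lines starting with '#' because those are notes ignored by the software:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (with line numbers) every non-comment line of an
# R config file that still contains the template placeholder "/path/to/".
list_unset_paths() {
  local config="$1"
  # -n prefixes each match with its line number, -F matches the string literally;
  # the second grep drops matches whose text starts with '#' (comment lines)
  grep -nF '/path/to/' "$config" | grep -v '^[0-9]*:#'
}
```

Usage: list_unset_paths config_MinION_mobile_lab.R. On the config posted above it would list the BASECALLER_DIR, DB, TAXONOMY and SAMPLE_METADATA lines.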

Also, mind that there must be a space between chmod 755 and *, i.e. chmod 755 *. I see you are working on a Mac; is that right? I have never run this pipeline on a Mac. The install.sh script definitely downloads QIIME2 for Linux systems, so this is the main reason why the qiime command is not found and the whole pipeline is not working. If you don't have access to a Linux-based system (e.g. Ubuntu), you may try setting up a virtual machine, but I fear that in that case only small datasets will work, since the RAM available to a virtual machine is usually limited. P.s.: you may try opening the install.sh script and changing 'linux' to 'osx', in order to download an appropriate QIIME2 version. I don't know if this will be enough, though. Simone
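The 'linux' to 'osx' change suggested above amounts to downloading a different QIIME2 environment file. A minimal sketch, assuming the file names follow the pattern visible in the install.sh download log (qiime2-2020.11-py36-linux-conda.yml) with osx in place of linux; qiime2_env_url is a hypothetical helper that picks the platform from uname -s, not something install.sh currently does:

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the QIIME2 environment-file URL for a given OS.
# Usage: qiime2_env_url VERSION "$(uname -s)"
qiime2_env_url() {
  local version="$1" os="$2" platform
  case "$os" in
    Darwin) platform="osx" ;;   # macOS
    Linux)  platform="linux" ;;
    *) echo "unsupported OS: $os" >&2; return 1 ;;
  esac
  # py36 matches the 2020.11 file name seen in the install.sh log; other
  # QIIME2 releases may use a different Python tag
  printf 'https://data.qiime2.org/distro/core/qiime2-%s-py36-%s-conda.yml\n' \
    "$version" "$platform"
}
```

For example, wget "$(qiime2_env_url 2020.11 "$(uname -s)")" would fetch the macOS file on a Mac. Whether the rest of the pipeline then runs on macOS remains untested.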

KFeye commented 3 years ago

Thanks again for your help on this!

Ok, I'll make those changes. Your pipeline indicates it will make a metadata sheet. Is your format the same as the QIIME2 format? Or will it just make one for me that I can import? Or is there another format that you follow so I can ensure everything is cleanly uploaded?

I think I can just download the QIIME2 version that you use and create the files, which should solve that issue. Will there be more issues since I am on a Mac? As for install.sh, I just have to run it with bash and it tends to work.

I am working on a Mac. I don't see any good pipelines for this that don't require Linux. I like VMs, but my big worry there is that my dataset is too big...


MaestSi commented 3 years ago

> Ok, I'll make those changes. Your pipeline indicates it will make a metadata sheet. Is your format the same as the QIIME2 format? Or will it just make one for me that I can import? Or is there another format that you follow so I can ensure everything is cleanly uploaded?

The pipeline automatically creates a metadata file for importing the samples. You just have to set in the config file where the file will be created (e.g. SAMPLE_METADATA <- "/home/simone/sample-metadata.tsv" will create a sample-metadata.tsv file in the /home/simone directory, containing the minimal information required for import). If you have additional metadata you want to consider in your subsequent analyses, just create a separate file and validate it with Keemei (https://keemei.qiime2.org/).
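A minimal sketch of creating such an additional metadata file from the shell, assuming a tab-separated layout with sample-id as the first header (one of the ID headers QIIME2 accepts). The column name site, the sample IDs and write_metadata itself are hypothetical; this is not the file MetONTIIME writes automatically, and the result should still be validated with Keemei:

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a tab-separated QIIME2-style metadata file.
# Usage: write_metadata OUT_FILE SAMPLE_ID:SITE [SAMPLE_ID:SITE ...]
write_metadata() {
  local out="$1" pair
  shift
  # Header row: the first column holds the sample identifiers
  printf 'sample-id\tsite\n' > "$out"
  for pair in "$@"; do
    # Split "BC01:soil" into the ID (before the first ':') and the value (after it)
    printf '%s\t%s\n' "${pair%%:*}" "${pair#*:}" >> "$out"
  done
}
```

Usage: write_metadata extra-metadata.tsv BC01:soil BC02:water writes a header row plus one row per sample.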

> Will there be more issues as I am in mac?

I have absolutely no idea; I have never worked with macOS. Let me know about your experience!

In any case, I suggest running a preliminary test with a small dataset (e.g. the one uploaded in the repository) to see if it works. Simone

KFeye commented 3 years ago

Ok, thanks! I'll do that and let you know how it goes.

=)


MaestSi commented 3 years ago

Hi, are there any updates on your testing? Simone

KFeye commented 3 years ago

Unfortunately, my system seems to keep crashing. =(


MaestSi commented 3 years ago

Ok, I am pretty sure this is due to an OS-related issue. If you wish to try setting up a virtual machine running Ubuntu, I will be happy to help troubleshoot. For now, I am going to close the issue. Best, Simone

KFeye commented 3 years ago

It is. Sadly, I set up a VM on Google Cloud, and my data is also quite large.

=(
