Good morning! I need your help!
Starting from the leBIBI database "leBIBI IV SSU-rDNA (16S) Automated ProKaryotes Phylogeny", I have tried to build the database files that NanoCLUST needs for its analysis. I ran makeblastdb from BLAST+ 2.13.0, expecting files with the following extensions: .ndb, .nhr, .nin, .nnd, .nni, .nog, .nos, .not, .nsq, .ntf, .nto. Two of them are never created: .nnd and .nni.
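To make the missing files easy to see, here is a minimal sketch that lists which of those extensions are absent for a given database prefix. The extension list is the one quoted above; the function name `missing_db_files` and the example prefix are only illustrative, and I am not certain that NanoCLUST actually requires every one of these files.

```python
from pathlib import Path

# Extensions listed in this post; whether all of them are required is an assumption.
EXPECTED_EXTENSIONS = (
    ".ndb", ".nhr", ".nin", ".nnd", ".nni", ".nog",
    ".nos", ".not", ".nsq", ".ntf", ".nto",
)


def missing_db_files(prefix, expected=EXPECTED_EXTENSIONS):
    """Return the extensions for which `<prefix><ext>` does not exist on disk."""
    prefix = Path(prefix)
    return [
        ext for ext in expected
        if not (prefix.parent / (prefix.name + ext)).is_file()
    ]


if __name__ == "__main__":
    # Hypothetical prefix, matching the --db value used in the command below.
    print(missing_db_files("db/16S_ribosomal_RNA"))
```

On my database this kind of check reports only .nnd and .nni as missing.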
When I run the program, I get the following error:
(Nextflow) cnr-strep@cnrstrep-Precision-3660:~/NanoCLUST$ nextflow run main.nf -profile docker --reads '/media/cnr-strep/ACC22AE1C22AB00E/FastqHAC-Lactobacillus/FastQ_Bichat/Fastq-HAC-16052022/barcode17/trimming/barcode17.filtered.fastq' --db 'db/16S_ribosomal_RNA' --tax 'db/taxdb/'
N E X T F L O W ~ version 22.10.6
Launching main.nf [determined_ampere] DSL1 - revision: 2a51687d92
Run Name : determined_ampere
Reads : /media/cnr-strep/ACC22AE1C22AB00E/FastqHAC-Lactobacillus/FastQ_Bichat/Fastq-HAC-16052022/barcode17/trimming/barcode17.filtered.fastq
Max Resources : 128 GB memory, 16 cpus, 10d time per job
Container : docker - [:]
Output dir : ./results
Launch dir : /home/cnr-strep/NanoCLUST
Working dir : /home/cnr-strep/NanoCLUST/work
Script dir : /home/cnr-strep/NanoCLUST
User : cnr-strep
Config Profile : docker
executor > local (23)
[8b/15691c] process > QC (1) [100%] 1 of 1 ✔
[5e/36346c] process > fastqc (1) [100%] 1 of 1 ✔
[3c/3e9715] process > kmer_freqs (1) [100%] 1 of 1 ✔
[26/7c4085] process > read_clustering (1) [100%] 1 of 1 ✔
[3c/d03ee2] process > split_by_cluster (1) [100%] 1 of 1 ✔
[96/2c6b76] process > read_correction (3) [100%] 3 of 3 ✔
[bb/f3035a] process > draft_selection (3) [100%] 3 of 3 ✔
[21/1e894c] process > racon_pass (3) [100%] 3 of 3 ✔
[bf/28314d] process > medaka_pass (3) [100%] 3 of 3 ✔
[90/ad3c87] process > consensus_classification (3) [100%] 3 of 3 ✔
[07/d23aa0] process > join_results (1) [100%] 1 of 1 ✔
[4f/f929af] process > get_abundances (1) [ 0%] 0 of 1
[- ] process > plot_abundances -
[fe/e2e60d] process > output_documentation [100%] 1 of 1 ✔
Error executing process > 'get_abundances (1)'
Caused by:
Process get_abundances (1) terminated with an error exit status (1)
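Command executed [/home/cnr-strep/NanoCLUST/templates/get_abundance.py]:

#!/usr/bin/env python

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rc
import pandas as pd
from functools import reduce
import requests
import json

#https://unipept.ugent.be/apidocs/taxonomy
def get_taxname(tax_id,tax_level):
    tags = {"S": "species_name","G": "genus_name","F": "family_name","O":'order_name', "C": "class_name"}
    tax_level_tag = tags[tax_level]
    #Avoids pipeline crash due to "nan" classification output. Thanks to Qi-Maria from Github
    if str(tax_id) == "nan":
        tax_id = 1

    path = 'http://api.unipept.ugent.be/api/v1/taxonomy.json?input[]=' + str(int(tax_id)) + '&extra=true&names=true'
    complete_tax = requests.get(path).text

    #Checks for API correct response (field containing the tax name). Thanks to devinbrown from Github
    try:
        name = json.loads(complete_tax)[0][tax_level_tag]
    except:
        name = str(int(tax_id))

    return json.loads(complete_tax)[0][tax_level_tag]

def get_abundance_values(names,paths):
    dfs = []
    for name,path in zip(names,paths):
        data = pd.read_csv(path, index_col=False, sep=';').iloc[:,1:]

        total = sum(data['reads_in_cluster'])
        rel_abundance=[]

        for index,row in data.iterrows():
            rel_abundance.append(row['reads_in_cluster'] / total)

        data['rel_abundance'] = rel_abundance
        dfs.append(pd.DataFrame({'taxid': data['taxid'], 'rel_abundance': rel_abundance}))
        data.to_csv("" + name + "_nanoclust_out.txt")

def merge_abundance(dfs,tax_level):
    df_final = reduce(lambda left,right: pd.merge(left,right,on='taxid',how='outer').fillna(0), dfs)
    df_final["taxid"] = [get_taxname(row["taxid"], tax_level) for index, row in df_final.iterrows()]
    df_final_grp = df_final.groupby(["taxid"], as_index=False).sum()
    return df_final_grp

def get_abundance(names,paths,tax_level):
    if(not isinstance(paths, list)):
        paths = [paths]
        names = [names]

paths = "barcode17.filtered.nanoclust_out.txt"
names = "barcode17.filtered"

get_abundance(names,paths, "G")
get_abundance(names,paths, "S")
get_abundance(names,paths, "O")
get_abundance(names,paths, "F")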
executor > local (23)
[8b/15691c] process > QC (1) [100%] 1 of 1 ✔
[5e/36346c] process > fastqc (1) [100%] 1 of 1 ✔
[3c/3e9715] process > kmer_freqs (1) [100%] 1 of 1 ✔
[26/7c4085] process > read_clustering (1) [100%] 1 of 1 ✔
[3c/d03ee2] process > split_by_cluster (1) [100%] 1 of 1 ✔
[96/2c6b76] process > read_correction (3) [100%] 3 of 3 ✔
[bb/f3035a] process > draft_selection (3) [100%] 3 of 3 ✔
[21/1e894c] process > racon_pass (3) [100%] 3 of 3 ✔
[bf/28314d] process > medaka_pass (3) [100%] 3 of 3 ✔
[90/ad3c87] process > consensus_classification (3) [100%] 3 of 3 ✔
[07/d23aa0] process > join_results (1) [100%] 1 of 1 ✔
[4f/f929af] process > get_abundances (1) [100%] 1 of 1, failed: 1 ✘
[- ] process > plot_abundances -
[fe/e2e60d] process > output_documentation [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
[nf-core/nanoclust] Pipeline completed with errors
WARN: Graphviz is required to render the execution DAG in the given format -- See http://www.graphviz.org for more info.
Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File ".command.sh", line 65, in <module>
      get_abundance(names,paths, "G")
    File ".command.sh", line 59, in get_abundance
      df_final_grp = merge_abundance(dfs, tax_level)
    File ".command.sh", line 49, in merge_abundance
      df_final["taxid"] = [get_taxname(row["taxid"], tax_level) for index, row in df_final.iterrows()]
    File ".command.sh", line 49, in <listcomp>
      df_final["taxid"] = [get_taxname(row["taxid"], tax_level) for index, row in df_final.iterrows()]
    File ".command.sh", line 28, in get_taxname
      return json.loads(complete_tax)[0][tax_level_tag]
  IndexError: list index out of range
Work dir:
/home/cnr-strep/NanoCLUST/work/4f/f929af73009d063bc5793e38804f62
Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out
(Nextflow) cnr-strep@cnrstrep-Precision-3660:~/NanoCLUST$
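As far as I can tell from the traceback, the crash happens when the Unipept response is an empty JSON list, so `[0]` has nothing to index. Below is a minimal offline sketch of that failure and of a guarded lookup. The response strings are made-up examples, not real API output, and `taxname_from_response` is a hypothetical helper, not part of NanoCLUST.

```python
import json


def taxname_from_response(response_text, tax_level_tag, tax_id):
    """Return the taxon name from a Unipept-style JSON list, or the taxid as text."""
    try:
        records = json.loads(response_text)
        name = records[0][tax_level_tag]
    except (ValueError, IndexError, KeyError, TypeError):
        # Malformed JSON, empty list, or missing field: fall back to the id.
        return str(tax_id)
    # The field may be present but empty at this rank.
    return name if name else str(tax_id)


# Made-up responses for illustration only.
ok_response = '[{"taxon_id": 1578, "genus_name": "Lactobacillus"}]'
empty_response = "[]"

print(taxname_from_response(ok_response, "genus_name", 1578))
print(taxname_from_response(empty_response, "genus_name", 0))

# The unguarded form from line 28 of .command.sh fails on the empty list:
try:
    json.loads(empty_response)[0]["genus_name"]
except IndexError as err:
    print("IndexError:", err)
```

If this reading is right, some of my consensus sequences are classified with a taxid that Unipept does not know, which would point back to how the database was built.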
Please, could you guide me on how to generate a database that can be interpreted by NanoCLUST from a FASTA file containing a list of selected 16S sequences?
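What I have in mind is roughly the following, as a sketch only. It builds a `makeblastdb` call with `-parse_seqids` and `-taxid_map`, on the assumption (which I have not confirmed) that the empty Unipept lookups come from sequences that carry no NCBI taxid in my database. The file names `selected_16S.fasta` and `taxid_map.txt`, the output prefix, and the helper `makeblastdb_command` are hypothetical.

```python
def makeblastdb_command(fasta, out_prefix, taxid_map=None):
    """Build the argument list for a nucleotide makeblastdb run."""
    cmd = [
        "makeblastdb",
        "-in", fasta,
        "-dbtype", "nucl",
        "-parse_seqids",  # keep the FASTA sequence IDs searchable
        "-out", out_prefix,
    ]
    if taxid_map is not None:
        # Text file with one sequence ID and its NCBI taxid per line.
        cmd += ["-taxid_map", taxid_map]
    return cmd


if __name__ == "__main__":
    # Hypothetical input files; replace with real paths before running.
    cmd = makeblastdb_command("selected_16S.fasta", "db/custom_16S", "taxid_map.txt")
    print(" ".join(cmd))
```

The printed command can be run in a shell, or the list can be passed to `subprocess.run(cmd, check=True)`. Is this the right direction, or does NanoCLUST expect something else?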
Thank you very much!
Miguel Angel Hernandez