Question over indicating the location of the nt database

smartise commented 5 days ago

Hello,

First of all, many thanks for the work and effort put into this workflow. I am trying to use the pipeline, but I cannot find where to specify the location of my already downloaded nt database so that the pipeline does not download a new one.

When I specify it in the YAML configuration file, I get the following error:


Reading file /mnt/ebe/AmpliconSequencingONT/eDNA_nanopore_test/results/filtering/filtered.fasta 100%  
251878 nt in 361 seqs, min 308, max 1146, avg 698
Masking 100% 
Sorting by length 100%
Counting k-mers 100% 
Clustering 0%BLAST Database error: No alias or index file found for nucleotide database [/mnt/ebe/blobtools/nt] in search path [/srv/home/ocol0007/Natrix2::]
[Thu Nov 21 19:59:55 2024]
Error in rule blast:
    jobid: 61
    output: /mnt/ebe/AmpliconSequencingONT/eDNA_nanopore_test/results/blast/blast_taxonomy.tsv
    conda-env: /srv/home/ocol0007/Natrix2/.snakemake/conda/6470f7bb
    shell:
        blastn -num_threads 20 -query /mnt/ebe/AmpliconSequencingONT/eDNA_nanopore_test/results/filtering/filtered.fasta -db /mnt/ebe/blobtools/nt -max_target_seqs 10 -perc_identity 80.0 -evalue 1e-20 -outfmt "6 qseqid qlen length pident mismatch qstart qend sstart send gaps evalue staxid sseqid" -out /mnt/ebe/AmpliconSequencingONT/eDNA_nanopore_test/results/blast/blast_taxonomy.tsv
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job blast since they might be corrupted:
/mnt/ebe/AmpliconSequencingONT/eDNA_nanopore_test/results/blast/blast_taxonomy.tsv
Clustering 100%  
Sorting clusters 100%
Writing clusters 100% 
Clusters: 350 Size min 1, max 5, avg 1.0
Singletons: 342, 94.7% of seqs, 97.7% of clusters
Writing OTU table (classic) 100%  
[Thu Nov 21 19:59:56 2024]
Finished job 56.
56 of 66 steps (85%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /srv/home/ocol0007/Natrix2/.snakemake/log/2024-11-21T194139.868399.snakemake.log

I assume the program is trying to build a new database at that location, but I don't see where I can indicate that the database is already built and ready to use.

Thanks

dusti1n commented 4 days ago

Hello,

thank you for describing the problem in detail, and thank you for the feedback on the pipeline—we’re glad to hear that you’re using it! The default path for the NCBI database in the pipeline is:

blast:
    blast: TRUE
    database: NCBI
    db_path: database/ncbi/nt

It would be great if you could share your configuration file for your samples so I can review it and ensure everything is set up correctly. To make sure everything runs smoothly, please also check that the following files are present in the specified directory, e.g. files with the extensions .nhd, .nhi, .nhr, .nin, .nnd, .nni, .nog and .nsq.

These files are required for the pipeline to function as expected. If any are missing, you may need to re-download the database.
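The file check suggested above can be scripted. This is a minimal Python sketch, assuming volume files are named <prefix>.<volume>.<extension> as in the directory listing later in this thread (e.g. nt.00.nsq). missing_volume_files is a hypothetical helper, not part of Natrix2 or BLAST; it only looks at file names, not file contents, and it cannot notice a volume whose files are all absent.

```python
import re
from pathlib import Path

# Per-volume extensions named in the reply above.
REQUIRED_EXTS = {".nhd", ".nhi", ".nhr", ".nin", ".nnd", ".nni", ".nog", ".nsq"}

# Matches volume files such as "nt.00.nsq" or "nt.000.nsq":
# <prefix>.<volume number><.extension>
VOLUME_RE = re.compile(r"^(?P<prefix>.+)\.(?P<vol>\d+)(?P<ext>\.n[a-z]{2})$")


def missing_volume_files(db_dir, prefix="nt"):
    """Map each volume found in db_dir to the required extensions it lacks.

    Complete volumes are left out, so an empty dict means every volume that is
    present has all eight files. Non-volume files (nt.nal, taxdb.btd, ...) are ignored.
    """
    found = {}
    for path in Path(db_dir).iterdir():
        match = VOLUME_RE.match(path.name)
        if match and match.group("prefix") == prefix:
            found.setdefault(match.group("vol"), set()).add(match.group("ext"))
    return {
        vol: sorted(REQUIRED_EXTS - exts)
        for vol, exts in sorted(found.items())
        if REQUIRED_EXTS - exts
    }
```

For example, missing_volume_files("/mnt/ebe/blobtools/nt") would list any incomplete volumes in that directory, or return an empty dict if none are found.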

To avoid any further issues, we recommend using the default path (database/ncbi/nt) for the database in the pipeline and placing all required database files there. This ensures compatibility and reduces potential errors.

If you have any other questions or need further support, feel free to reach out anytime—we’re happy to assist!

Best regards, Dustin

smartise commented 4 days ago

Here is the config file:

general:
        filename: /mnt/ebe/AmpliconSequencingONT/sard2_eDNA_natrix # The path / filename of the project folder, primertable (.csv) and configfile (.yaml). If the raw data folder is not in the root directory of Natrix, please add the path relative to the root directory (e.g. input/example_data)
        output_dir: /mnt/ebe/AmpliconSequencingONT/sard2_eDNA_natrix/results # Path to custom output directory / relative to the root directory of natrix. Do not use a dash in the folder name.
        primertable: Nanopore.csv # Path to the primertable. If the primertable is not in the root directory of Natrix, please add the path relative to the root directory (e.g. input/example_data.yaml)
        units: units.tsv # Path to the sequencing unit sheet. (name will be concatenated with output_dir)
        cores: 40 # Amount of cores available for the workflow.
        memory: 300000 # Available RAM in MB.
        multiqc: TRUE # Initial quality check (fastqc & multiqc), currently only works for not yet assembled reads.
        demultiplexing: FALSE # Boolean, run demultiplexing for reads if they were not demultiplexed by the sequencing company (only Illumina support & slow).
        read_sorting: FALSE # Boolean, run read sorting for paired end reads if they were not sorted by the sequencing company (only Illumina support & slow).
        already_assembled: FALSE # Boolean, skip the quality control and read assembly steps for data if it is already assembled (only Illumina support).
        seq_rep: OTU # Type of sequence representative, possible values are: "ASV", amplicon sequence variants, created with DADA2 or "OTU", operational taxonomic units, created with SWARM or VSEARCH.

dataset:
        nanopore: TRUE # Boolean for the use of long sequences, e.g. Nanopore (TRUE) or short sequences, e.g. Illumina (FALSE).

#Quality check and primer removal for Nanopore
nanopore:
        quality_filt: 15 # Minimum Phred quality score.
        min_length: 50 # Minimum length of reads.
        max_length: 4000 # Maximum length of reads.
        head_trim: 0 # Trim N nucleotides from the start of reads.
        tail_trim: 0 # Trim N nucleotides from the end of reads.
        pychopper: TRUE # Boolean that indicates if pychopper should be used for reorientation, trimming and quality check of reads, if not done before.
        pychopqual: 7  #Minimum mean Q-score base quality for pychopper (default 7).
        racon: 4 #Iterations of racon for read correction. Possible values are 1, 2, 3, 4 or 5. The higher the quality of reads, the less iterations are required.

# Quality check and primer removal for Illumina
qc:
        threshold: 0.9 # PANDAseq score threshold a sequence must meet to be kept in the output.
        minoverlap: 15 # Sets the minimum overlap between forward and reverse reads.
        minqual: 1 # Minimal quality score for bases in an assembled read to be accepted by PANDAseq.
        minlen: 100 # The minimal length of a sequence after primer removal to be accepted by PANDAseq or Cutadapt.
        maxlen: 600 # The maximal length of a sequence after primer removal to be accepted by PANDAseq or Cutadapt.
        primer_offset: FALSE # Using PANDAseq to remove primer sequences by length offset instead of sequence identity, only for OTU variant.
        mq: 25 # Minimum quality sequence check (prinseq), filtering of sequences according to the PHRED quality score before the assembly.
        barcode_removed: TRUE # Boolean that indicates if the sequence is free of barcodes.
        all_primer: TRUE # Boolean that indicates if the sequence is free of any kind of additional subsequences (primer, barcodes etc.).

# Dereplication
derep:
        clustering: 1 # Percent identity for cdhit (dereplication) (1 = 100%), if cdhit is solely to be used for dereplication (recommended), keep the default value.
        length_overlap: 0.0 # Length difference cutoff, default 0.0. If set to 0.9, the shorter sequences need to be at least 90% of the length of the representative of the cluster.
        representative: most_common # Which sequence to use as a representative sequence per CDHIT cluster. longest = the longest sequence of the corresponding cluster, most_common = the most common sequence of the corresponding cluster.

# Chimera removal
chim:
        beta: 8.0 # Weight of a "no" vote for the VSEARCH chimera detection algorithm.
        pseudo_count: 1.2 # Pseudo-count prior on number of “no” votes.
        abskew: 16 # Minimum abundance skew, defined by (min(abund.(paren1), abund.(paren2))) / abund.(child).

# Merging
merge:
        filter_method: not_split # If the split sample approach was used (split_sample) or not (not_split). (Not recommended for Nanopore data, use "cutoff" instead.)
        ampliconduo: FALSE # Boolean, whether AmpliconDuo should be used for statistical analysis of the data.
        cutoff: 2 # An additional abundance filter if the split sample approach was not used. For a read to be kept, the sum of abundances over all samples needs to be above the cutoff.
        ampli_corr: fdr # Specifies the correction method for Fisher's exact test.
        save_format: png # File format for the frequency-frequency plot.
        plot_AmpDuo: TRUE # Boolean, whether the frequency-frequency plot should be saved.
        paired_End: FALSE # Boolean. Format of the sequencing data, TRUE if the reads are in paired-end format.
        name_ext: R1 # The identifier for the forward read (for the reverse read the 1 is switched with 2, if the data is in paired-end format), has to be included at the end of the file name, before the file format identifier (including for single end files).

# clustering
clustering: "vsearch" # Allows you to specify the OTU clustering method to use. Your options are: swarm and vsearch. Nanopore only supports the vsearch option.
vsearch_id: 0.97 #Percent identity for vsearch OTU clustering (1 = 100%).

# Postclustering
postcluster:
        mumu: FALSE  # Boolean for the use of MUMU, only for OTU clustering.

# Mothur parameter
classify:
        mothur: FALSE # Boolean for the use of mothur
        search: kmer # Allows you to specify the method to find the most similar template. Your options are: suffix, kmer, blast, align and distance. The default is kmer.
        method: wang # Allows you to specify the classification method to use. Your options are: wang, knn and zap. The default is wang.
        database: pr2 # Database against which MOTHUR should be carried out, at the moment "pr2", "unite" and "silva" are supported.
database_version:
        pr2: 4.14.0
        silva: 138.1
database_path:
        silva_tax: database/silva_db.138.1.tax # Path for Silva taxonomy database
        silva_ref: database/silva_db.138.1.fasta # Path for Silva reference database
        pr2_ref: database/pr2db.4.14.0.fasta # Path for PR2 reference database
        pr2_tax: database/pr2db.4.14.0.tax # Path for PR2 taxonomy database
        unite_ref: database/unite_v10.fasta # Path for UNITE reference database
        unite_tax: database/unite_v10.tax # Path for UNITE taxonomy database

# BLAST
blast:
        blast: TRUE # Boolean to indicate the use of the BLAST search algorithm to assign taxonomic information to the OTUs.
        database: NCBI # Database against which the BLAST should be carried out, at the moment "NCBI" and "SILVA" are supported.
        drop_tax_classes: '' # Given a comma-separated list, drops undesired classes either by id, by name or using regex
        db_path: /mnt/ebe/blobtools/nt/nt # Path to the database file against which the BLAST should be carried out, at the moment only the SILVA (database/silva/silva.db) and NCBI (database/ncbi/nt) databases will be automatically downloaded.
        max_target_seqs: 10 # Number of NCBI blast hits that are saved per sequence / OTU.
        ident: 80.0 # Minimal identity overlap between target and query sequence. Set a lower threshold to be able to filter later by hand.
        evalue: 1e-20 # Highest accepted evalue. Set a higher threshold (e.g. 1e-5) to be able to filter later by hand.
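To see where db_path ends up, the blast section above can be mapped onto the blastn call printed in the Snakemake logs: max_target_seqs becomes -max_target_seqs, ident becomes -perc_identity and evalue becomes -evalue. The Python sketch below rebuilds that command line, assuming (as the second log in this thread suggests) that db_path is passed to -db unchanged. build_blastn_cmd and blast_cfg are hypothetical names for illustration, not Natrix2 code.

```python
import shlex

# Hypothetical mirror of the "blast" section of the config file above.
blast_cfg = {
    "db_path": "/mnt/ebe/blobtools/nt/nt",
    "max_target_seqs": 10,
    "ident": 80.0,
    "evalue": 1e-20,
}

# Output format string copied from the blastn command in the error log.
OUTFMT = "6 qseqid qlen length pident mismatch qstart qend sstart send gaps evalue staxid sseqid"


def build_blastn_cmd(cfg, query, out, threads):
    """Assemble a blastn argument list shaped like the command in the Snakemake log.

    Assumption: db_path is used verbatim as the -db value.
    """
    return [
        "blastn",
        "-num_threads", str(threads),
        "-query", query,
        "-db", cfg["db_path"],
        "-max_target_seqs", str(cfg["max_target_seqs"]),
        "-perc_identity", str(cfg["ident"]),
        "-evalue", str(cfg["evalue"]),
        "-outfmt", OUTFMT,
        "-out", out,
    ]
```

Printing shlex.join(build_blastn_cmd(blast_cfg, "filtered.fasta", "blast_taxonomy.tsv", 40)) gives a shell-quoted command that can be compared directly with the one in the log.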

Unfortunately, I have no control over where the database is located: it is on a cluster and I do not have permission to change anything.

Here is the content of the database directory:

(base) ocol0007@ebe-gpu01:/mnt/ebe/blobtools/nt$ ls
nt.00.nhd  nt.04.nsq  nt.09.nog  nt.14.nni  nt.19.nnd  nt.24.nin  nt.29.nhr  nt.34.nhi  nt.39.nhd  nt.43.nsq  nt.48.nog  nt.53.nni  nt.58.nnd  nt.63.nin  nt.68.nhr  nt.73.nhi  nt.78.nhd  nt.82.nsq  nt.87.nog
nt.00.nhi  nt.05.nhd  nt.09.nsq  nt.14.nog  nt.19.nni  nt.24.nnd  nt.29.nin  nt.34.nhr  nt.39.nhi  nt.44.nhd  nt.48.nsq  nt.53.nog  nt.58.nni  nt.63.nnd  nt.68.nin  nt.73.nhr  nt.78.nhi  nt.83.nhd  nt.87.nsq
nt.00.nhr  nt.05.nhi  nt.10.nhd  nt.14.nsq  nt.19.nog  nt.24.nni  nt.29.nnd  nt.34.nin  nt.39.nhr  nt.44.nhi  nt.49.nhd  nt.53.nsq  nt.58.nog  nt.63.nni  nt.68.nnd  nt.73.nin  nt.78.nhr  nt.83.nhi  nt.88.nhd
nt.00.nin  nt.05.nhr  nt.10.nhi  nt.15.nhd  nt.19.nsq  nt.24.nog  nt.29.nni  nt.34.nnd  nt.39.nin  nt.44.nhr  nt.49.nhi  nt.54.nhd  nt.58.nsq  nt.63.nog  nt.68.nni  nt.73.nnd  nt.78.nin  nt.83.nhr  nt.88.nhi
nt.00.nnd  nt.05.nin  nt.10.nhr  nt.15.nhi  nt.20.nhd  nt.24.nsq  nt.29.nog  nt.34.nni  nt.39.nnd  nt.44.nin  nt.49.nhr  nt.54.nhi  nt.59.nhd  nt.63.nsq  nt.68.nog  nt.73.nni  nt.78.nnd  nt.83.nin  nt.88.nhr
nt.00.nni  nt.05.nnd  nt.10.nin  nt.15.nhr  nt.20.nhi  nt.25.nhd  nt.29.nsq  nt.34.nog  nt.39.nni  nt.44.nnd  nt.49.nin  nt.54.nhr  nt.59.nhi  nt.64.nhd  nt.68.nsq  nt.73.nog  nt.78.nni  nt.83.nnd  nt.88.nin
nt.00.nog  nt.05.nni  nt.10.nnd  nt.15.nin  nt.20.nhr  nt.25.nhi  nt.30.nhd  nt.34.nsq  nt.39.nog  nt.44.nni  nt.49.nnd  nt.54.nin  nt.59.nhr  nt.64.nhi  nt.69.nhd  nt.73.nsq  nt.78.nog  nt.83.nni  nt.88.nnd
nt.00.nsq  nt.05.nog  nt.10.nni  nt.15.nnd  nt.20.nin  nt.25.nhr  nt.30.nhi  nt.35.nhd  nt.39.nsq  nt.44.nog  nt.49.nni  nt.54.nnd  nt.59.nin  nt.64.nhr  nt.69.nhi  nt.74.nhd  nt.78.nsq  nt.83.nog  nt.88.nni
nt.01.nhd  nt.05.nsq  nt.10.nog  nt.15.nni  nt.20.nnd  nt.25.nin  nt.30.nhr  nt.35.nhi  nt.40.nhd  nt.44.nsq  nt.49.nog  nt.54.nni  nt.59.nnd  nt.64.nin  nt.69.nhr  nt.74.nhi  nt.79.nhd  nt.83.nsq  nt.88.nog
nt.01.nhi  nt.06.nhd  nt.10.nsq  nt.15.nog  nt.20.nni  nt.25.nnd  nt.30.nin  nt.35.nhr  nt.40.nhi  nt.45.nhd  nt.49.nsq  nt.54.nog  nt.59.nni  nt.64.nnd  nt.69.nin  nt.74.nhr  nt.79.nhi  nt.84.nhd  nt.88.nsq
nt.01.nhr  nt.06.nhi  nt.11.nhd  nt.15.nsq  nt.20.nog  nt.25.nni  nt.30.nnd  nt.35.nin  nt.40.nhr  nt.45.nhi  nt.50.nhd  nt.54.nsq  nt.59.nog  nt.64.nni  nt.69.nnd  nt.74.nin  nt.79.nhr  nt.84.nhi  nt.89.nhd
nt.01.nin  nt.06.nhr  nt.11.nhi  nt.16.nhd  nt.20.nsq  nt.25.nog  nt.30.nni  nt.35.nnd  nt.40.nin  nt.45.nhr  nt.50.nhi  nt.55.nhd  nt.59.nsq  nt.64.nog  nt.69.nni  nt.74.nnd  nt.79.nin  nt.84.nhr  nt.89.nhi
nt.01.nnd  nt.06.nin  nt.11.nhr  nt.16.nhi  nt.21.nhd  nt.25.nsq  nt.30.nog  nt.35.nni  nt.40.nnd  nt.45.nin  nt.50.nhr  nt.55.nhi  nt.60.nhd  nt.64.nsq  nt.69.nog  nt.74.nni  nt.79.nnd  nt.84.nin  nt.89.nhr
nt.01.nni  nt.06.nnd  nt.11.nin  nt.16.nhr  nt.21.nhi  nt.26.nhd  nt.30.nsq  nt.35.nog  nt.40.nni  nt.45.nnd  nt.50.nin  nt.55.nhr  nt.60.nhi  nt.65.nhd  nt.69.nsq  nt.74.nog  nt.79.nni  nt.84.nnd  nt.89.nin
nt.01.nog  nt.06.nni  nt.11.nnd  nt.16.nin  nt.21.nhr  nt.26.nhi  nt.31.nhd  nt.35.nsq  nt.40.nog  nt.45.nni  nt.50.nnd  nt.55.nin  nt.60.nhr  nt.65.nhi  nt.70.nhd  nt.74.nsq  nt.79.nog  nt.84.nni  nt.89.nnd
nt.01.nsq  nt.06.nog  nt.11.nni  nt.16.nnd  nt.21.nin  nt.26.nhr  nt.31.nhi  nt.36.nhd  nt.40.nsq  nt.45.nog  nt.50.nni  nt.55.nnd  nt.60.nin  nt.65.nhr  nt.70.nhi  nt.75.nhd  nt.79.nsq  nt.84.nog  nt.89.nni
nt.02.nhd  nt.06.nsq  nt.11.nog  nt.16.nni  nt.21.nnd  nt.26.nin  nt.31.nhr  nt.36.nhi  nt.41.nhd  nt.45.nsq  nt.50.nog  nt.55.nni  nt.60.nnd  nt.65.nin  nt.70.nhr  nt.75.nhi  nt.80.nhd  nt.84.nsq  nt.89.nog
nt.02.nhi  nt.07.nhd  nt.11.nsq  nt.16.nog  nt.21.nni  nt.26.nnd  nt.31.nin  nt.36.nhr  nt.41.nhi  nt.46.nhd  nt.50.nsq  nt.55.nog  nt.60.nni  nt.65.nnd  nt.70.nin  nt.75.nhr  nt.80.nhi  nt.85.nhd  nt.89.nsq
nt.02.nhr  nt.07.nhi  nt.12.nhd  nt.16.nsq  nt.21.nog  nt.26.nni  nt.31.nnd  nt.36.nin  nt.41.nhr  nt.46.nhi  nt.51.nhd  nt.55.nsq  nt.60.nog  nt.65.nni  nt.70.nnd  nt.75.nin  nt.80.nhr  nt.85.nhi  nt.nal
nt.02.nin  nt.07.nhr  nt.12.nhi  nt.17.nhd  nt.21.nsq  nt.26.nog  nt.31.nni  nt.36.nnd  nt.41.nin  nt.46.nhr  nt.51.nhi  nt.56.nhd  nt.60.nsq  nt.65.nog  nt.70.nni  nt.75.nnd  nt.80.nin  nt.85.nhr  nt.ndb
nt.02.nnd  nt.07.nin  nt.12.nhr  nt.17.nhi  nt.22.nhd  nt.26.nsq  nt.31.nog  nt.36.nni  nt.41.nnd  nt.46.nin  nt.51.nhr  nt.56.nhi  nt.61.nhd  nt.65.nsq  nt.70.nog  nt.75.nni  nt.80.nnd  nt.85.nin  nt.nos
nt.02.nni  nt.07.nnd  nt.12.nin  nt.17.nhr  nt.22.nhi  nt.27.nhd  nt.31.nsq  nt.36.nog  nt.41.nni  nt.46.nnd  nt.51.nin  nt.56.nhr  nt.61.nhi  nt.66.nhd  nt.70.nsq  nt.75.nog  nt.80.nni  nt.85.nnd  nt.not
nt.02.nog  nt.07.nni  nt.12.nnd  nt.17.nin  nt.22.nhr  nt.27.nhi  nt.32.nhd  nt.36.nsq  nt.41.nog  nt.46.nni  nt.51.nnd  nt.56.nin  nt.61.nhr  nt.66.nhi  nt.71.nhd  nt.75.nsq  nt.80.nog  nt.85.nni  nt.ntf
nt.02.nsq  nt.07.nog  nt.12.nni  nt.17.nnd  nt.22.nin  nt.27.nhr  nt.32.nhi  nt.37.nhd  nt.41.nsq  nt.46.nog  nt.51.nni  nt.56.nnd  nt.61.nin  nt.66.nhr  nt.71.nhi  nt.76.nhd  nt.80.nsq  nt.85.nog  nt.nto
nt.03.nhd  nt.07.nsq  nt.12.nog  nt.17.nni  nt.22.nnd  nt.27.nin  nt.32.nhr  nt.37.nhi  nt.42.nhd  nt.46.nsq  nt.51.nog  nt.56.nni  nt.61.nnd  nt.66.nin  nt.71.nhr  nt.76.nhi  nt.81.nhd  nt.85.nsq  taxdb.btd
nt.03.nhi  nt.08.nhd  nt.12.nsq  nt.17.nog  nt.22.nni  nt.27.nnd  nt.32.nin  nt.37.nhr  nt.42.nhi  nt.47.nhd  nt.51.nsq  nt.56.nog  nt.61.nni  nt.66.nnd  nt.71.nin  nt.76.nhr  nt.81.nhi  nt.86.nhd  taxdb.bti
nt.03.nhr  nt.08.nhi  nt.13.nhd  nt.17.nsq  nt.22.nog  nt.27.nni  nt.32.nnd  nt.37.nin  nt.42.nhr  nt.47.nhi  nt.52.nhd  nt.56.nsq  nt.61.nog  nt.66.nni  nt.71.nnd  nt.76.nin  nt.81.nhr  nt.86.nhi
nt.03.nin  nt.08.nhr  nt.13.nhi  nt.18.nhd  nt.22.nsq  nt.27.nog  nt.32.nni  nt.37.nnd  nt.42.nin  nt.47.nhr  nt.52.nhi  nt.57.nhd  nt.61.nsq  nt.66.nog  nt.71.nni  nt.76.nnd  nt.81.nin  nt.86.nhr
nt.03.nnd  nt.08.nin  nt.13.nhr  nt.18.nhi  nt.23.nhd  nt.27.nsq  nt.32.nog  nt.37.nni  nt.42.nnd  nt.47.nin  nt.52.nhr  nt.57.nhi  nt.62.nhd  nt.66.nsq  nt.71.nog  nt.76.nni  nt.81.nnd  nt.86.nin
nt.03.nni  nt.08.nnd  nt.13.nin  nt.18.nhr  nt.23.nhi  nt.28.nhd  nt.32.nsq  nt.37.nog  nt.42.nni  nt.47.nnd  nt.52.nin  nt.57.nhr  nt.62.nhi  nt.67.nhd  nt.71.nsq  nt.76.nog  nt.81.nni  nt.86.nnd
nt.03.nog  nt.08.nni  nt.13.nnd  nt.18.nin  nt.23.nhr  nt.28.nhi  nt.33.nhd  nt.37.nsq  nt.42.nog  nt.47.nni  nt.52.nnd  nt.57.nin  nt.62.nhr  nt.67.nhi  nt.72.nhd  nt.76.nsq  nt.81.nog  nt.86.nni
nt.03.nsq  nt.08.nog  nt.13.nni  nt.18.nnd  nt.23.nin  nt.28.nhr  nt.33.nhi  nt.38.nhd  nt.42.nsq  nt.47.nog  nt.52.nni  nt.57.nnd  nt.62.nin  nt.67.nhr  nt.72.nhi  nt.77.nhd  nt.81.nsq  nt.86.nog
nt.04.nhd  nt.08.nsq  nt.13.nog  nt.18.nni  nt.23.nnd  nt.28.nin  nt.33.nhr  nt.38.nhi  nt.43.nhd  nt.47.nsq  nt.52.nog  nt.57.nni  nt.62.nnd  nt.67.nin  nt.72.nhr  nt.77.nhi  nt.82.nhd  nt.86.nsq
nt.04.nhi  nt.09.nhd  nt.13.nsq  nt.18.nog  nt.23.nni  nt.28.nnd  nt.33.nin  nt.38.nhr  nt.43.nhi  nt.48.nhd  nt.52.nsq  nt.57.nog  nt.62.nni  nt.67.nnd  nt.72.nin  nt.77.nhr  nt.82.nhi  nt.87.nhd
nt.04.nhr  nt.09.nhi  nt.14.nhd  nt.18.nsq  nt.23.nog  nt.28.nni  nt.33.nnd  nt.38.nin  nt.43.nhr  nt.48.nhi  nt.53.nhd  nt.57.nsq  nt.62.nog  nt.67.nni  nt.72.nnd  nt.77.nin  nt.82.nhr  nt.87.nhi
nt.04.nin  nt.09.nhr  nt.14.nhi  nt.19.nhd  nt.23.nsq  nt.28.nog  nt.33.nni  nt.38.nnd  nt.43.nin  nt.48.nhr  nt.53.nhi  nt.58.nhd  nt.62.nsq  nt.67.nog  nt.72.nni  nt.77.nnd  nt.82.nin  nt.87.nhr
nt.04.nnd  nt.09.nin  nt.14.nhr  nt.19.nhi  nt.24.nhd  nt.28.nsq  nt.33.nog  nt.38.nni  nt.43.nnd  nt.48.nin  nt.53.nhr  nt.58.nhi  nt.63.nhd  nt.67.nsq  nt.72.nog  nt.77.nni  nt.82.nnd  nt.87.nin
nt.04.nni  nt.09.nnd  nt.14.nin  nt.19.nhr  nt.24.nhi  nt.29.nhd  nt.33.nsq  nt.38.nog  nt.43.nni  nt.48.nnd  nt.53.nin  nt.58.nhr  nt.63.nhi  nt.68.nhd  nt.72.nsq  nt.77.nog  nt.82.nni  nt.87.nnd
nt.04.nog  nt.09.nni  nt.14.nnd  nt.19.nin  nt.24.nhr  nt.29.nhi  nt.34.nhd  nt.38.nsq  nt.43.nog  nt.48.nni  nt.53.nnd  nt.58.nin  nt.63.nhr  nt.68.nhi  nt.73.nhd  nt.77.nsq  nt.82.nog  nt.87.nni

dusti1n commented 1 day ago

Hello @smartise ,

Thank you for sharing your configuration file!

It seems the issue might be with the db_path setting. Currently, it is:

db_path: /mnt/ebe/blobtools/nt/nt

From the files you provided, it looks like the database files are in /mnt/ebe/blobtools/nt. Please update the db_path to:

db_path: /mnt/ebe/blobtools/nt

The extra /nt might be causing the issue. Additionally, it’s possible that not all required files are present, in which case I recommend re-downloading the database automatically within the workflow.

If the issue persists, using the default path for the database and letting the workflow download it automatically might also resolve the problem.

Let me know if you need further assistance!

Best regards, Dustin

smartise commented 1 day ago

Hello, thanks for your reply.

Unfortunately, the error remains the same:

[Mon Nov 25 21:44:17 2024]
Error in rule blast:
    jobid: 34
    output: /mnt/ebe/AmpliconSequencingONT/natrix_test/results/blast/blast_taxonomy.tsv
    conda-env: /srv/home/ocol0007/Natrix2/.snakemake/conda/8369bbc2
    shell:
        blastn -num_threads 40 -query /mnt/ebe/AmpliconSequencingONT/natrix_test/results/filtering/filtered.fasta -db /mnt/ebe/blobtools/nt -max_target_seqs 10 -perc_identity 90.0 -evalue 1e-20 -outfmt "6 qseqid qlen length pident mismatch qstart qend sstart send gaps evalue staxid sseqid" -out /mnt/ebe/AmpliconSequencingONT/natrix_test/results/blast/blast_taxonomy.tsv
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job blast since they might be corrupted:
/mnt/ebe/AmpliconSequencingONT/natrix_test/results/blast/blast_taxonomy.tsv
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /srv/home/ocol0007/Natrix2/.snakemake/log/2024-11-25T214007.137268.snakemake.log

However, when I run BLAST separately from the pipeline, it does work, using /mnt/ebe/blobtools/nt/nt as the database path.
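The error text in both logs, "No alias or index file found for nucleotide database [/mnt/ebe/blobtools/nt]", indicates that blastn treats the -db value as a database name (a path prefix to which it appends extensions such as .nal for the alias file or .nin for an index file) rather than as a directory. The directory listing above contains nt.nal inside /mnt/ebe/blobtools/nt, so the name that resolves is /mnt/ebe/blobtools/nt/nt, which matches what works outside the pipeline. The Python heuristic below checks a candidate -db value in that way; looks_like_blast_db is a hypothetical helper, not part of BLAST or Natrix2, and it ignores the BLASTDB environment variable and other BLAST search-path locations.

```python
from pathlib import Path


def looks_like_blast_db(db):
    """Heuristic check for a value passed to blastn -db.

    Treats `db` as a database name (directory + basename) and looks for an alias
    file (<db>.nal) or an index file (<db>.nin) next to it. Paths are resolved
    relative to the current directory only; BLASTDB is not consulted.
    """
    return any(Path(f"{db}{ext}").is_file() for ext in (".nal", ".nin"))
```

With the listing above, looks_like_blast_db("/mnt/ebe/blobtools/nt/nt") would be expected to return True, while looks_like_blast_db("/mnt/ebe/blobtools/nt") returns False, mirroring the error blastn reports for that value.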

dusti1n commented 11 hours ago

Hello @smartise,

I have reviewed this process, and the safest approach is to let Natrix2 download the database completely and correctly. This ensures that all required input and output files are in place, avoiding potential errors. I hope this clarifies why the current setup doesn’t work. It’s best to let the pipeline run fully once; you can then reuse the downloaded database for various samples without needing to download it again.

Please use the standard db_path:

db_path: database/ncbi/nt

This will let you check whether the database download works properly. Natrix2 processes the database as follows: the pipeline searches the specified db_path for packed database files, such as nt.000.tar.gz, nt.001.tar.gz, etc. If these packed files are found, Natrix2 unpacks them. Each .tar.gz file contains the following database components:

.nhd
.nhi
.nhr
.nin
.nnd
.nni
.nog
.nsq
Additional files like taxdb.btd, taxdb.bti, and taxonomy4blast.sqlite3.

If the packed files are not found, Natrix2 automatically downloads them, unpacks them, and verifies their content.
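The unpack-and-verify part of the process described above can be sketched in Python with the standard tarfile module. This is an illustration under stated assumptions, not Natrix2's implementation: the archive pattern nt.*.tar.gz is inferred from the names mentioned above, unpack_packed_volumes is a hypothetical helper, and the download step is deliberately left out.

```python
import tarfile
from pathlib import Path


def unpack_packed_volumes(db_dir):
    """Unpack every nt.*.tar.gz in db_dir into db_dir and verify the extraction.

    Returns {archive name: [members not found on disk afterwards]}. An empty
    dict means no packed archives were found (the point at which, per the
    description above, Natrix2 would download them; this sketch does not).
    """
    db_dir = Path(db_dir)
    # Use the safe "data" extraction filter only where this Python supports it.
    extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    report = {}
    for archive in sorted(db_dir.glob("nt.*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            members = tar.getnames()
            tar.extractall(db_dir, **extract_kwargs)
        # Verification step: every member listed in the archive should now exist.
        report[archive.name] = [m for m in members if not (db_dir / m).exists()]
    return report
```

Checking that each listed member exists after extraction is a cheap completeness test; it does not validate the contents of the database files themselves.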

Alternatively, you can continue the analysis of your samples manually with BLAST, as you have already done. Natrix2, however, requires the exact input and output files to ensure the pipeline runs without errors.

Feel free to reach out if you have any further questions or need assistance. Let me know if your pipeline setup works correctly!

Best regards, Dustin

dbeisser / Natrix2

Question over indicating the location of the nt database #26