bxlab / metaWRAP

MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis
MIT License

Problems with kraken 2 and rsync_from_ncbi.pl #479

Open haruosuz opened 1 year ago

haruosuz commented 1 year ago

Dear Sir or Madam:

I am getting the following error messages:

Encountered problems while solving:
  - nothing provides requested kraken 2.0**
rsync_from_ncbi.pl: unexpected FTP path (new server?) for na

Details are as follows:

https://github.com/bxlab/metaWRAP/blob/master/README.md#installation

  1. Install all metaWRAP dependencies with conda:
(metawrap-env) metaWRAP -h

MetaWRAP v=1.3.2

First, running the following command

(metawrap-env) mamba install --only-deps -c ursky metawrap-mg

printed the following messages:

Pinned packages:
  - python 2.7.*

Encountered problems while solving:
  - nothing provides cairo 1.14.8.* needed by metawrap-mg-1.0-0

Second, running the following command

# OR
mamba install biopython blas=2.5 blast=2.6.0 bmtagger bowtie2 bwa checkm-genome fastqc kraken=1.1 kraken=2.0 krona=2.7 matplotlib maxbin2 megahit metabat2 pandas prokka quast r-ggplot2 r-recommended salmon samtools=1.9 seaborn spades trim-galore

printed the following messages:

Pinned packages:
  - python 2.7.*

Encountered problems while solving:
  - nothing provides requested kraken 2.0**

Third, I ran the following command, omitting kraken=2.0:

# OR
mamba install biopython blas=2.5 blast=2.6.0 bmtagger bowtie2 bwa checkm-genome fastqc kraken=1.1 krona=2.7 matplotlib maxbin2 megahit metabat2 pandas prokka quast r-ggplot2 r-recommended salmon samtools=1.9 seaborn spades trim-galore
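A note on the second command above: it pins the same package name twice (kraken=1.1 and kraken=2.0). As far as I know, Kraken 2 is published on bioconda under the separate name kraken2 rather than as version 2.0 of kraken, which would be consistent with "nothing provides requested kraken 2.0**". A minimal sketch for spotting duplicate pins in a long spec list before handing it to mamba; the function name find_duplicate_pins is hypothetical and not part of mamba or metaWRAP:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every package name that occurs more than once
# in a list of conda-style specs (either "name" or "name=version").
find_duplicate_pins() {
  local spec
  for spec in "$@"; do
    # Drop everything from the first '=' onward to get the bare name.
    printf '%s\n' "${spec%%=*}"
  done | sort | uniq -d
}

find_duplicate_pins biopython blas=2.5 kraken=1.1 kraken=2.0 krona=2.7
# prints: kraken
```

This only checks the spelling of the command line. Whether a given version exists in the configured channels still has to be confirmed with the solver itself.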

https://github.com/bxlab/metaWRAP/blob/master/installation/database_installation.md#downloading-the-kraken1-standard-database

Downloading the KRAKEN1 standard database:

Running the command on this page printed the following messages:

standard output

Found jellyfish v1.1.12
Downloaded accession to taxon map(s)
Downloaded taxonomy tree data
Uncompressing taxonomy data... done.
Untarring taxonomy tree data... done.

standard error

rsync_from_ncbi.pl: unexpected FTP path (new server?) for https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/762/265/GCF_000762265.1_ASM76226v1

Following the comments on this GitHub page, I changed the files as follows:

${HOME}/miniconda3/envs/mambaEnv/envs/metawrap-env/libexec/download_genomic_library.sh

33c33
< FTP_SERVER="ftp://$NCBI_SERVER"
---
> FTP_SERVER="https://$NCBI_SERVER"

${HOME}/miniconda3/envs/mambaEnv/envs/metawrap-env/libexec/rsync_from_ncbi.pl

37c37
<   if (! ($full_path =~ s#^ftp://ftp\.ncbi\.nlm\.nih\.gov/genomes/##)) {
---
>   if (! ($full_path =~ s#^https://ftp\.ncbi\.nlm\.nih\.gov/genomes/##)) {
55c55
<         s/^/ftp:\/\/ftp.ncbi.nlm.nih.gov\/genomes\//;
---
>         s/^/https:\/\/ftp.ncbi.nlm.nih.gov\/genomes\//;
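The edits above swap one hard-coded scheme for another, so the check would fail again if the summary file ever listed ftp:// paths. The matching logic can be sketched so that it accepts either scheme. This is shell for illustration only, not a drop-in patch for the Perl script, and the function name strip_ncbi_prefix is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip the NCBI genomes prefix from a path, accepting
# either the ftp:// or the https:// scheme. Anything else is reported on
# stderr and signalled with a non-zero exit status.
strip_ncbi_prefix() {
  case "$1" in
    ftp://ftp.ncbi.nlm.nih.gov/genomes/*)
      printf '%s\n' "${1#ftp://ftp.ncbi.nlm.nih.gov/genomes/}" ;;
    https://ftp.ncbi.nlm.nih.gov/genomes/*)
      printf '%s\n' "${1#https://ftp.ncbi.nlm.nih.gov/genomes/}" ;;
    *)
      printf 'unexpected path: %s\n' "$1" >&2
      return 1 ;;
  esac
}

strip_ncbi_prefix "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/762/265/GCF_000762265.1_ASM76226v1"
# prints: all/GCF/000/762/265/GCF_000762265.1_ASM76226v1
```

A value such as na matches neither prefix and takes the error branch, which mirrors the "unexpected FTP path" message the script reports.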

Then, running the command

kraken-build --standard --threads 24 --db MY_KRAKEN_DATABASE

printed the following messages:

standard output

Found jellyfish v1.1.12

standard error

Step 1/3: performing rsync dry run...
Rsync dry run complete, removing any non-existent files from manifest.
Step 2/3: Performing rsync file transfer of requested files
Rsync file transfer complete.
Step 3/3: Assigning taxonomic IDs to sequences
Processed 506 projects (910 sequences, 1.43 Gbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
rsync_from_ncbi.pl: unexpected FTP path (new server?) for na
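The remaining message names the path na rather than a URL. A likely explanation, offered here as an assumption rather than a confirmed diagnosis, is that some rows of NCBI's assembly_summary.txt carry the literal value na in the ftp_path column, and the script aborts when it reaches one. A minimal sketch of filtering such rows out, assuming ftp_path is tab-separated column 20; the function names drop_na_paths and make_row are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical filter: keep comment lines and rows whose ftp_path column
# (ASSUMED to be tab-separated column 20) is not the literal string "na".
drop_na_paths() {
  awk -F '\t' '/^#/ || $20 != "na"'
}

# Build one synthetic 20-column row: 19 filler fields, then the given path.
make_row() {
  awk -v p="$1" 'BEGIN { for (i = 1; i <= 19; i++) printf "f%d\t", i; print p }'
}

# Demo on two synthetic rows; only the first has a usable path.
{
  make_row "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/762/265/GCF_000762265.1_ASM76226v1"
  make_row "na"
} | drop_na_paths | awk -F '\t' '{ print $20 }'
# prints: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/762/265/GCF_000762265.1_ASM76226v1
```

The column index should be checked against the header line of the actual summary file before relying on this.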

I found the following two pages but could not work out from them how to solve this problem.

Any comments would be greatly appreciated.

yqy6611 commented 1 year ago

This is a bug in Kraken2. My approach is to download the genome files and taxonomy from NCBI first (Kraken2 itself just downloads the genome files from the NCBI FTP server), then build the Kraken2 database following the "build customized database" instructions.
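The workaround above can be sketched as a dry run that only prints the commands, so they can be reviewed before anything is built. It assumes the genomes have already been downloaded as .fna files into one directory and that paths contain no spaces. The flags shown (--download-taxonomy, --add-to-library, --build, --db) are the ones documented for kraken2-build. The command in this issue is Kraken 1's kraken-build, so whether the same sequence applies there should be checked against its manual. The function name plan_custom_db is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the kraken2-build commands for a custom
# database built from FASTA files already present in a local directory.
# Nothing is executed; the output is only a plan.
plan_custom_db() {
  local db="$1" dir="$2" fasta
  printf 'kraken2-build --download-taxonomy --db %s\n' "$db"
  for fasta in "$dir"/*.fna; do
    # Skip the unexpanded glob when the directory holds no .fna files.
    [ -e "$fasta" ] || continue
    printf 'kraken2-build --add-to-library %s --db %s\n' "$fasta" "$db"
  done
  printf 'kraken2-build --build --db %s\n' "$db"
}

# Demo with a throwaway directory holding one empty placeholder file.
demo_dir="$(mktemp -d)"
touch "$demo_dir/example.fna"
plan_custom_db MY_KRAKEN_DATABASE "$demo_dir"
rm -rf "$demo_dir"
```

Once the printed plan looks right, it can be run line by line or piped to a shell.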