SilasK / Krak

Snakemake for flexitaxd and Kraken2

Error in rule braken #1

Open animesh opened 3 years ago

animesh commented 3 years ago

I am trying to run the workflow and am running into this issue: 2021-08-07T100724.612348.snakemake.log. More specifically, one of the per-sample species logs, S2_QUALITY_PASSED_species.log, says "IndexError: string index out of range". Any ideas how to get past this?

My config is attached as config.zip. The only changes I made are db_name: "Kraken_dbs/UHGG" and genome_folder: fasta. I'm not sure whether this is relevant here.

Samples table: samples.zip

SilasK commented 3 years ago

Check kraken_results/db_name/reports/S2_QUALITY_PASSED.txt. Does it look like a Kraken report? My guess is that it's empty. Maybe something went wrong during the Kraken run.
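To run this check across every sample at once, here is a minimal Python sketch. It assumes the reports live in the kraken_results/db_name/reports directory mentioned above and use the standard six-column, tab-separated Kraken2 report layout. Reports written with extra columns, for example via --report-minimizer-data, would also be flagged. `find_suspect_reports` is a hypothetical helper and is not part of the Krak workflow.

```python
from __future__ import annotations

from pathlib import Path


def find_suspect_reports(report_dir: str) -> list[str]:
    """Return names of Kraken reports that are empty or whose first line
    lacks the six tab-separated columns of a standard Kraken2 report."""
    suspects = []
    # Match only the Kraken reports, not the *_bracken_species.txt files.
    for path in sorted(Path(report_dir).glob("*_QUALITY_PASSED.txt")):
        with path.open() as handle:
            first_line = handle.readline()
        # readline() returns "" for an empty file.
        if not first_line.strip() or len(first_line.rstrip("\n").split("\t")) != 6:
            suspects.append(path.name)
    return suspects


if __name__ == "__main__":
    for name in find_suspect_reports("kraken_results/db_name/reports"):
        print("empty or malformed:", name)
```

An empty report is a likely cause of an "IndexError: string index out of range" further downstream, because the next step has no lines to index into.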

SilasK commented 3 years ago

The rest seems all ok.

animesh commented 3 years ago

Yes, it looks like all of them are empty. Should I just rerun the pipeline? Is there anything else I can check before going that route?


(kraken) animeshs@DMED7596:/mnt/z/Kraken$ ls kraken_results/db_name/reports/*
kraken_results/db_name/reports/S20_2_QUALITY_PASSED.txt  kraken_results/db_name/reports/S33_QUALITY_PASSED.txt  kraken_results/db_name/reports/S46_QUALITY_PASSED.txt
kraken_results/db_name/reports/S23_QUALITY_PASSED.txt    kraken_results/db_name/reports/S35_QUALITY_PASSED.txt  kraken_results/db_name/reports/S47_QUALITY_PASSED.txt
kraken_results/db_name/reports/S27_QUALITY_PASSED.txt    kraken_results/db_name/reports/S36_QUALITY_PASSED.txt  kraken_results/db_name/reports/S48_QUALITY_PASSED.txt
kraken_results/db_name/reports/S28_QUALITY_PASSED.txt    kraken_results/db_name/reports/S37_QUALITY_PASSED.txt  kraken_results/db_name/reports/S5_QUALITY_PASSED.txt
kraken_results/db_name/reports/S2_QUALITY_PASSED.txt     kraken_results/db_name/reports/S3_QUALITY_PASSED.txt   kraken_results/db_name/reports/s13_QUALITY_PASSED.txt
kraken_results/db_name/reports/S30_QUALITY_PASSED.txt    kraken_results/db_name/reports/S40_QUALITY_PASSED.txt
kraken_results/db_name/reports/S31_QUALITY_PASSED.txt    kraken_results/db_name/reports/S44_QUALITY_PASSED.txt
(kraken) animeshs@DMED7596:/mnt/z/Kraken$ cat kraken_results/db_name/reports/*txt
SilasK commented 3 years ago

Check the "log/kraken/{db_name}/{sample}.log"

animesh commented 3 years ago

They seem to be complaining about gzip?

gzip: /mnt/z/ayu/reads/S20-2-QUALITY_PASSED_R1.fastq: not in gzip format

gzip: /mnt/z/ayu/reads/S20-2-QUALITY_PASSED_R2.fastq: not in gzip format
Loading database information... done.
0 sequences (0.00 Mbp) processed in 0.003s (0.0 Kseq/m, 0.00 Mbp/m).
  0 sequences classified (-nan%)
  0 sequences unclassified (-nan%)

gzip: /mnt/z/ayu/reads/S23-QUALITY_PASSED_R2.fastq: not in gzip format

gzip: /mnt/z/ayu/reads/S23-QUALITY_PASSED_R1.fastq: not in gzip format
Loading database information... done.
0 sequences (0.00 Mbp) processed in 0.012s (0.0 Kseq/m, 0.00 Mbp/m).
  0 sequences classified (-nan%)
  0 sequences unclassified (-nan%)

gzip: /mnt/z/ayu/reads/S27-QUALITY_PASSED_R1.fastq: not in gzip format

....

gzip: /mnt/z/ayu/reads/S27-QUALITY_PASSED_R2.fastq: not in gzip format
Loading database information... done.
0 sequences (0.00 Mbp) processed in 0.024s (0.0 Kseq/m, 0.00 Mbp/m).
  0 sequences classified (-nan%)
  0 sequences unclassified (-nan%)
SilasK commented 3 years ago

I think this is the error:

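Your reads are plain, uncompressed FASTQ files, so telling Kraken2 they are gzip-compressed makes it read 0 sequences. Change kraken_run_extra: "--gzip-compressed " to kraken_run_extra: "" in config.yaml and start anew.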
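Before you set kraken_run_extra, you can check whether the reads are actually gzip-compressed by looking at the two-byte gzip magic number (1f 8b). The following minimal Python sketch does this. `is_gzip` and `kraken_run_extra_for` are hypothetical helpers and not part of the workflow. It also assumes that one --gzip-compressed flag applies to every input file of a run, so mixed inputs are rejected.

```python
from __future__ import annotations


def is_gzip(path: str) -> bool:
    """True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"


def kraken_run_extra_for(read_files: list[str]) -> str:
    """Hypothetical helper: suggest a kraken_run_extra value for the given reads."""
    if not read_files:
        raise ValueError("no read files given")
    compressed = {is_gzip(path) for path in read_files}
    if compressed == {True}:
        return "--gzip-compressed"
    if compressed == {False}:
        return ""
    # The flag covers the whole run, so mixed inputs need to be harmonised first.
    raise ValueError("mixed gzip and plain FASTQ inputs")
```

For the reads in this thread, for example /mnt/z/ayu/reads/S23-QUALITY_PASSED_R1.fastq, the helper would be expected to return "", because gzip reports them as "not in gzip format".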

animesh commented 3 years ago

Thanks @SilasK, that seems to have worked: 2021-08-10T115315.338986.snakemake.log 👍🏽

with >85% of reads classified:

cat log/kraken/*/*.log
Loading database information... done.
13080534 sequences (2616.11 Mbp) processed in 238.715s (3287.7 Kseq/m, 657.55 Mbp/m).
  11311337 sequences classified (86.47%)
  1769197 sequences unclassified (13.53%)
Loading database information... done.
2060576 sequences (412.12 Mbp) processed in 37.320s (3312.8 Kseq/m, 662.56 Mbp/m).
  1803574 sequences classified (87.53%)
  257002 sequences unclassified (12.47%)
...
Loading database information... done.
4282843 sequences (856.57 Mbp) processed in 79.849s (3218.2 Kseq/m, 643.64 Mbp/m).
  3834677 sequences classified (89.54%)
  448166 sequences unclassified (10.46%)

(kraken) animeshs@DMED7596:/mnt/z/Kraken$ ls -ltrh kraken_results/db_name/reports/*txt
-rwxrwxrwx 1 animeshs animeshs 307K Aug 10 12:01 kraken_results/db_name/reports/S27_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 295K Aug 10 12:01 kraken_results/db_name/reports/S27_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 254K Aug 10 12:03 kraken_results/db_name/reports/S5_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 244K Aug 10 12:03 kraken_results/db_name/reports/S5_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 304K Aug 10 12:06 kraken_results/db_name/reports/S31_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 293K Aug 10 12:06 kraken_results/db_name/reports/S31_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 313K Aug 10 12:13 kraken_results/db_name/reports/S33_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 301K Aug 10 12:13 kraken_results/db_name/reports/S33_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 294K Aug 10 12:16 kraken_results/db_name/reports/S3_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 283K Aug 10 12:16 kraken_results/db_name/reports/S3_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 311K Aug 10 12:22 kraken_results/db_name/reports/S47_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 299K Aug 10 12:22 kraken_results/db_name/reports/S47_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 312K Aug 10 12:27 kraken_results/db_name/reports/S37_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 300K Aug 10 12:27 kraken_results/db_name/reports/S37_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 313K Aug 10 12:33 kraken_results/db_name/reports/S23_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 301K Aug 10 12:33 kraken_results/db_name/reports/S23_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 313K Aug 10 12:39 kraken_results/db_name/reports/S40_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 300K Aug 10 12:39 kraken_results/db_name/reports/S40_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 308K Aug 10 12:44 kraken_results/db_name/reports/S48_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 296K Aug 10 12:44 kraken_results/db_name/reports/S48_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 314K Aug 10 12:51 kraken_results/db_name/reports/S44_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 302K Aug 10 12:52 kraken_results/db_name/reports/S44_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 277K Aug 10 12:54 kraken_results/db_name/reports/S46_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 266K Aug 10 12:54 kraken_results/db_name/reports/S46_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 289K Aug 10 12:57 kraken_results/db_name/reports/S36_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 278K Aug 10 12:57 kraken_results/db_name/reports/S36_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 296K Aug 10 13:02 kraken_results/db_name/reports/S30_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 284K Aug 10 13:02 kraken_results/db_name/reports/S30_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 314K Aug 10 13:09 kraken_results/db_name/reports/S35_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 302K Aug 10 13:09 kraken_results/db_name/reports/S35_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 281K Aug 10 13:11 kraken_results/db_name/reports/S2_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 270K Aug 10 13:11 kraken_results/db_name/reports/S2_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 310K Aug 10 13:16 kraken_results/db_name/reports/S28_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 298K Aug 10 13:16 kraken_results/db_name/reports/S28_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 277K Aug 10 13:17 kraken_results/db_name/reports/S20_2_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 266K Aug 10 13:17 kraken_results/db_name/reports/S20_2_QUALITY_PASSED_bracken_species.txt
-rwxrwxrwx 1 animeshs animeshs 286K Aug 10 13:20 kraken_results/db_name/reports/s13_QUALITY_PASSED.txt
-rwxrwxrwx 1 animeshs animeshs 275K Aug 10 13:20 kraken_results/db_name/reports/s13_QUALITY_PASSED_bracken_species.txt

Is this a reasonable number to expect from a typical experiment? Also, it would be nice to know if you can suggest any downstream tools for this workflow 💯

SilasK commented 3 years ago

@animesh To analyze the count output of Kraken, I suggest ALDEx2 in R or this Jupyter notebook: https://github.com/SilasK/CMGM/blob/main/notebooks/Analyze-cold-adapted-microbiota.ipynb

The counts there come from Kraken. You only need to replace the mouse database with the human one.
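To get the Kraken counts into a samples-by-species table for such an analysis, one option is the minimal Python sketch below. It assumes the standard six-column Kraken2 report layout (percentage, clade reads, direct reads, rank code, taxid, indented name) and keeps only rows with rank code "S". With a custom taxonomy, such as one built with flextaxd, the rank codes may differ. `species_counts` and `count_table` are hypothetical helpers, and this is not how the linked notebook loads its data.

```python
from __future__ import annotations

from pathlib import Path


def species_counts(report_path: str) -> dict[str, int]:
    """Species-level clade read counts from one Kraken2 report."""
    counts = {}
    with open(report_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 6:
                continue  # skip blank or malformed lines
            _pct, clade_reads, _direct, rank, _taxid, name = fields[:6]
            if rank == "S":  # exact species rank only, not S1/S2 sub-ranks
                counts[name.strip()] = int(clade_reads)
    return counts


def count_table(report_paths: list[str]) -> dict[str, dict[str, int]]:
    """Map species -> {sample -> reads}, filling 0 where a species is absent.
    The sample name is the report file name without its extension."""
    per_sample = {Path(p).stem: species_counts(p) for p in report_paths}
    species = sorted({s for counts in per_sample.values() for s in counts})
    return {
        s: {sample: counts.get(s, 0) for sample, counts in per_sample.items()}
        for s in species
    }
```

The nested dictionary can then be written out as a tab-separated file or loaded into a data frame for ALDEx2 or the notebook.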

However, if you want to use functional inference, it is better to have relative abundances, which you ideally get from coverage rather than from counts.
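Here is one way to see why raw counts and coverage differ. At the same cell abundance, a longer genome yields proportionally more reads, so dividing reads by genome length gives a rough per-taxon depth. The minimal Python sketch below applies that length normalisation. The `genome_length` input is hypothetical and would have to come from the reference genomes. This is only an approximation, not the coverage-based estimate from read mapping that is recommended above.

```python
from __future__ import annotations


def relative_abundance(
    counts: dict[str, int], genome_length: dict[str, int]
) -> dict[str, float]:
    """Length-normalised relative abundance: reads per base of genome,
    rescaled so the values sum to 1. genome_length is a hypothetical input."""
    depth = {taxon: counts[taxon] / genome_length[taxon] for taxon in counts}
    total = sum(depth.values())
    if total == 0:
        raise ValueError("all counts are zero")
    return {taxon: d / total for taxon, d in depth.items()}


counts = {"A": 200, "B": 100}
lengths = {"A": 4_000_000, "B": 2_000_000}  # made-up genome sizes
print(relative_abundance(counts, lengths))  # {'A': 0.5, 'B': 0.5}
```

Raw count fractions for the same example would be 2/3 and 1/3. The length normalisation shows that the two taxa are about equally abundant, and A only contributes more reads because its genome is twice as long.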

animesh commented 3 years ago

Thanks @SilasK :) I tried running it on Google Colab, and I think it went fine after a git pull of your repo and some installs. I just made a pull request if you want to check it.

In general, what do you think about the 85-90% mapping rate? Is this fine?

SilasK commented 3 years ago

Thank you, I'll check the PR. An 80-90% mapping rate is good.
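To compute this rate for every sample instead of reading it off each log by eye, here is a minimal Python sketch. It parses the "N sequences classified / unclassified" summary lines that Kraken2 writes into log/kraken/{db_name}/{sample}.log, as shown in the log output earlier in this thread. It also fails loudly on runs like the gzip-broken ones, where 0 sequences were processed. `classification_rate` is a hypothetical helper and not part of the workflow.

```python
from __future__ import annotations

import re

# Matches e.g. "  11311337 sequences classified (86.47%)"
SUMMARY = re.compile(r"^\s*(\d+) sequences (classified|unclassified)\b")


def classification_rate(log_text: str) -> float:
    """Fraction of reads classified, from Kraken2's summary lines in a log."""
    counts = {"classified": 0, "unclassified": 0}
    for line in log_text.splitlines():
        match = SUMMARY.match(line)
        if match:
            counts[match.group(2)] += int(match.group(1))
    total = counts["classified"] + counts["unclassified"]
    if total == 0:
        # e.g. the earlier runs where gzip rejected the uncompressed FASTQ files
        raise ValueError("no reads processed; check the input files")
    return counts["classified"] / total
```

Applied to the first log above (11311337 classified, 1769197 unclassified), this gives about 0.8647, matching the 86.47% that Kraken2 reports.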