Closed charlesreid1 closed 6 years ago
This step was working fine 1 month ago. Also, 1 month ago the URL to the kaiju database file led directly to the kaiju database file (kaiju_index_nr_euk.tgz). That link was http://kaiju.binf.ku.dk/database/kaiju_index_nr_euk.tgz
However, the link above now returns an HTTP 301 (redirect) for the database file. The kaiju database file has definitely moved since the script was last working; is it possible this is the cause of the script not working (malformed or different data in the .tgz file)? Or is this database a gold standard that we would not expect to change?
I do not have a copy of the database or a signature of it from 1 mo. ago to check if it is different (note: we should include MD5 sums of data in the walkthroughs/documentation/example workflows).
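A minimal sketch of what recording and checking an MD5 sum could look like, assuming Python is available on the node. The function names `md5sum` and `verify_md5` are hypothetical helpers, not part of kaiju or the existing workflows, and the expected digest would have to be recorded at a time when the database is known to work.

```python
import hashlib


def md5sum(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so a multi-gigabyte database archive
    does not have to fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected_hex):
    """Return True if the file's MD5 matches a previously recorded digest."""
    return md5sum(path) == expected_hex.strip().lower()
```

Usage would be something like `verify_md5("kaiju_index_nr_euk.tgz", recorded_digest)`, where `recorded_digest` is whatever value ends up published alongside the walkthrough.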
I have downloaded the current version of kaiju_index_nr_euk.tgz and put it in an AWS S3 bucket, for the reasons mentioned above (faster/more polite) and also to make sure we have a version of it. That link is https://s3.amazonaws.com/dahak-project-ucdavis/kaiju/kaiju_index_nr_euk.tgz
I also tried running the commands interactively from the singularity and docker containers (i.e., getting an interactive shell and copying and pasting the command to make sure the files existed and I didn't have any syntax wrong). These all resulted in Segmentation Faults.
I also tried running with other data files (using different files for -i and -j), for example:
-i /data/SRR606249_subset10_1.trim2.fq.gz \
-j /data/SRR606249_subset10_2.trim2.fq.gz \
with the same outcome - Segmentation Fault.
Did not try a different version of kaiju (this worked before with this version, so in principle it should still work).
~Apparently this is a known issue - kaiju databases are occasionally updated to non-working states.~
Plan:
Because the database location changes frequently and downloads can be incomplete, and because no error message indicates that the database is the problem, it may be useful to either 1) include an error message in the workflows telling the user to redownload the database if the next step fails, or 2) force a redownload if the next step fails. For the duration of the project it may be worth hosting the database on Amazon (@ctb).
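Option 1 could be sketched roughly as below: a small Python wrapper that runs the step and, if it fails, prints a hint that the database may be the culprit. This is only an illustration; `describe_failure` and `run_with_database_hint` are hypothetical names, and the assumption that a segfault shows up as return code 139 (128 + SIGSEGV) when run through a shell or container runtime is a common convention rather than something confirmed for these Docker/Singularity invocations.

```python
import signal
import subprocess
import sys


def describe_failure(returncode):
    """Turn a non-zero return code into a short human-readable reason."""
    # A direct child killed by SIGSEGV reports -SIGSEGV; a shell or container
    # runtime conventionally reports 128 + SIGSEGV instead (assumption).
    if returncode in (-signal.SIGSEGV, 128 + signal.SIGSEGV):
        return "segmentation fault"
    return "exit code {}".format(returncode)


def run_with_database_hint(command, database_path):
    """Run `command` (a list of arguments) and return its return code.

    On failure, print a message to stderr suggesting that the database at
    `database_path` be downloaded again.
    """
    result = subprocess.run(command)
    if result.returncode != 0:
        print(
            "Command failed ({}). The database at {} may be incomplete or "
            "corrupted; try downloading it again.".format(
                describe_failure(result.returncode), database_path
            ),
            file=sys.stderr,
        )
    return result.returncode
```

Option 2 would build on the same return code: delete the local copy, download again, and retry once before giving up.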
On Tue, Jun 12, 2018 at 08:18:51PM +0000, Chaz Reid wrote:
Apparently this is a known issue - kaiju databases are occasionally updated to non-working states.
wat
Revised:
kaiju databases are occasionally downloaded in non-working states
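One way to catch an incomplete download before running kaiju is to read the archive all the way through, sketched here with Python's standard `tarfile` module. The helper name `archive_is_readable` is hypothetical. Note that this only detects a truncated or corrupted .tgz; it says nothing about whether the index inside is one that this kaiju version can actually use.

```python
import tarfile
import zlib


def archive_is_readable(path):
    """Return True if every file in the .tgz at `path` can be read to the end.

    A download that was cut short typically fails partway through with an
    EOFError or a tarfile.ReadError, which is reported here as False.
    """
    try:
        with tarfile.open(path, "r:gz") as archive:
            for member in archive:
                if member.isfile():
                    handle = archive.extractfile(member)
                    # Read and discard the contents in 1 MiB chunks.
                    while handle.read(1024 * 1024):
                        pass
        return True
    except (tarfile.TarError, EOFError, OSError, zlib.error):
        return False
```

Reading the whole archive takes a while for a file this size, so it probably makes sense to run the check once right after the download rather than before every kaiju run.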
Update:
Expected behavior
Running kaiju should create a kaiju output file. See the run kaiju step in the walkthrough (also on the snakemake branch of PR #83 here).
Actual behavior
A segmentation fault happens whether the kaiju command is run through Docker or Singularity (using the given commands below). Tested on AWS node, Ubuntu 16.04 Xenial image. Using kaiju version:
Steps to reproduce the behavior
(Note that the URL in the command curl -O https://s3.amazonaws.com/dahak-project-ucdavis/kaiju/kaiju_index_nr_euk.tgz points to an S3 bucket holding the Kaiju .tgz file; downloading from the bucket is faster and more polite than always downloading from Kaiju's servers.)
(alternatively, you can also use a docker command, following the walkthrough or the dockerSnakefile in the snakemake branch linked to above).
The kaiju program starts, and runs for a minute or two, but always ends with a Segmentation Fault.
This is a bit tricky to debug, given that it depends on so many files, but do you see anything fishy about the kaiju command?
Output from snakemake log: