Closed DennisSchmitz closed 4 years ago
Update: Also happens on the viral encephalitis dataset.
Same reason.
After some extra digging through the files and datasets we've found that this is actually not caused by a lack of "background organism" reads: both datasets complete all jobs related to removing the background-organism reads without issue.
The failures with these particular ENNGS datasets are caused by the short lengths of the assembled sequences combined with Jovian's current 'strict' mode. In strict mode, Jovian requires assembled scaffolds to be at least 500 nucleotides long. The assemblies in these datasets are often shorter than 500 nt, so the length filter produces an empty FASTA file, which is passed on to the downstream processes and crashes them.
Running in relaxed mode "solves" the issue for this particular dataset.
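To illustrate the failure mode, here is a minimal sketch of a length filter (not Jovian's actual code; the cutoff values and scaffold names are hypothetical): with a strict ≥500 nt cutoff, a dataset containing only short assemblies yields an empty FASTA that downstream rules then choke on.

```python
# Sketch of the scaffold length filter (hypothetical, not Jovian's code).
# Strict mode drops every scaffold shorter than the cutoff, so a dataset
# with only short assemblies produces an empty output file.

def filter_scaffolds(records, min_len=500):
    """Keep only (header, sequence) pairs whose sequence meets min_len."""
    return [(h, s) for h, s in records if len(s) >= min_len]

# Hypothetical assembly with only short scaffolds, like the ENNGS data:
scaffolds = [("NODE_1", "A" * 320), ("NODE_2", "A" * 410)]

strict = filter_scaffolds(scaffolds, min_len=500)   # strict-mode cutoff
relaxed = filter_scaffolds(scaffolds, min_len=250)  # relaxed-mode cutoff

print(len(strict))   # 0 -> empty FASTA passed downstream, causing the crash
print(len(relaxed))  # 2 -> downstream processes get real input
```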
As noted earlier, this falls outside the intended use case of Jovian.
Dataset: ENNGS "Viral_metagenomics" dataset
The ENNGS data is already cleaned of HuGo reads.
`Jovian` assumes that there are at least some reads of the specified background organism. In this dataset there aren't, so the required output files are not generated. This results in an unresolvable DAG for `Snakemake`, which then crashes. Basically, one of the assumptions `Jovian` makes isn't true.

I have to develop a workaround for it, maybe `touch`ing the files? But that might cause problems in `MultiQC`. Maybe the background-organism workflow should be made optional, like the `mgkit` LCA method. However, since `Jovian` is intended for raw and unedited Illumina data, this falls outside the intended use case, so I'm giving it low priority.
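The `touch` idea above could be sketched as follows; the helper name and output path are hypothetical, and (as noted) empty placeholder files may still confuse `MultiQC` downstream. The point is just that creating the expected outputs, even empty, lets Snakemake resolve its DAG.

```python
# Sketch of the proposed workaround (hypothetical paths/names): if no
# background-organism reads were found, create empty placeholder files
# for the outputs the upstream rule never wrote, so the DAG resolves.
from pathlib import Path
import os
import tempfile

def ensure_outputs(paths):
    """touch each expected output file, creating parent dirs as needed."""
    for p in paths:
        f = Path(p)
        f.parent.mkdir(parents=True, exist_ok=True)
        f.touch(exist_ok=True)  # equivalent of `touch` in the shell

# Demo in a temporary directory with a made-up output path:
tmp = tempfile.mkdtemp()
expected = [os.path.join(tmp, "background", "sample1_background.bam")]
ensure_outputs(expected)
print(all(Path(p).exists() for p in expected))  # True
```

Caveat: tools parsing these files later must tolerate them being empty, which is exactly the MultiQC concern above.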