DennisSchmitz / Jovian_archive

Metagenomics/viromics pipeline that focuses on automation, user-friendliness and a clear audit trail. Jovian aims to empower classical biologists and wet-lab personnel to do metagenomics/viromics analyses themselves, without bioinformatics expertise.
GNU Affero General Public License v3.0

Heatmap script only works if all expected taxa are in dataset #12

Closed samnooij closed 5 years ago

samnooij commented 5 years ago

The current heatmap script only works if the final taxonomic classifications contain each expected taxon, e.g. viruses. If there are none, the Python script cannot draw the heatmaps, and Snakemake then sees that not all output of the rule "draw_heatmaps" was generated. We therefore need some checks and a workaround for datasets that lack any of the expected taxa (e.g. have the Python script create empty files for unobserved taxa and write a short warning to the terminal or a log file?).
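The workaround suggested above (empty files for unobserved taxa plus a warning) could look roughly like the sketch below. It is not the code from Jovian: the function name `write_placeholders`, the logger name and the placeholder text are hypothetical, and it assumes the script knows which output paths Snakemake expects per taxon.

```python
import logging
from pathlib import Path

# Hypothetical logger name; the real script may log differently.
logger = logging.getLogger("draw_heatmaps")

# Hypothetical stub content, so the file is not literally empty when opened.
PLACEHOLDER_HTML = (
    "<html><body><p>No {taxon} found in this dataset; "
    "heatmap not drawn.</p></body></html>\n"
)


def write_placeholders(expected_outputs, observed_taxa):
    """Write a stub HTML file for every expected heatmap whose taxon is absent.

    expected_outputs maps a taxon name to the output paths Snakemake expects.
    Returns the list of placeholder paths that were written.
    """
    written = []
    for taxon, paths in expected_outputs.items():
        if taxon in observed_taxa:
            continue  # real heatmaps for this taxon are drawn elsewhere
        logger.warning("No %s in the classifications; writing placeholder heatmaps.", taxon)
        for path in map(Path, paths):
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(PLACEHOLDER_HTML.format(taxon=taxon))
            written.append(path)
    return written
```

With placeholders in place, every file listed under `output:` exists, so the rule can finish even when a taxon is missing from the dataset.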


To make this work better, my current goals are to:

  1. Have the user/snakemake control which heatmaps to make
  2. Create only one HTML file per taxon and insert tabs or panels for the different ranks. (Write a warning to inform the user of absent taxa, but still create a file so that the pipeline can finish.)
  3. Reduce file size and improve file usefulness by aggregating contig information into one mouse-over/hover panel (e.g. "10 contigs found for this taxon, average depth of coverage: 2, contig lengths: 1000 - 5000").
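The aggregation in point 3 could be sketched as follows. This is only an illustration: the function name `summarise_contigs` and the `(taxon, length, depth)` record layout are assumptions, not the format the actual script uses.

```python
from collections import defaultdict
from statistics import mean


def summarise_contigs(contigs):
    """Collapse per-contig records into one hover string per taxon.

    Each record is a (taxon, contig_length, depth_of_coverage) tuple
    (a hypothetical layout chosen for this sketch).
    """
    grouped = defaultdict(list)
    for taxon, length, depth in contigs:
        grouped[taxon].append((length, depth))

    summaries = {}
    for taxon, records in grouped.items():
        lengths = [length for length, _ in records]
        depths = [depth for _, depth in records]
        summaries[taxon] = (
            f"{len(records)} contigs found for this taxon, "
            f"average depth of coverage: {mean(depths):.1f}, "
            f"contig lengths: {min(lengths)} - {max(lengths)}"
        )
    return summaries
```

One string per taxon replaces one hover entry per contig, which is where the reduction in file size would come from.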

Suggested solution

Remake the script and take into account:

(Bold = priority, other points = of secondary importance.)

Note: these solutions need changes in the Python script itself, the Snakefile, and possibly also the pipelineparameters.yaml file!

thierryjanssens commented 5 years ago

Hi,

I am having trouble with this rule as well, probably because of the apparent absence of Archaea in my data. Could it be that the quantify_output rule is hampered by the same cause? Is there a relationship with the Concat_files rule? Concat_files.log is empty, just like draw_heatmaps.log and quantify_output.log.

This is my error (at the very last bit of the workflow):

[Fri Apr 19 11:04:23 2019]
=============----------------------] 62.5% - Reading files [ 45 / 72 ]
Error in rule draw_heatmaps:
    jobid: 268
    output: results/heatmaps/Superkingdoms_heatmap.html, results/heatmaps/Virus_order_heatmap.html, results/heatmaps/Virus_family_heatmap.html, results/heatmaps/Virus_genus_heatmap.html, results/heatmaps/Virus_species_heatmap.html, results/heatmaps/Phage_order_heatmap.html, results/heatmaps/Phage_family_heatmap.html, results/heatmaps/Phage_genus_heatmap.html, results/heatmaps/Phage_species_heatmap.html, results/heatmaps/Bacteria_phylum_heatmap.html, results/heatmaps/Bacteria_class_heatmap.html, results/heatmaps/Bacteria_order_heatmap.html, results/heatmaps/Bacteria_family_heatmap.html, results/heatmaps/Bacteria_genus_heatmap.html, results/heatmaps/Bacteria_species_heatmap.html, results/Taxonomic_rank_statistics.tsv, results/Virus_rank_statistics.tsv, results/Phage_rank_statistics.tsv, results/Bacteria_rank_statistics.tsv
    log: logs/draw_heatmaps.log
    conda-env: /data/BioGrid/ERVINGS/Runs_Respiratory_MiSEQ_RIVM/MiSeq_RUN_12APR2019/Jovian/.snakemake/conda/9d3d5b4d

ClusterJobException in line 687 of /data/BioGrid/ERVINGS/Runs_Respiratory_MiSEQ_RIVM/MiSeq_RUN_12APR2019/Jovian/Snakefile:
Error executing rule draw_heatmaps on cluster (jobid: 268, external: 225321, jobscript: /data/BioGrid/ERVINGS/Runs_Respiratory_MiSEQ_RIVM/MiSeq_RUN_12APR2019/Jovian/.snakemake/tmp.vw572phx/Jovian_draw_heatmaps.jobid268). For detailed error see the cluster log.
Job failed, going on with independent jobs.
------------------] 70.8% - Reading files [ 51 / 72 ]
Done counting!
===============================================] 100.0% - Reading files [ 72 / 72 ]
Traceback (most recent call last):
  File "/data/BioGrid/ERVINGS/Runs_Respiratory_MiSEQ_RIVM/MiSeq_RUN_12APR2019/Jovian/.snakemake/scripts/tmpecjkhsji.quantify_profiles.py", line 487, in <module>
    main()
  File "/data/BioGrid/ERVINGS/Runs_Respiratory_MiSEQ_RIVM/MiSeq_RUN_12APR2019/Jovian/.snakemake/scripts/tmpecjkhsji.quantify_profiles.py", line 421, in main
    "Eukaryota", "Viruses", "Unclassified" ]]
  File "/data/BioGrid/ERVINGS/Runs_Respiratory_MiSEQ_RIVM/MiSeq_RUN_12APR2019/Jovian/.snakemake/conda/9d3d5b4d/lib/python3.7/site-packages/pandas/core/frame.py", line 2682, in __getitem__
    return self._getitem_array(key)
  File "/data/BioGrid/ERVINGS/Runs_Respiratory_MiSEQ_RIVM/MiSeq_RUN_12APR2019/Jovian/.snakemake/conda/9d3d5b4d/lib/python3.7/site-packages/pandas/core/frame.py", line 2726, in _getitem_array
    indexer = self.loc._convert_to_indexer(key, axis=1)
  File "/data/BioGrid/ERVINGS/Runs_Respiratory_MiSEQ_RIVM/MiSeq_RUN_12APR2019/Jovian/.snakemake/conda/9d3d5b4d/lib/python3.7/site-packages/pandas/core/indexing.py", line 1327, in _convert_to_indexer
    .format(mask=objarr[mask]))
KeyError: "['Archaea'] not in index"
[Fri Apr 19 11:04:39 2019]
Error in rule quantify_output:
    jobid: 269
    output: results/profile_read_counts.csv, results/profile_percentages.csv, results/Sample_composition_graph.html
    log: logs/quantify_output.log
    conda-env: /data/BioGrid/ERVINGS/Runs_Respiratory_MiSEQ_RIVM/MiSeq_RUN_12APR2019/Jovian/.snakemake/conda/9d3d5b4d

RuleException:
CalledProcessError in line 681 of /data/BioGrid/ERVINGS/Runs_Respiratory_MiSEQ_RIVM/MiSeq_RUN_12APR2019/Jovian/Snakefile:
Command 'source /mnt/miniconda/bin/activate '/data/BioGrid/ERVINGS/Runs_Respiratory_MiSEQ_RIVM/MiSeq_RUN_12APR2019/Jovian/.snakemake/conda/9d3d5b4d'; set -euo pipefail; python /data/BioGrid/ERVINGS/Runs_Respiratory_MiSEQ_RIVM/MiSeq_RUN_12APR2019/Jovian/.snakemake/scripts/tmpecjkhsji.quantify_profiles.py' returned non-zero exit status 1.
  File "/data/BioGrid/ERVINGS/Runs_Respiratory_MiSEQ_RIVM/MiSeq_RUN_12APR2019/Jovian/Snakefile", line 681, in __rule_quantify_output
  File "/home/janssetk/.conda/envs/Jovian_master/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Job failed, going on with independent jobs.
Exiting because a job execution failed. Look above for error message
Complete log: /data/BioGrid/ERVINGS/Runs_Respiratory_MiSEQ_RIVM/MiSeq_RUN_12APR2019/Jovian/.snakemake/log/2019-04-19T110018.225554.snakemake.log
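The `KeyError: "['Archaea'] not in index"` in this log is what pandas raises when a DataFrame is indexed with a list of column names and one of them is absent. A minimal sketch of one way to avoid it is shown below; it is not necessarily how quantify_profiles.py was fixed. The helper name `select_superkingdoms` is hypothetical, and the full column list is an assumption, since the traceback only shows part of it.

```python
import pandas as pd

# Assumed column list: the traceback only shows "Eukaryota", "Viruses",
# "Unclassified" and the missing "Archaea"; "Bacteria" is a guess.
SUPERKINGDOMS = ["Archaea", "Bacteria", "Eukaryota", "Viruses", "Unclassified"]


def select_superkingdoms(counts, columns=SUPERKINGDOMS):
    """Return the requested columns, adding absent ones filled with zeros.

    counts[columns] raises KeyError when any column is missing;
    reindex() adds the missing columns instead.
    """
    return counts.reindex(columns=columns, fill_value=0)
```

A dataset without Archaea then yields an "Archaea" column of zeros rather than an exception, so the downstream counting and plotting can proceed.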

DennisSchmitz commented 5 years ago

This is also the cause of issue #26

DennisSchmitz commented 5 years ago

Fixed in v0.9.2 which will be made available this afternoon. Closing.