A quick implementation of the `-with-dag` parameter for module file `01_est_sequence_blast.sh`. As far as I can tell, the resulting mermaid graph is not immediately portable into the documentation. See below:
```mermaid
flowchart TB
v0([get_sequence_ids])
subgraph " "
v1[" "]
v2[" "]
v3[" "]
v4[" "]
v18[" "]
v19[" "]
v21[" "]
end
v5([split_sequence_ids])
v7([get_sequences])
v9([cat_fasta_files])
v10([create_blast_db])
v11([blastreduce_transcode_fasta])
v12([split_fasta])
v14([all_by_all_blast])
v16([blastreduce])
v17([compute_stats])
v20([visualize])
v6(( ))
v8(( ))
v13(( ))
v15(( ))
v0 --> v5
v0 --> v4
v0 --> v3
v0 --> v2
v0 --> v1
v5 --> v6
v6 --> v7
v7 --> v8
v8 --> v9
v9 --> v10
v9 --> v11
v9 --> v12
v10 --> v14
v11 --> v16
v11 --> v17
v12 --> v13
v13 --> v14
v14 --> v15
v15 --> v16
v16 --> v17
v17 --> v20
v17 --> v19
v17 --> v18
v20 --> v21
```
By default, the file type written by `-with-dag` is an HTML file (great! I can copy the body into this message, very nice). BUT, since the single test only goes through one branch of the `est.nf` pipeline, the DAG doesn't include the other branches' flow(s). So we can't automatically port one test's DAG into the docs, or at least not as I imagined.
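For context, the graph above came from an invocation roughly like the one below. This is a sketch, not the repo's actual test command: the params file path is my assumption, and (if I'm reading the Nextflow docs right) the DAG format just follows the output file extension (`.html`, `.mmd`, `.dot`, ...).

```bash
# Sketch only: the params-file path is a guess, not the actual test fixture.
# The DAG format follows the output file extension; .html embeds a Mermaid graph.
nextflow run est.nf \
    -params-file tests/sequence_blast_params.yml \
    -with-dag est_sequence_blast_dag.html
```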
Also, these DAGs don't include much in the way of information for human readers; branches in the workflows aren't labeled (e.g. sequence blast vs. fasta vs. family vs. accession, and the respective A, B, C, D notation used by devs). Outputs from the workflows are just empty boxes, which is zero or negative information content. Much is left to be desired.
Oh, and on visual inspection of the actual steps in the above DAG, Nextflow is not creating a correct graph for the "blast" branch of the `est.nf` file. I'm not sure an automated parsing of Nextflow-created DAGs for each branch would even be worthwhile, since inaccuracies would be hard to detect.
```mermaid
---
title: EST
---
flowchart TB
v0([Input parameters defined in params.yml])
v0a((if params.import_mode == 'fasta'))
v0b((else))
v1([import_fasta])
v0 --> v0a
v0 --> v0b
subgraph " "
v0a --> v1
end
subgraph " "
v2([get_sequence_ids])
v3([split_sequence_ids])
v4([get_sequences])
v5([cat_fasta_files])
v0b --> v2
v2 --> v3
v3 --> v4
v4 --> v5
end
v6((if params.multiplex))
v7([multiplex])
v8([create_blast_db])
v1 --> v6
v1 --> v8
v5 --> v6
v6 --> v7
v7 --> v8
v5 --> v8
v9([blastreduce_transcode_fasta])
v8 --> v9
v10([split_fasta])
v11([all_by_all_blast])
v12([blastreduce])
v9 --> v10
v10 --> v11
v11 --> v12
v13((if params.multiplex))
v14([demultiplex])
v15([compute_stats])
v12 --> v13
v13 --> v14
v14 --> v15
v12 --> v15
v16([visualize])
v15 --> v16
```
Created this by hand. Pretty easy to edit now that the backbone is ready. I'm doing a bit of fine-tuning. This is certainly not something that the automated Nextflow `-with-dag` parameter will output.
Here's the updated visualization for the `est.nf` pipeline:
```mermaid
---
config:
  look: classic
  theme: forest
---
flowchart TB
start((start))
v0[\Input parameters defined in params.yml\]
start --> v0
subgraph " "
v0a{if params.import_mode == 'fasta'}
v1(import_fasta)
v2(get_sequence_ids)
v3(split_sequence_ids)
v4(get_sequences)
v5(cat_fasta_files)
v0 --> v0a
v0a -->|true| v1
v0a -->|false| v2
v2 --> v3
v3 --> v4
v4 --> v5
end
subgraph " "
v6{if params.multiplex}
v7(multiplex)
v1 --> v6
v5 --> v6
v6 -->|true| v7
end
subgraph " "
v8(create_blast_db)
v9(blastreduce_transcode_fasta)
v7 --> v8
v6 -->|false| v8
v8 --> v9
end
subgraph " "
v10(split_fasta)
v11(all_by_all_blast)
v12(blastreduce)
v9 --> v10
v10 --> v11
v11 --> v12
end
subgraph " "
v13{if params.multiplex}
v14(demultiplex)
v12 --> v13
v13 -->|true| v14
end
v15(compute_stats)
v16(visualize)
v14 --> v15
v13 -->|false| v15
v15 --> v16
subgraph " "
v17[/"Graphs:<br/>pid vs aln score<br/>aln len vs aln score<br/>..."/]
v18[/1.out.parquet/]
v19[/"boxplot_stats.parquet<br/>evalue.tab<br/>acc_counts.json"/]
end
v16 --> v17
v14 --> v18
v12 --> v18
v15 --> v19
```
I'm fairly sure I haven't listed all of the output files from the steps. I'll get back to finishing this and making diagrams for the other pipelines once I've got a handle on my KBase tasks.
Nextflow has the ability to create a visual representation of a pipeline. This might help someone understand how the tool operates. Generate a DAG diagram for each pipeline and include it with the Sphinx documentation. There are a variety of output formats to choose from; try the `mmd` output option with the Sphinx plugin for mermaid diagrams, and if it does not work well, see if the HTML version can be included in the docs. If neither of those options works well, render an SVG or PNG version of the DAG. Pick an appropriate location to store these files. Within the docs, these images should fit well on the index page for their respective pipelines, but feel free to explore other options.
An example command for generating the DAG would be:
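Something along these lines, as a sketch; the params file and the output location are assumptions rather than the project's actual values:

```bash
# -preview builds the workflow without executing processes; worth verifying
# that the DAG file is still written in preview mode for this pipeline.
nextflow run est.nf \
    -preview \
    -params-file params.yml \
    -with-dag docs/source/est/est_dag.mmd
```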
Current pipelines to render images for:
Finally, consider adding to the `docs-html` Makefile target a command which renders these pipeline diagrams every time the documentation is built. This will ensure that they are always up to date.
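As a sketch of what that could look like (pipeline entry points, params files, and paths are all assumptions here), the `docs-html` recipe could run something like the following before the Sphinx build:

```bash
# Hypothetical commands to prepend to the docs-html target; names and paths are guesses.
nextflow run est.nf -preview -params-file params.yml -with-dag docs/source/est/est_dag.mmd
# ...repeat for the other pipelines...
sphinx-build -b html docs/source docs/build/html
```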