NBChub / bgcflow

Snakemake workflow for the analysis of biosynthetic gene clusters across large collections of genomes (pangenomes)
https://github.com/NBChub/bgcflow/wiki
MIT License

BGCflow v1 scope #43

Closed by matinnuhamunada 2 years ago

matinnuhamunada commented 2 years ago

@OmkarSaMo I think we should define the scope of the workflow (at least for the first version/release). This will help us to focus on delivering a stable release.

I think this tagline you wrote is cool: "Snakemake workflow to systematically analyze BGCs and pangenomes of large number genomes". It lets us define the rules required for the first stable release.

As more tools are added, it gets hard to keep track of what is happening in the workflow. Categorizing the rules and giving each one a simple definition of what it should do will help us focus and find out which tools are still missing or should be prioritized.

Public database utilities:

Pre-processing & QC:

Phylogenomic analysis:

Gene annotation and Enrichment:

BGC analysis:

Pangenome analysis:

Summary & reporting:

I ticked the minimal functions that are required to run the workflow. What do you think?
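To make the idea concrete, here is a rough sketch of how the categories above could be kept as a small catalog, with a flag for the rules that are part of the minimal set. The category names come from the list above; everything else (`RuleEntry`, `build_catalog`, and the two example rule names) is hypothetical and only illustrates the structure, it does not describe rules that exist in the workflow.

```python
from dataclasses import dataclass

# Category names taken from the list in this issue.
CATEGORIES = [
    "Public database utilities",
    "Pre-processing & QC",
    "Phylogenomic analysis",
    "Gene annotation and Enrichment",
    "BGC analysis",
    "Pangenome analysis",
    "Summary & reporting",
]


@dataclass(frozen=True)
class RuleEntry:
    """One workflow rule with a one-line definition (hypothetical structure)."""
    name: str
    description: str
    required: bool  # True if the rule is ticked as part of the minimal workflow


def build_catalog(entries):
    """Group (category, RuleEntry) pairs by category, keeping the category order."""
    catalog = {category: [] for category in CATEGORIES}
    for category, entry in entries:
        if category not in catalog:
            raise ValueError(f"unknown category: {category}")
        catalog[category].append(entry)
    return catalog


def minimal_rules(catalog):
    """Names of all rules ticked as required, in category order."""
    return [e.name for rules in catalog.values() for e in rules if e.required]


def empty_categories(catalog):
    """Categories with no rule yet, i.e. candidates for missing tools."""
    return [category for category, rules in catalog.items() if not rules]


# Placeholder entries; the rule names below are made up for illustration.
example_entries = [
    ("Pre-processing & QC",
     RuleEntry("example_qc_rule", "Check genome quality", True)),
    ("Summary & reporting",
     RuleEntry("example_report_rule", "Build a summary report", False)),
]
catalog = build_catalog(example_entries)
```

Calling `minimal_rules(catalog)` lists the ticked rules, and `empty_categories(catalog)` shows which categories still have no rule assigned.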

PS: this could be nice for the docs / wiki

OmkarSaMo commented 2 years ago

How about the following?

Scope:

"Large-scale automated and adaptive workflows to systematically analyze BGCs and pangenomes of various bacterial groups"

Vision:

"Our vision is to computationally characterize and access any cluster, from every genome"

OmkarSaMo commented 2 years ago

I have edited some of the rule categories and descriptions. I will check again later.