NBChub / bgcflow

Snakemake workflow for the analysis of biosynthetic gene clusters across large collections of genomes (pangenomes)
https://github.com/NBChub/bgcflow/wiki
MIT License

Feature: df_bgcs tables with all metadata #197

Open OmkarSaMo opened 2 years ago

OmkarSaMo commented 2 years ago

Proposal:

Create df_bgcs and df_gcfs tables in the processed/{project_name}/tables directory, with metadata taken directly from the antiSMASH results.

I think a general table containing BGC metadata would be more valuable in the main tables directory than in the current for_cytoscape directory.
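A minimal sketch of how such a table could be assembled from antiSMASH output. It assumes antiSMASH's JSON layout stores each region as a feature of type `region` under `records[*].features`, with list-valued `product`, `region_number` and `contig_edge` qualifiers. The column set (`bgc_id`, `genome_id`, `record_id`, `product`, `contig_edge`) and the helper names `regions_to_rows` and `to_tsv` are hypothetical illustrations, not BGCFlow's actual schema or code.

```python
import csv
import io
from typing import Any, Dict, List

# Hypothetical column set for a df_bgcs table; not BGCFlow's actual schema.
FIELDS = ["bgc_id", "genome_id", "record_id", "product", "contig_edge"]


def regions_to_rows(antismash: Dict[str, Any], genome_id: str) -> List[Dict[str, Any]]:
    """Flatten antiSMASH 'region' features into one row per BGC.

    Assumes the JSON layout records[*].features[*] with a 'type' key and
    list-valued 'qualifiers', as in antiSMASH's JSON output.
    """
    rows = []
    for record in antismash.get("records", []):
        for feature in record.get("features", []):
            if feature.get("type") != "region":
                continue  # skip CDS, protocluster and other feature types
            quals = feature.get("qualifiers", {})
            region_number = int(quals.get("region_number", ["0"])[0])
            rows.append({
                # Mirrors antiSMASH's "<record>.region001" file naming
                "bgc_id": f"{record['id']}.region{region_number:03d}",
                "genome_id": genome_id,
                "record_id": record["id"],
                # Hybrid regions carry several products; join them into one cell
                "product": ";".join(quals.get("product", [])),
                "contig_edge": quals.get("contig_edge", ["False"])[0] == "True",
            })
    return rows


def to_tsv(rows: List[Dict[str, Any]]) -> str:
    """Serialise rows as a tab-separated table with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Toy input shaped like a tiny subset of an antiSMASH JSON file
example = {
    "records": [
        {
            "id": "contig_1",
            "features": [
                {"type": "CDS", "qualifiers": {}},
                {"type": "region", "qualifiers": {
                    "region_number": ["1"],
                    "product": ["NRPS", "T1PKS"],
                    "contig_edge": ["False"],
                }},
            ],
        }
    ]
}
rows = regions_to_rows(example, "genome_A")
print(to_tsv(rows))
```

Keeping one row per region makes it straightforward to append further columns (for example, the extra metadata discussed below) without changing the table's shape.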

List of extra metadata:

I think some extra columns would be beneficial. I am adding a few below and welcome further recommendations:

Is anything else needed, @matinnuhamunada?

matinnuhamunada commented 1 year ago

This looks perfect! Some of the data in this table can answer questions that @EVBAST and @tilmweber discussed this morning. Adding URLs to the MIBiG hits has also proven useful for end users.
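One way to add the MIBiG links is to derive them from the accession of each KnownClusterBlast hit. This is a sketch only: the helper `mibig_url` is hypothetical, and the URL pattern below is an assumption that should be checked against the MIBiG website before use.

```python
import re
from typing import Optional

# Assumed URL pattern for MIBiG entry pages; check it against the live site.
MIBIG_BASE = "https://mibig.secondarymetabolites.org/repository"

# MIBiG accessions look like BGC0000001, sometimes followed by a suffix such as ".1"
_ACCESSION = re.compile(r"^(BGC\d{7})")


def mibig_url(accession: str) -> Optional[str]:
    """Return a MIBiG entry URL for a hit accession, or None if it is not recognised."""
    match = _ACCESSION.match(accession.strip())
    if match is None:
        return None
    return f"{MIBIG_BASE}/{match.group(1)}/"


for hit in ["BGC0000001", "BGC0001234.1", "not_a_hit"]:
    print(hit, mibig_url(hit))
```

Returning None for unrecognised accessions leaves an empty cell in the table instead of a broken link.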

EVBAST commented 1 year ago

THANKS @matinnuhamunada and @OmkarSaMo

matinnuhamunada commented 1 year ago

Hi, this issue will be addressed in the 0.6.1 release. Because the table is huge, it will be stored in .parquet format and loaded into DuckDB (instead of SQLite): https://github.com/NBChub/bgcflow/tree/dev-0.5.1