Would be really nice to have a visual summary of various steps in the pipeline using Sankey diagrams, e.g. via plotly. Some diagram ideas:
Flowchart of common names between different files associated with a given dataset (e.g. protein IDs, gene names, UniProt IDs, etc.) to see how much information is lost during file download and initial read mapping per species.
Sankey diagram of mapping from different species to orthogroups (flow volume proportional to number of orthogroups). Show e.g. orthogroups with 3 species, each combination of 2 species, and unique to 1 species.
I've started building one of these: visualizing gene vs. orthogroup Leiden clusters. It still needs to be converted into a function, though; it's currently implemented in the 3_ prefix notebooks as repeated code.
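For the species-to-orthogroup idea, here is a rough sketch of what a reusable function might look like. Everything in it is an assumption for illustration: the names `sankey_links` and `plot_sankey`, the input shape (a dict of orthogroup ID to the set of species it contains), and the toy species labels are all hypothetical, not taken from the pipeline. Each orthogroup adds one unit of flow from every member species to its species-combination node, so link width is proportional to the number of orthogroups.

```python
from collections import Counter


def sankey_links(orthogroups):
    """Build Sankey inputs for a species -> species-combination diagram.

    `orthogroups` is assumed to map an orthogroup ID to an iterable of the
    species present in it (hypothetical input shape).
    Returns (labels, source, target, value) as parallel lists.
    """
    # Count orthogroups per species combination, e.g. ("A", "B") -> 2.
    combos = Counter(
        tuple(sorted(set(members))) for members in orthogroups.values() if members
    )
    species = sorted({s for combo in combos for s in combo})
    # Largest combinations first, then alphabetical, for a stable node order.
    combo_keys = sorted(combos, key=lambda c: (-len(c), c))

    def combo_label(combo):
        # Avoid a single-species node sharing its label with the species node.
        return " + ".join(combo) if len(combo) > 1 else f"{combo[0]} only"

    labels = species + [combo_label(c) for c in combo_keys]
    index = {s: i for i, s in enumerate(species)}

    source, target, value = [], [], []
    for j, combo in enumerate(combo_keys):
        for s in combo:
            source.append(index[s])
            target.append(len(species) + j)
            value.append(combos[combo])  # number of orthogroups in this combo
    return labels, source, target, value


def plot_sankey(orthogroups, title="Species to orthogroup membership"):
    """Return a plotly Figure; plotly is imported lazily so the link-building
    step above can be used and tested without it."""
    import plotly.graph_objects as go

    labels, source, target, value = sankey_links(orthogroups)
    fig = go.Figure(
        go.Sankey(
            node=dict(label=labels, pad=15, thickness=20),
            link=dict(source=source, target=target, value=value),
        )
    )
    fig.update_layout(title_text=title)
    return fig


# Toy input, purely illustrative.
example = {
    "OG1": {"A", "B", "C"},
    "OG2": {"A", "B"},
    "OG3": {"A", "B"},
    "OG4": {"C"},
}
labels, source, target, value = sankey_links(example)
```

Calling `plot_sankey(example).show()` should render the diagram in a notebook. Keeping the link-building separate from the plotting is just one possible design; it would let the same helper be reused for the gene vs. orthogroup cluster plot by swapping the input mapping.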