Calculate the average Jaccard index for domains between all gene clusters in the network, creating a second similarity matrix. Then use DBSCAN to separate the gene clusters into groups;
Output the network as an interactive chart (just like Numbers does), named the calibration graph, which shows how the networks change across a range of cutoffs, highlights families of "gold standard BGCs" (just like an "internal standard"), and uses the groups from the second DBSCAN to color nodes. Note: only include edges for biosynthetic or hypothetical (uncolored) genes;
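The Jaccard and DBSCAN step could look roughly like the sketch below. It is an illustration under assumptions, not the project's code: the BGC names and domain IDs are made up, each gene cluster is reduced to a single set of domains (the averaging step is not specified above, so it is left out), and DBSCAN is written out in plain Python to keep the example self-contained. In practice scikit-learn's DBSCAN with metric="precomputed" would likely take its place.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets; taken as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def jaccard_distance_matrix(domains):
    """Return (names, matrix) with matrix[i][j] = 1 - Jaccard index."""
    names = sorted(domains)
    n = len(names)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = 1.0 - jaccard(domains[names[i]], domains[names[j]])
        dist[i][j] = dist[j][i] = d
    return names, dist


def dbscan(dist, eps, min_samples):
    """Minimal DBSCAN on a precomputed distance matrix; label -1 means noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1

    def neighbors(i):
        # A point counts as its own neighbor (distance 0).
        return [j for j in range(n) if dist[i][j] <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # j is a core point, keep expanding
    return labels


# Hypothetical gene clusters and domain IDs, for illustration only.
domains = {
    "BGC_A": {"PF00109", "PF02801", "PF00698"},
    "BGC_B": {"PF00109", "PF02801", "PF00550"},
    "BGC_C": {"PF00501", "PF00668"},
    "BGC_D": {"PF00501", "PF00668", "PF00550"},
}
names, dist = jaccard_distance_matrix(domains)
labels = dbscan(dist, eps=0.5, min_samples=2)
groups = dict(zip(names, labels))
```

The resulting group label per gene cluster is what would drive node coloring in the calibration graph; eps and min_samples here are arbitrary values chosen for the toy data.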
After selecting the cutoffs
Add a (better) filtering script in which the user specifies the best cutoff found with the calibration graph;
Automatically generate output images (one with and one without regulatory/mobile/resistance genes) for the selected network (using NetworkX?), and also provide Cytoscape output;
Add multiple gene alignment images that appear when a family is clicked in the output;
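A minimal sketch of the filtering step, under stated assumptions: edges are (source, target, similarity) tuples where a higher score means more similar, a family is a connected component of the filtered network, and all names and scores are invented. NetworkX's connected_components could replace the hand-written traversal, and the tab-separated edge list is one simple format that Cytoscape can import as a network table.

```python
import csv
import io


def filter_edges(edges, cutoff):
    """Keep edges whose similarity is at least the chosen cutoff."""
    return [(a, b, s) for a, b, s in edges if s >= cutoff]


def families(nodes, edges):
    """Group nodes into connected components; one component is one family."""
    adjacency = {n: set() for n in nodes}
    for a, b, _ in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, result = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        result.append(sorted(component))
    return result


def to_tsv(edges):
    """Serialize edges as a tab-separated table with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["source", "target", "similarity"])
    writer.writerows(edges)
    return buffer.getvalue()


# Hypothetical network, for illustration only.
nodes = ["BGC_A", "BGC_B", "BGC_C", "BGC_D"]
edges = [
    ("BGC_A", "BGC_B", 0.9),
    ("BGC_B", "BGC_C", 0.6),
    ("BGC_C", "BGC_D", 0.3),
]

# Sweep a few cutoffs, as the calibration graph would, then pick one.
family_counts = {
    cutoff: len(families(nodes, filter_edges(edges, cutoff)))
    for cutoff in (0.2, 0.5, 0.8)
}
selected = filter_edges(edges, 0.5)
selected_families = families(nodes, selected)
tsv = to_tsv(selected)
```

If the project's scores are distances rather than similarities, the comparison in filter_edges would need to be reversed.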
For future
Run the analysis on multiple samples (multiCOMPASS module?). Suggestion: run the analysis for the genome with the most BGCs, then loop until all BGCs from the query are in the final network.
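One possible reading of this suggestion, sketched as a greedy loop. Everything here is hypothetical: run_analysis stands in for the real pipeline and is assumed to return the BGCs from other genomes that ended up in the network for a given seed genome, and the genome and BGC names are invented.

```python
def multi_sample_loop(bgcs_by_genome, run_analysis):
    """Seed with the genome holding the most missing BGCs until none are missing.

    bgcs_by_genome: dict mapping genome name -> set of BGC ids.
    run_analysis:   hypothetical callable(genome) -> set of BGC ids from
                    other genomes that were placed in the network.
    """
    all_bgcs = set().union(*bgcs_by_genome.values())
    in_network = set()
    order = []
    while all_bgcs - in_network:
        missing = all_bgcs - in_network
        # Pick the genome with the most BGCs still missing from the network;
        # sorting first makes ties resolve to the alphabetically first genome.
        genome = max(
            sorted(bgcs_by_genome),
            key=lambda g: len(bgcs_by_genome[g] & missing),
        )
        placed = run_analysis(genome) | bgcs_by_genome[genome]
        in_network |= placed & all_bgcs
        order.append(genome)
    return order, in_network


# Toy data and a stub analysis, for illustration only.
bgcs_by_genome = {
    "genome_1": {"a", "b", "c"},
    "genome_2": {"d", "e"},
    "genome_3": {"f"},
}
linked = {"genome_1": {"d"}, "genome_2": {"f"}, "genome_3": set()}
order, in_network = multi_sample_loop(bgcs_by_genome, lambda g: linked[g])
```

The loop always terminates because every pass adds at least the seed genome's own missing BGCs. In the toy data genome_3 never needs its own run, since its only BGC is pulled in while analysing genome_2.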
Remaining challenges
How to improve subclustering rules?
How to better select the best DBSCAN subclustering iteration?
How to select the best eps for the Jaccard index DBSCAN?
How to run multiple samples with different subclustering?
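For the eps question, one common starting heuristic (not something the project has adopted) is to look at the sorted k-distance curve: the distance from each gene cluster to its k-th nearest neighbor, with eps chosen near the point where the curve bends sharply. The sketch below assumes a precomputed Jaccard distance matrix; the values are invented.

```python
def k_distances(dist, k):
    """Sorted distance from each point to its k-th nearest neighbor (self excluded)."""
    result = []
    for i, row in enumerate(dist):
        others = sorted(d for j, d in enumerate(row) if j != i)
        result.append(others[k - 1])
    return sorted(result)


# Hypothetical Jaccard distance matrix for four gene clusters.
dist = [
    [0.0, 0.5, 1.0, 1.0],
    [0.5, 0.0, 1.0, 0.8],
    [1.0, 1.0, 0.0, 0.25],
    [1.0, 0.8, 0.25, 0.0],
]
curve = k_distances(dist, k=1)
```

Plotting the curve for k = min_samples - 1 and reading off the bend gives a candidate eps to try in the calibration graph; it does not settle which value is best.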