Global-Chem / global-chem

A Knowledge Graph of Common Chemical Names to their Molecular Definition
https://globalchemistry.org/
Mozilla Public License 2.0

SG-1: Dashboard for Principal Component Analysis #228

Closed. Sulstice closed this issue 4 months ago

Sulstice commented 1 year ago

We will use this issue to create a dashboard for the Principal Component Analysis app, since that is our most commonly used feature.

Theory:

Principal Component Analysis can be used to explore the hyperparameters that machine learning needs in order to produce distinguishable features that are intuitive to us as scientists. Our tool aims to help scientists discover the best hyperparameters for their chemical data set.

https://sulstice.gitbook.io/globalchem-your-chemical-graph-network/cheminformatics/principal-component-analysis-smiles

Software Demo

To install it:


!pip install -q global-chem[cheminformatics] --upgrade

To run it:


from global_chem import GlobalChem
from global_chem_extensions import GlobalChemExtensions

gc = GlobalChem()
cheminformatics = GlobalChemExtensions().cheminformatics()

# Build the network, then collect the SMILES strings stored in one node
gc.build_global_chem_network(print_output=False, debugger=False)
smiles_list = list(gc.get_node_smiles('emerging_perfluroalkyls').values())

mol_ids = cheminformatics.node_pca_analysis(
    smiles_list,
    morgan_radius=1,
    bit_representation=512,
    number_of_clusters=3,
    number_of_components=0.95,
    random_state=0,
    principal_component_x=0,
    principal_component_y=1,
    x_axis_label='PC1',
    y_axis_label='PC2',
    plot_width=500,
    plot_height=500,
    title='',
    save_file=False,
    return_mol_ids=True,
    save_principal_components=True,
)
[Image: Screenshot 2023-02-10 at 11:19:30 AM]

Let me know if you run into any problems. Try running it on your own on your local machine; you can do it in a Jupyter notebook or Google Colab.
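A note on number_of_components = 0.95 in the call above. Assuming the value is forwarded to scikit-learn's PCA (an assumption, not something confirmed in this thread), a float between 0 and 1 means "keep as many components as needed to explain more than that fraction of the variance". The sketch below shows that selection rule in plain Python. The function name and the example ratios are hypothetical and only for illustration.

```python
from itertools import accumulate


def components_for_variance(explained_variance_ratio, threshold=0.95):
    """Return the smallest number of leading components whose cumulative
    explained-variance ratio exceeds `threshold`.

    Falls back to all components if the threshold is never exceeded.
    """
    cumulative_ratios = accumulate(explained_variance_ratio)
    for count, cumulative in enumerate(cumulative_ratios, start=1):
        if cumulative > threshold:
            return count
    return len(explained_variance_ratio)


# Hypothetical ratios, sorted from largest to smallest as PCA reports them.
example_ratios = [0.60, 0.25, 0.12, 0.03]
print(components_for_variance(example_ratios, threshold=0.95))  # prints 3
```

With these made-up ratios the first two components cover 85% of the variance, so a third is needed to pass 95%. In the dashboard this is one of the settings a user would probably want to tune.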

The problem

We would like the user to be able to input a list of SMILES:


smiles_list = ['CCC', 'CC', 'CCCCC']

>>> input goes into the function

mol_ids = cheminformatics.node_pca_analysis(
    smiles_list,
    morgan_radius=1,
    bit_representation=512,
    number_of_clusters=3,
    number_of_components=0.95,
    random_state=0,
    principal_component_x=0,
    principal_component_y=1,
    x_axis_label='PC1',
    y_axis_label='PC2',
    plot_width=500,
    plot_height=500,
    title='',
    save_file=False,
    return_mol_ids=True,
    save_principal_components=True,
)

<<< Interactive Plot Comes out

And that's it. That will get us our first milestone.
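For the input side, here is a minimal sketch of how the dashboard could turn whatever the user types into a text box into a clean smiles_list. It assumes the user separates SMILES with commas, spaces, or new lines. The names parse_smiles_input and DEFAULT_PCA_OPTIONS are hypothetical; the option values are copied from the call above.

```python
import re

# Keyword arguments copied from the node_pca_analysis call in this issue.
DEFAULT_PCA_OPTIONS = {
    "morgan_radius": 1,
    "bit_representation": 512,
    "number_of_clusters": 3,
    "number_of_components": 0.95,
    "random_state": 0,
    "principal_component_x": 0,
    "principal_component_y": 1,
    "x_axis_label": "PC1",
    "y_axis_label": "PC2",
    "plot_width": 500,
    "plot_height": 500,
    "title": "",
    "save_file": False,
    "return_mol_ids": True,
    "save_principal_components": True,
}


def parse_smiles_input(raw_text):
    """Split free-form text into a list of unique SMILES strings.

    SMILES never contain commas or whitespace, so both are safe separators.
    Surrounding quote characters are stripped and the input order is kept.
    """
    smiles_list = []
    seen = set()
    for token in re.split(r"[,\s]+", raw_text.strip()):
        smiles = token.strip("'\"")
        if smiles and smiles not in seen:
            seen.add(smiles)
            smiles_list.append(smiles)
    return smiles_list


smiles_list = parse_smiles_input("CCC, CC\nCCCCC")
print(smiles_list)

# The dashboard callback would then call (not executed in this sketch):
# mol_ids = cheminformatics.node_pca_analysis(smiles_list, **DEFAULT_PCA_OPTIONS)
```

This does not check that each string is a valid SMILES. It only tidies the text before it goes into the function.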

To begin developing the dashboard infrastructure I will bounce ideas off you. The first idea I had for infrastructure was this:

https://towardsdatascience.com/creating-a-better-dashboard-with-python-dash-and-plotly-80dfb4269882

To begin working on the code, I suggest cloning the repository and then making your own directory called dashboard. It can be a standalone directory at the top of the repo because it is an important feature.

Let me know any thoughts or initial questions. I'm around.

Sulstice commented 1 year ago

@LadyBluenotes This will do the job in terms of the index.html file. Download it. You click "view raw" and then save the file as index.html.

https://github.com/Sulstice/CannabisSativa

I'm wondering whether we ever need to edit that file, or whether we can just have a box inside the dashboard and put all of this HTML in that box.
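One possible way to do the "box" without editing the file: wrap the whole HTML string in an iframe using the standard srcdoc attribute. This is only a sketch, assuming index.html is self-contained (scripts and data inlined or loaded from absolute URLs). The function name embed_plot_html is hypothetical.

```python
from html import escape


def embed_plot_html(plot_html, width=500, height=500):
    """Wrap a complete HTML document in an <iframe> through `srcdoc`.

    The document is escaped so that its own quotes and angle brackets
    cannot break out of the attribute value.
    """
    escaped_document = escape(plot_html, quote=True)
    return (
        f'<iframe srcdoc="{escaped_document}" '
        f'width="{int(width)}" height="{int(height)}" '
        'style="border: none;"></iframe>'
    )


# Stand-in for the contents of the downloaded index.html.
sample_document = '<html><body><div id="plot">PCA plot</div></body></html>'
print(embed_plot_html(sample_document))
```

The iframe keeps the plot's scripts and styles isolated from the rest of the dashboard, which is why the file would not need to be edited.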

Sulstice commented 1 year ago

Input

smiles_list = ['C1=CC=CC=C1', 'CCCC', 'CCCCC']

Sulstice commented 1 year ago

For the next steps:

1.) Trigger the GitHub Actions public workflow via a button on the front-end website.

2.) Can you pull a file through the GitHub Actions REST API from the server that spawned the job, or do we have to figure out how to pass a string of HTML back to the front-end via an endpoint of some kind?

Documentation

https://docs.github.com/en/rest?apiVersion=2022-11-28
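A rough sketch of what the two steps could look like against the REST API linked above. It only builds the requests and does not send them. The owner, repository, workflow file name, input name, and run id are placeholders, and it assumes the workflow declares a workflow_dispatch trigger with a string input for the SMILES. The dispatch endpoint and the artifact listing endpoint are part of the GitHub Actions REST API.

```python
import json
from urllib.request import Request

API_ROOT = "https://api.github.com"
API_VERSION = "2022-11-28"


def github_headers(token):
    """Headers recommended by the GitHub REST API documentation."""
    return {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": API_VERSION,
    }


def build_dispatch_request(owner, repo, workflow_file, token, ref="main", inputs=None):
    """Build the POST request that triggers a workflow_dispatch run."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    body = json.dumps({"ref": ref, "inputs": inputs or {}}).encode("utf-8")
    return Request(url, data=body, headers=github_headers(token), method="POST")


def build_list_artifacts_request(owner, repo, run_id, token):
    """Build the GET request that lists the artifacts uploaded by one run."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/actions/runs/{run_id}/artifacts"
    return Request(url, headers=github_headers(token), method="GET")


# Placeholder values; workflow inputs are passed as strings.
dispatch = build_dispatch_request(
    "example-org", "example-repo", "pca.yml", "TOKEN",
    inputs={"smiles": "C1=CC=CC=C1,CCCC,CCCCC"},
)
print(dispatch.get_method(), dispatch.full_url)

artifacts = build_list_artifacts_request("example-org", "example-repo", 123, "TOKEN")
print(artifacts.get_method(), artifacts.full_url)

# To actually send a request (needs a real token and network access):
# from urllib.request import urlopen
# urlopen(dispatch)
```

One design point to keep in mind: the token cannot live in the front-end code, so the button would most likely have to call a small server-side endpoint that holds the token and sends these requests. If the workflow uploads the generated HTML as an artifact, that same endpoint could download it and hand it back to the front-end.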

Another task is to move the front-end into the GlobalChem Organization.