Add code to apply classifier for link prediction

justaddcoffee commented 2 years ago

Add a block of YAML to test.yaml demonstrating how to apply a classifier for link prediction, as explained in #48

Hacking with @caufieldjh

caufieldjh commented 2 years ago

The issue referenced in run_classifier is this: https://github.com/AnacletoLAB/ensmallen/issues/139

caufieldjh commented 2 years ago

When loading the graph, the following parameters need to be included:

  "node_type_path": target_node_type_list_path,
  "node_types_column_number": 0,
  "node_type_list_is_correct": True,
  "node_type_list_separator": "\t"

where target_node_type_list_path is the path to a file containing one unique node type per line. For example, this list would work for KG-Microbe:

biolink:AbstractEntity
biolink:ActivityAndBehavior
biolink:AnatomicalEntity
biolink:BiologicalProcess
biolink:CellularComponent
biolink:ChemicalSubstance
biolink:MolecularActivity
biolink:NamedThing
biolink:OntologyClass
biolink:OrganismTaxon
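
A rough sketch of how that node type list file could be generated and how the parameters above could be collected into one dict. The helper names (write_node_type_list, node_type_kwargs), the "category" column name, and the pipe-separated handling of multiple categories are assumptions for illustration, not part of neat or ensmallen. Only the four parameter names come from the comment above.

```python
import csv
import tempfile
from pathlib import Path


def write_node_type_list(nodes_tsv, out_path, category_column="category"):
    """Write each unique node type found in nodes_tsv to out_path, one per line."""
    node_types = set()
    with open(nodes_tsv, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # Assumption: several categories in one cell are pipe-separated.
            for value in row[category_column].split("|"):
                if value:
                    node_types.add(value)
    ordered = sorted(node_types)
    Path(out_path).write_text("\n".join(ordered) + "\n")
    return ordered


def node_type_kwargs(target_node_type_list_path):
    """Collect the extra graph-loading parameters listed above (hypothetical helper)."""
    return {
        "node_type_path": str(target_node_type_list_path),
        "node_types_column_number": 0,
        "node_type_list_is_correct": True,
        "node_type_list_separator": "\t",
    }


# Tiny made-up node file, for illustration only.
workdir = Path(tempfile.mkdtemp())
nodes_tsv = workdir / "nodes.tsv"
nodes_tsv.write_text(
    "id\tcategory\n"
    "EX:1\tbiolink:OrganismTaxon\n"
    "EX:2\tbiolink:ChemicalSubstance\n"
    "EX:3\tbiolink:OrganismTaxon\n"
)
type_list_path = workdir / "node_types.tsv"
node_types = write_node_type_list(nodes_tsv, type_list_path)
load_kwargs = node_type_kwargs(type_list_path)
```

The resulting dict would then be merged into whatever arguments are already passed when the graph is loaded.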

caufieldjh commented 2 years ago

I think the wrong model gets loaded in the run_classifier test because the save() function in sklearn_model.py pickles self.model, and self.model is an instance of the class imported from sklearn, not an object of the Model type defined in our model.py. So the loaded model isn't the right type for any of our new functions, only the standard sklearn ones.
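
A minimal sketch of the problem and one possible fix (re-wrapping the estimator at load time). ModelWrapperSketch and its methods are hypothetical stand-ins, not the actual classes in model.py or sklearn_model.py, and a plain dict stands in for the fitted sklearn estimator so the example needs nothing beyond the standard library.

```python
import pickle
import tempfile
from pathlib import Path


class ModelWrapperSketch:
    """Hypothetical stand-in for our wrapper class around an estimator."""

    def __init__(self, estimator):
        self.model = estimator

    def predict_links(self):
        # Placeholder for a wrapper-only method that the raw estimator lacks.
        return "wrapper method available"

    def save(self, path):
        # Mirrors the behaviour described above: only the inner estimator is pickled.
        with open(path, "wb") as handle:
            pickle.dump(self.model, handle)

    @classmethod
    def load(cls, path):
        # One possible fix: re-wrap the unpickled estimator before returning it.
        with open(path, "rb") as handle:
            return cls(pickle.load(handle))


# A plain dict stands in for a fitted sklearn estimator (assumption for the demo).
estimator = {"coef": [0.1, 0.2]}
path = Path(tempfile.mkdtemp()) / "model.pkl"
ModelWrapperSketch(estimator).save(path)

# A naive load returns the bare estimator, without any wrapper methods.
with open(path, "rb") as handle:
    raw = pickle.load(handle)
raw_has_method = hasattr(raw, "predict_links")

# Loading through the wrapper restores the wrapper-only methods.
rewrapped = ModelWrapperSketch.load(path)
rewrapped_has_method = hasattr(rewrapped, "predict_links")
```

The alternative would be to pickle the wrapper object itself, which keeps any extra wrapper state but ties the saved file to the wrapper's class definition.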

caufieldjh commented 2 years ago

At this point I was able to run neat run --config tests/resources/test.yaml. Three of the four models appeared to complete training without issue, but the fourth (LogisticRegression) raised a ConvergenceWarning, and the run then failed with a TypeError in predict_links:

/home/harry/neat-env/lib/python3.8/site-packages/sklearn/linear_model/_logistic.py:814: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [2:54:17<00:00, 2614.41s/it]
Traceback (most recent call last):
  File "/home/harry/neat-env/bin/neat", line 8, in <module>
    sys.exit(cli())
  File "/home/harry/neat-env/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/harry/neat-env/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/harry/neat-env/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/harry/neat-env/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/harry/neat-env/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/harry/neat-env/lib/python3.8/site-packages/neat/cli.py", line 94, in run
    predict_links(**classifier_kwargs)
TypeError: predict_links() got an unexpected keyword argument 'embeddings'
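
A quick way to spot this kind of mismatch before the call is to compare the keyword arguments against the function signature. This is only a sketch: predict_links_sketch has a made-up signature (the real predict_links may differ), and unexpected_kwargs and the classifier_kwargs contents are hypothetical.

```python
import inspect


def predict_links_sketch(model, graph, node_types=None, cutoff=0.5):
    """Hypothetical signature; the real predict_links may take other arguments."""
    return []


def unexpected_kwargs(func, kwargs):
    """Return the keys in kwargs that func would reject as unexpected keywords."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword, so nothing is unexpected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    accepted = {
        name
        for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    return sorted(key for key in kwargs if key not in accepted)


# Made-up arguments resembling what a config-driven call might pass.
classifier_kwargs = {
    "model": None,
    "graph": None,
    "embeddings": "embeddings.tsv",
    "cutoff": 0.9,
}
extra = unexpected_kwargs(predict_links_sketch, classifier_kwargs)
```

Here extra contains only "embeddings", the same key the TypeError above complains about.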

caufieldjh commented 2 years ago

One more required feature: we need to be able to filter by node namespace in addition to Biolink category (e.g., only consider predicted links involving HPO: nodes).
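
Roughly what the namespace filter could look like, assuming predicted links come back as (source, destination, score) tuples with CURIE-style node IDs. The function names, the tuple layout, and the identifiers below are all made up for illustration.

```python
def has_prefix(curie, prefixes):
    """True if the CURIE's namespace (text before the first colon) is in prefixes."""
    namespace, sep, _ = curie.partition(":")
    return bool(sep) and namespace in prefixes


def filter_links_by_prefix(links, prefixes, require_both=False):
    """Keep (source, destination, score) links that involve the given namespaces."""
    # any: at least one endpoint matches; all: both endpoints must match.
    combine = all if require_both else any
    return [
        link for link in links
        if combine(has_prefix(node, prefixes) for node in link[:2])
    ]


# Made-up identifiers and scores, for illustration only.
predicted = [
    ("HPO:0000001", "EX:taxon1", 0.91),
    ("EX:chem1", "EX:taxon2", 0.88),
    ("HPO:0000002", "HPO:0000003", 0.75),
]
involving_hpo = filter_links_by_prefix(predicted, {"HPO"})
only_hpo = filter_links_by_prefix(predicted, {"HPO"}, require_both=True)
```

Whether "involving" should mean either endpoint or both is an open design choice, so the sketch exposes it as require_both.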

caufieldjh commented 2 years ago

TBD:

[x] Resolve differences between predict in TF vs. Sklearn: we just want to return a float representing the result for True and not report other values
[x] Fix that last mypy error
[x] In link prediction, include node_types filter
[ ] ~~In link prediction, filter nodes by prefix~~ Move to new issue
[ ] ~~In link prediction, filter nodes by other slots~~ Move to new issue
[x] In link prediction, include cutoff
[ ] ~~General cleanup + code smells~~ - for new PR
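
For the first and sixth items, a sketch of the idea under some assumptions: a two-value output is taken to be [P(False), P(True)] (as with sklearn's predict_proba when the classes are False and True), and a one-value output is taken to be a single sigmoid unit (as in a typical binary TF/Keras model). positive_score and apply_cutoff are hypothetical names, not functions in this PR.

```python
def positive_score(raw):
    """Reduce one model output to a single float for the True class."""
    if isinstance(raw, (int, float)):
        return float(raw)
    values = list(raw)
    if len(values) == 1:
        # Assumed: a single sigmoid output, already the score for True.
        return float(values[0])
    if len(values) == 2:
        # Assumed: [P(False), P(True)], so keep only the second value.
        return float(values[1])
    raise ValueError(f"Unexpected output with {len(values)} values")


def apply_cutoff(pairs, raw_outputs, cutoff):
    """Attach a True-class score to each node pair and drop those below cutoff."""
    scored = [
        (source, destination, positive_score(raw))
        for (source, destination), raw in zip(pairs, raw_outputs)
    ]
    return [item for item in scored if item[2] >= cutoff]


# Made-up pairs and outputs, for illustration only.
pairs = [("EX:a", "EX:b"), ("EX:a", "EX:c"), ("EX:b", "EX:c")]
two_column_outputs = [[0.2, 0.8], [0.7, 0.3], [0.5, 0.5]]
one_column_outputs = [[0.8], [0.3], [0.5]]

from_two_columns = apply_cutoff(pairs, two_column_outputs, cutoff=0.5)
from_one_column = apply_cutoff(pairs, one_column_outputs, cutoff=0.5)
```

Both output shapes reduce to the same list of scored links, which is the point of normalising before the cutoff is applied.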

sonarcloud[bot] commented 2 years ago

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
12 Code Smells

84.3% Coverage
0.0% Duplication

Knowledge-Graph-Hub / neat-ml

Add code to apply classifier for link prediction #48