Closed justaddcoffee closed 2 years ago
The issue referenced in run_classifier is this one: https://github.com/AnacletoLAB/ensmallen/issues/139
When loading the graph, the following parameters need to be included:
"node_type_path": target_node_type_list_path,
"node_types_column_number": 0,
"node_type_list_is_correct": True,
"node_type_list_separator": "\t"
where target_node_type_list_path is the path to a file containing one unique node type per line. For example, this would work for KG-Microbe:
biolink:AbstractEntity
biolink:ActivityAndBehavior
biolink:AnatomicalEntity
biolink:BiologicalProcess
biolink:CellularComponent
biolink:ChemicalSubstance
biolink:MolecularActivity
biolink:NamedThing
biolink:OntologyClass
biolink:OrganismTaxon
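As a sketch, the four parameters above can be merged into whatever kwargs dict is passed to the graph loader; the variable name and file name below are assumptions, not actual NEAT code:

```python
# Hypothetical sketch: merging the extra node-type parameters into the
# kwargs used for graph loading. Only the four keys below come from this
# thread; everything else is illustrative.
target_node_type_list_path = "kg_microbe_node_types.tsv"  # one unique node type per line

graph_kwargs = {
    # ... existing graph-loading parameters ...
    "node_type_path": target_node_type_list_path,
    "node_types_column_number": 0,
    "node_type_list_is_correct": True,
    "node_type_list_separator": "\t",
}
```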
I think the issue with the wrong model being loaded in the run_classifier test is that in sklearn_model.py, the save() function pickles self.model, and self.model is the class imported from sklearn, not an object of the type Model defined in our model.py. So the loaded model isn't the right type for any of our new functions, just the standard sklearn ones.
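A minimal sketch of the distinction (the class bodies here stand in for the real model.py / sklearn_model.py code and are assumptions): pickling self.model serializes only the bare sklearn estimator, while pickling the wrapper itself preserves the Model interface on load.

```python
import pickle

class Model:
    """Stand-in for the Model base class in model.py (assumed interface)."""
    def predict_links(self):
        return "our custom method"

class SklearnModel(Model):
    def __init__(self, model):
        self.model = model  # the underlying sklearn estimator

    def save_wrong(self, path):
        # Pickling only self.model drops the wrapper: the loaded object
        # is a bare sklearn estimator with none of our new methods.
        with open(path, "wb") as f:
            pickle.dump(self.model, f)

    def save_right(self, path):
        # Pickling the whole wrapper keeps the Model interface intact.
        with open(path, "wb") as f:
            pickle.dump(self, f)
```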
At this point, I was able to run neat run --config tests/resources/test.yaml, and three of the four models appeared to complete training without issue, but the fourth (LogisticRegression) produced:
/home/harry/neat-env/lib/python3.8/site-packages/sklearn/linear_model/_logistic.py:814: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
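The warning itself points at two standard scikit-learn remedies: raise the iteration cap, or scale the features first. This is generic sklearn usage, not a NEAT-specific fix:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1) Raise the iteration cap (default max_iter is 100 for lbfgs).
clf = LogisticRegression(max_iter=1000)

# 2) Scale the features first, as the warning's preprocessing link recommends.
scaled_clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
```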
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [2:54:17<00:00, 2614.41s/it]
Traceback (most recent call last):
File "/home/harry/neat-env/bin/neat", line 8, in <module>
sys.exit(cli())
File "/home/harry/neat-env/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/home/harry/neat-env/lib/python3.8/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/home/harry/neat-env/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/harry/neat-env/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/harry/neat-env/lib/python3.8/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/home/harry/neat-env/lib/python3.8/site-packages/neat/cli.py", line 94, in run
predict_links(**classifier_kwargs)
TypeError: predict_links() got an unexpected keyword argument 'embeddings'
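The actual fix isn't shown in this thread. As a generic defensive pattern (not the NEAT fix), the caller can filter kwargs down to what the callee's signature actually accepts, which would drop the unexpected 'embeddings' argument before calling predict_links:

```python
import inspect

def call_with_supported_kwargs(func, **kwargs):
    """Call func with only the kwargs its signature declares.

    Hypothetical helper for illustration; this is not code from neat/cli.py.
    """
    accepted = inspect.signature(func).parameters
    filtered = {k: v for k, v in kwargs.items() if k in accepted}
    return func(**filtered)
```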
One additional bit of required functionality: we need to be able to filter on node namespace in addition to Biolink category (e.g., only consider predicted links involving HPO: nodes).
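A minimal sketch of such a namespace filter, assuming predicted links arrive as (source, destination, score) tuples with CURIE-style identifiers; the function name and tuple layout are assumptions:

```python
def filter_links_by_namespace(predicted_links, namespace="HPO:"):
    """Keep only predicted links where either endpoint is in the namespace.

    predicted_links: iterable of (source_id, dest_id, score) tuples,
    where the ids are CURIEs such as "HPO:0001250".
    """
    return [
        (src, dst, score)
        for src, dst, score in predicted_links
        if src.startswith(namespace) or dst.startswith(namespace)
    ]
```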
TBD: predict in TF vs. sklearn - just want to return a float representing the result for True, not report the other values.
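For the sklearn side, a sketch of returning only the positive-class score: predict_proba returns one column per class (ordered by classes_), so for a binary classifier trained on 0/1 labels, column 1 is the probability of True. The helper name is an assumption:

```python
def positive_class_scores(clf, X):
    """Return a single float per row: the predicted probability of the
    positive (True) class, dropping the redundant negative-class column.

    Assumes clf is a fitted binary sklearn classifier with predict_proba,
    trained on 0/1 (or False/True) labels so that classes_[1] is positive.
    """
    return clf.predict_proba(X)[:, 1]
```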
Add a block of YAML to test.yaml to demonstrate how to apply a classifier to do link prediction, as explained in #48
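A hypothetical sketch of what that block might look like; every key name here is an assumption based on this thread, not the actual test.yaml schema:

```yaml
# Hypothetical sketch only -- key names are assumptions, not the real schema.
classifier:
  type: sklearn
  model: LogisticRegression
  link_prediction:
    node_types:            # Biolink categories to consider
      - biolink:OrganismTaxon
    node_namespaces:       # namespace filter, per the note above
      - "HPO:"
    output: predicted_links.tsv
```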
Hacking with @caufieldjh