Add grape methods for classifiers

caufieldjh commented 1 year ago

The parts appear to work on their own (i.e., can specify an Ensmallen edge prediction model and can apply it to a graph) but running a full config with a perceptron (grape.edge_prediction.PerceptronEdgePrediction) yields the following:

$ neat run --config giraffe.yaml 
  0%|                                                                                                                                       | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/harry/neat-env/bin/neat", line 8, in <module>
    sys.exit(cli())
  File "/home/harry/neat-env/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/harry/neat-env/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/harry/neat-env/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/harry/neat-env/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/harry/neat-env/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/harry/neat-env/lib/python3.8/site-packages/neat_ml/cli.py", line 78, in run
    raise NotImplementedError(f"{model} isn't implemented yet")
NotImplementedError: None isn't implemented yet

My guess is that the handoff from embedder to classifier isn't passing the model object; it's just trying to get its name first.

caufieldjh commented 1 year ago

That was just an issue of the CLI not recognizing the model name.
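The lookup that failed here can be sketched as a registry keyed on the model type string from the config. This is a minimal sketch only. `MODEL_REGISTRY`, `resolve_model_class`, and the wrapper classes are hypothetical names, not `neat_ml` code. Reporting the unrecognized name, rather than `None`, makes this kind of failure easier to diagnose.

```python
from typing import Dict, Type


class Model:
    """Hypothetical base wrapper for a configured classifier."""


class SklearnModel(Model):
    """Hypothetical wrapper for scikit-learn classifiers."""


class GrapeModel(Model):
    """Hypothetical wrapper for grape/embiggen edge prediction models."""


# Hypothetical registry: config "type" strings -> wrapper classes.
MODEL_REGISTRY: Dict[str, Type[Model]] = {
    "sklearn.linear_model.LogisticRegression": SklearnModel,
    "grape.edge_prediction.PerceptronEdgePrediction": GrapeModel,
}


def resolve_model_class(model_type: str) -> Type[Model]:
    """Look up a wrapper class, naming the unrecognized type on failure."""
    try:
        return MODEL_REGISTRY[model_type]
    except KeyError:
        raise NotImplementedError(f"{model_type!r} isn't implemented yet") from None
```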

Now we're here:

Traceback (most recent call last):
  File "/home/harry/neat-env/bin/neat", line 8, in <module>
    sys.exit(cli())
  File "/home/harry/neat-env/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/harry/neat-env/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/harry/neat-env/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/harry/neat-env/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/harry/neat-env/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/harry/neat-env/lib/python3.8/site-packages/neat_ml/cli.py", line 107, in run
    np.around(model.predict(validation_data[0]), decimals=0)
  File "/home/harry/neat-env/lib/python3.8/site-packages/neat_ml/link_prediction/model.py", line 46, in predict
    return self.model.predict(predict_data)  # type: ignore
  File "/home/harry/neat-env/lib/python3.8/site-packages/embiggen/edge_prediction/edge_prediction_model.py", line 323, in predict
    predictions = super().predict(
  File "/home/harry/neat-env/lib/python3.8/site-packages/embiggen/utils/abstract_models/abstract_classifier_model.py", line 869, in predict
    if not graph.has_nodes():
AttributeError: 'numpy.ndarray' object has no attribute 'has_nodes'
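The traceback suggests that the wrapper forwards `validation_data[0]` (a NumPy array) to embiggen's `predict`, which expects a graph object because it calls `graph.has_nodes()`. The sketch below shows one way a wrapper could dispatch on the backing library. It is not the actual `neat_ml` code. The `LinkPredictionModel` class shape and its `library` attribute are assumptions, and the sketch makes no claim about which keyword arguments a grape model's `predict` accepts; it simply passes them through.

```python
from typing import Any


class LinkPredictionModel:
    """Hypothetical wrapper that forwards predict() to the backing model.

    scikit-learn-style models take a feature array (e.g. edge embeddings),
    while the traceback above shows grape edge prediction models calling
    graph.has_nodes() on their first argument, i.e. expecting a graph.
    """

    def __init__(self, model: Any, library: str) -> None:
        self.model = model
        self.library = library

    def predict(self, predict_data: Any, **kwargs: Any) -> Any:
        if self.library == "grape":
            # Duck-type check mirroring the call that failed in the traceback.
            if not hasattr(predict_data, "has_nodes"):
                raise TypeError(
                    "grape models expect a graph object, got "
                    f"{type(predict_data).__name__}"
                )
            # Extra keyword arguments are passed through unchanged.
            return self.model.predict(predict_data, **kwargs)
        # scikit-learn-style models: predict on a feature array.
        return self.model.predict(predict_data)
```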

caufieldjh commented 1 year ago

Classifier building and validation now work with graph + embedding + validation data, but applying the classifier fails because it can't be loaded from a file (since we can't save the model).
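One generic way to persist a fitted model is Python's `pickle`. This is a sketch under an unverified assumption: that the fitted grape model object is picklable. If it is not, a library-specific save mechanism would be needed instead. `save_model` and `load_model` are hypothetical helper names.

```python
import pickle
from pathlib import Path
from typing import Any


def save_model(model: Any, path: Path) -> None:
    """Serialize a fitted model with pickle (assumes the object is picklable)."""
    with path.open("wb") as handle:
        pickle.dump(model, handle)


def load_model(path: Path) -> Any:
    """Load a pickled model; only unpickle files from trusted sources."""
    with path.open("rb") as handle:
        return pickle.load(handle)
```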

caufieldjh commented 1 year ago

Just need to:

[x] Ensure grape models don't output existing edges in predictions, if asked
[x] Ensure node type filter is used for grape models
[ ] ~~Remove limitation from gen_src_dst_pair and refactor to not immediately consume all memory~~ Punting on this for now, see below
[x] Finish cleanup/linting for flake8
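The node type filter from the checklist can be sketched as a lazy filter over candidate edges. The sketch assumes each node maps to a set of type strings, such as Biolink categories. `filter_by_node_type` and its parameters are hypothetical names and are not the `neat_ml` implementation.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

Edge = Tuple[str, str]


def filter_by_node_type(
    edges: Iterable[Edge],
    node_types: Dict[str, Set[str]],
    source_types: Set[str],
    destination_types: Set[str],
) -> Iterator[Edge]:
    """Yield only edges whose endpoints each carry at least one allowed type."""
    for src, dst in edges:
        src_ok = bool(node_types.get(src, set()) & source_types)
        dst_ok = bool(node_types.get(dst, set()) & destination_types)
        if src_ok and dst_ok:
            yield src, dst
```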

caufieldjh commented 1 year ago

Re: the gen_src_dst_pair function: the current behavior is essentially to generate all potential edges (in practice we don't, because that would require more memory than we're likely to have), add them to a list (that's really the problem step), and pass that list back to the predict_links function to check whether each pair matches the node type filter. We use this list to generate the corresponding edge embeddings with edge_embedding_for_predict.
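Memory stays bounded if candidate pairs come from a generator and embeddings are built one batch at a time, instead of materializing the full list. The sketch below illustrates this. `gen_candidate_pairs`, `batched`, and `hadamard_edge_embeddings` are hypothetical names. The element-wise (Hadamard) product is an assumed edge embedding method; it is not necessarily what edge_embedding_for_predict does.

```python
import itertools
from typing import Dict, Iterable, Iterator, List, Tuple

import numpy as np

Pair = Tuple[str, str]


def gen_candidate_pairs(nodes: Iterable[str]) -> Iterator[Pair]:
    """Lazily yield ordered (src, dst) pairs, skipping self-loops."""
    node_list = list(nodes)
    for src, dst in itertools.product(node_list, repeat=2):
        if src != dst:
            yield src, dst


def batched(pairs: Iterable[Pair], size: int) -> Iterator[List[Pair]]:
    """Group pairs into lists of at most `size` items."""
    it = iter(pairs)  # guard against lists, which islice would restart
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch


def hadamard_edge_embeddings(
    batch: List[Pair], embeddings: Dict[str, np.ndarray]
) -> np.ndarray:
    """Edge embeddings as the element-wise product (an assumed choice)."""
    src = np.stack([embeddings[s] for s, _ in batch])
    dst = np.stack([embeddings[d] for _, d in batch])
    return src * dst
```

Only one batch of edge embeddings exists at a time. The node list itself is still held in memory, but that is linear in the number of nodes rather than quadratic.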

For the grape models, it may make more sense to build the model, run predict_proba, and then remove all existing edges from the results, without first trying to generate every potential combination (the link prediction subsampling already happens in grape, AFAIK).
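The post-filtering step can be sketched as follows. The sketch assumes the predict_proba output has already been converted to `(src, dst, score)` triples; that conversion is not shown and depends on grape's output format. `drop_existing_edges` is a hypothetical name. For undirected graphs, a prediction also counts as existing when its reverse is present.

```python
from typing import Iterable, List, Set, Tuple

ScoredEdge = Tuple[str, str, float]


def drop_existing_edges(
    predictions: Iterable[ScoredEdge],
    existing_edges: Set[Tuple[str, str]],
    directed: bool = False,
) -> List[ScoredEdge]:
    """Remove predictions already present in the graph, highest score first."""

    def known(src: str, dst: str) -> bool:
        if (src, dst) in existing_edges:
            return True
        # In an undirected graph, (dst, src) is the same edge.
        return not directed and (dst, src) in existing_edges

    kept = [p for p in predictions if not known(p[0], p[1])]
    return sorted(kept, key=lambda p: p[2], reverse=True)
```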

So that refactor can go in a different PR.

hrshdhgd commented 1 year ago

Apologies in advance for the annoying comments about constants, I bring this up because we do this in sssom and it is really convenient and is easy if refactors are needed.

caufieldjh commented 1 year ago

> Apologies in advance for the annoying comments about constants, I bring this up because we do this in sssom and it is really convenient and is easy if refactors are needed.

No worries, it's probably time for those to be constants. Since these are all defined in the neat schema, we could use the schema to define the set of constants.
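The constants approach can be sketched as a module-level key constant plus an enum of accepted model types, so that config strings are defined in one place. All names and values below are hypothetical. Generating them from the neat schema, as suggested, is not shown.

```python
from enum import Enum


class ModelType(str, Enum):
    """Hypothetical enumeration of classifier types accepted in a config."""

    PERCEPTRON = "grape.edge_prediction.PerceptronEdgePrediction"
    LOGISTIC_REGRESSION = "sklearn.linear_model.LogisticRegression"


# Hypothetical config key, defined once instead of repeated as a string literal.
MODEL_TYPE_KEY = "type"


def parse_model_type(config: dict) -> ModelType:
    """Validate a config's model type; raises ValueError if unrecognized."""
    return ModelType(config[MODEL_TYPE_KEY])
```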

caufieldjh commented 1 year ago

Tests are failing due to some new issue between tox and flake8.

hrshdhgd commented 1 year ago

For now I've implemented a workaround that isn't recommended: commenting out the troublesome flake8 plugins:

flake8-bandit
flake8-isort

flake8 had a new release today, which may be related to this issue.

hrshdhgd commented 1 year ago

Another solution: I brought back the plugins and limited flake8 < 5.0.0. This works too.

caufieldjh commented 1 year ago

> Another solution: I brought back the plugins and limited flake8 < 5.0.0. This works too.

Excellent - we may just have to keep it like that for a bit

Knowledge-Graph-Hub / neat-ml

Add grape methods for classifiers #93