RuntimeError when validation fails isn't very informative

caufieldjh commented 1 year ago

Running `neat run` with the quickstart config fails with a bare RuntimeError:

$ neat run --config neat_quickstart.yaml
Traceback (most recent call last):
  File "/home/harry/kg-env/bin/neat", line 8, in <module>
    sys.exit(cli())
  File "/home/harry/kg-env/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/harry/kg-env/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/harry/kg-env/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/harry/kg-env/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/harry/kg-env/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/harry/kg-env/lib/python3.8/site-packages/neat_ml/cli.py", line 45, in run
    yhelp = YamlHelper(config)
  File "/home/harry/kg-env/lib/python3.8/site-packages/neat_ml/yaml_helper/yaml_helper.py", line 147, in __init__
    if not validate_config(self.yaml):
  File "/home/harry/kg-env/lib/python3.8/site-packages/neat_ml/yaml_helper/yaml_helper.py", line 43, in validate_config
    raise RuntimeError
RuntimeError

So this configuration clearly doesn't pass the validation step, but why not? The error should say which part of the config failed validation and why.
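One possible shape for a more informative failure, as a minimal sketch: collect every problem found and raise an exception whose message lists them. The names `ConfigValidationError`, `REQUIRED_SECTIONS`, and `validate_config_verbose` are hypothetical, and the checks are stand-ins for illustration, not neat-ml's actual schema validation.

```python
class ConfigValidationError(ValueError):
    """Hypothetical exception that carries every validation problem found."""

    def __init__(self, problems):
        self.problems = list(problems)
        details = "\n".join(f"  - {p}" for p in self.problems)
        super().__init__(f"Config failed validation:\n{details}")


# Assumption for illustration only; the real schema lives in neat-ml.
REQUIRED_SECTIONS = ("Target",)


def validate_config_verbose(config):
    """Return True if config passes; otherwise raise with the reasons."""
    problems = []
    if not isinstance(config, dict):
        # A path string or None here would explain a silent failure.
        problems.append(
            f"expected a mapping at top level, got {type(config).__name__}"
        )
    else:
        for section in REQUIRED_SECTIONS:
            if section not in config:
                problems.append(f"missing required section '{section}'")
    if problems:
        raise ConfigValidationError(problems)
    return True
```

With something like this, the last line of the traceback would name the failing section instead of ending in a bare `RuntimeError`.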

caufieldjh commented 1 year ago

All the tests currently pass, so it's not as if the configs are non-functional; they just don't adhere to the schema in at least one way.

caufieldjh commented 1 year ago

Strangely enough, calling validate_config directly on the parsed config completes as expected:

>>> from neat_ml.yaml_helper.yaml_helper import validate_config
>>> from neat_ml.yaml_helper.yaml_helper import parse_yaml
>>> iny = parse_yaml("neat_quickstart.yaml")
>>> iny
{'Target': {'target_path': 'quickstart_output'}, 'GraphDataConfiguration': {'graph': {'directed': False, 'node_path': 'tests/resources/test_graphs/test_small_nodes.tsv', 'edge_path': 'tests/resources/test_graphs/test_small_edges.tsv', 'verbose': True, 'nodes_column': 'id', 'node_list_node_types_column': 'category', 'default_node_type': 'biolink:NamedThing', 'sources_column': 'subject', 'destinations_column': 'object', 'default_edge_type': 'biolink:related_to'}}, 'EmbeddingsConfig': {'filename': 'quickstart_embedding.csv', 'history_filename': 'quickstart_embedding_history.json', 'node_embeddings_params': {'method_name': 'CBOW', 'walk_length': 10, 'batch_size': 128, 'window_size': 4, 'return_weight': 1.0, 'explore_weight': 1.0, 'iterations': 5}, 'tsne_filename': 'tsne_quickstart.png'}, 'ClassifierContainer': {'classifiers': [{'classifier_id': 'lr_1', 'classifier_name': 'Logistic Regression', 'classifier_type': 'sklearn.linear_model.LogisticRegression', 'edge_method': 'Average', 'outfile': 'model_lr_quickstart', 'parameters': {'sklearn_params': {'random_state': 42, 'max_iter': 100}}}]}, 'ApplyTrainedModelsContainer': {'models': [{'model_id': 'lr_1', 'node_types': {'source': ['biolink:Protein'], 'destination': ['biolink:Protein']}, 'cutoff': 0.9, 'outfile': 'lr_protein_predictions.tsv'}]}}
>>> validate_config(iny)
True

So this may just be due to how the method is being called.
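To test that hypothesis, one could compare what each call path actually hands to the validator. The helper below is a hypothetical debugging aid (`describe_config` is not part of neat-ml); it only summarizes the type and top-level keys of whatever it is given.

```python
def describe_config(obj):
    """Summarize what was passed to a validator: its type and top-level keys."""
    kind = type(obj).__name__
    if isinstance(obj, dict):
        # Sorted keys make two summaries easy to compare by eye.
        return f"{kind} with keys {sorted(obj)}"
    return f"{kind}: {obj!r}"
```

Printing `describe_config(self.yaml)` just before the `validate_config(self.yaml)` call shown in the traceback, and `describe_config(iny)` in the interactive session, would show whether the two paths pass the same kind of object.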

caufieldjh commented 1 year ago

We may also want to include the line number of the offending entry in validation errors.
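A rough sketch of how a line number could be attached, assuming the raw YAML text is still available when the error is built. `find_key_line` is a hypothetical helper and a crude heuristic: it reports the first line where the key appears, so repeated keys in different sections would be ambiguous. A YAML parser that exposes source positions (PyYAML's composed nodes carry a `start_mark`, for instance) would be more robust.

```python
import re


def find_key_line(yaml_text, key):
    """Return the 1-based line number of the first `key:` in yaml_text, or None."""
    # Allow leading indentation and an optional list marker ("- ") before the key.
    pattern = re.compile(rf"^\s*(?:-\s+)?{re.escape(key)}\s*:")
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        if pattern.match(line):
            return number
    return None
```

A validation message could then read along the lines of "line 4: 'filename' ...", with the number taken from this lookup.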

Knowledge-Graph-Hub / neat-ml

RuntimeError when validation fails isn't very informative #94