Closed by caufieldjh 1 year ago
All the tests currently pass, so the configs aren't non-functional; they just fail to adhere to the schema in at least one way.
Strangely enough, running the validation on its own completes as expected:
>>> from neat_ml.yaml_helper.yaml_helper import validate_config
>>> from neat_ml.yaml_helper.yaml_helper import parse_yaml
>>> iny = parse_yaml("neat_quickstart.yaml")
>>> iny
{'Target': {'target_path': 'quickstart_output'}, 'GraphDataConfiguration': {'graph': {'directed': False, 'node_path': 'tests/resources/test_graphs/test_small_nodes.tsv', 'edge_path': 'tests/resources/test_graphs/test_small_edges.tsv', 'verbose': True, 'nodes_column': 'id', 'node_list_node_types_column': 'category', 'default_node_type': 'biolink:NamedThing', 'sources_column': 'subject', 'destinations_column': 'object', 'default_edge_type': 'biolink:related_to'}}, 'EmbeddingsConfig': {'filename': 'quickstart_embedding.csv', 'history_filename': 'quickstart_embedding_history.json', 'node_embeddings_params': {'method_name': 'CBOW', 'walk_length': 10, 'batch_size': 128, 'window_size': 4, 'return_weight': 1.0, 'explore_weight': 1.0, 'iterations': 5}, 'tsne_filename': 'tsne_quickstart.png'}, 'ClassifierContainer': {'classifiers': [{'classifier_id': 'lr_1', 'classifier_name': 'Logistic Regression', 'classifier_type': 'sklearn.linear_model.LogisticRegression', 'edge_method': 'Average', 'outfile': 'model_lr_quickstart', 'parameters': {'sklearn_params': {'random_state': 42, 'max_iter': 100}}}]}, 'ApplyTrainedModelsContainer': {'models': [{'model_id': 'lr_1', 'node_types': {'source': ['biolink:Protein'], 'destination': ['biolink:Protein']}, 'cutoff': 0.9, 'outfile': 'lr_protein_predictions.tsv'}]}}
>>> validate_config(iny)
True
So the failure may just be due to how validate_config is being called when a config is run.
We may also want to include the line number in validation errors.
When running a config, this is what happens:
So this configuration clearly doesn't pass the validation step, but why not? We need more informative output here.
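To illustrate what "more informative" could look like, here is a minimal sketch that collects every violation along with the dotted path to the offending key, rather than returning a bare True/False. Everything in it is an assumption made for illustration: `TOY_SCHEMA`, `collect_errors`, and `bad_config` are hypothetical names, the schema is a hand-rolled stand-in rather than the real neat_ml schema, and this is not how `validate_config` currently works.

```python
from typing import Any, List

# Hypothetical, heavily simplified schema: key -> expected type, or a nested dict.
# This is NOT the real neat_ml schema, only a stand-in for the example.
TOY_SCHEMA = {
    "Target": {"target_path": str},
    "EmbeddingsConfig": {
        "filename": str,
        "node_embeddings_params": {"method_name": str, "walk_length": int},
    },
}


def collect_errors(config: Any, schema: dict, path: str = "") -> List[str]:
    """Return every schema violation found, each prefixed with its dotted path."""
    if not isinstance(config, dict):
        return [f"{path or '<root>'}: expected a mapping, got {type(config).__name__}"]
    errors: List[str] = []
    for key, expected in schema.items():
        here = f"{path}.{key}" if path else key
        if key not in config:
            errors.append(f"{here}: missing required key")
        elif isinstance(expected, dict):
            # Recurse into nested sections, carrying the path along.
            errors.extend(collect_errors(config[key], expected, here))
        elif not isinstance(config[key], expected):
            errors.append(
                f"{here}: expected {expected.__name__}, "
                f"got {type(config[key]).__name__} ({config[key]!r})"
            )
    return errors


# A config with one deliberate mistake: walk_length is a string, not an int.
bad_config = {
    "Target": {"target_path": "quickstart_output"},
    "EmbeddingsConfig": {
        "filename": "quickstart_embedding.csv",
        "node_embeddings_params": {"method_name": "CBOW", "walk_length": "10"},
    },
}

for message in collect_errors(bad_config, TOY_SCHEMA):
    print(message)
```

Reporting all violations with their paths, instead of stopping at the first failure, would make it much quicker to see which part of a config disagrees with the schema.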