DaniFdezAlvarez / shexer

Apache License 2.0

Enable all classes mode for url_endpoint #117

Status: closed (DaniFdezAlvarez closed this issue 1 year ago)

DaniFdezAlvarez commented 2 years ago

With this example code:

from shexer.shaper import Shaper
from shexer.consts import SHEXC, SHACL_TURTLE

namespaces_dict = {"http://www.w3.org/1999/02/22-rdf-syntax-ns#": "rdf",
                   "http://example.org/": "ex",
                   "http://weso.es/shapes/": "",
                   "http://www.w3.org/2001/XMLSchema#": "xsd"
                   }

shaper = Shaper(all_classes_mode=True,
                url_endpoint="http://localhost:3030/myRDF/sparql",
                namespaces_dict=namespaces_dict,
                disable_exact_cardinality=True,
                instantiation_property="http://www.w3.org/1999/02/22-rdf-syntax-ns#type")  # Default rdf:type

output_file = "shaper_example.shex"

shaper.shex_graph(output_file=output_file,
                  output_format=SHEXC,  # matches the .shex extension of output_file
                  verbose=True,
                  acceptance_threshold=0.1)

print("Done!")

We get this traceback:

Traceback (most recent call last):
  File "/Users/XXX/test.py", line 18, in <module>
    shaper.shex_graph(output_file=output_file,
  File "/Users/XXX/.pyenv/versions/3.10.1/lib/python3.10/site-packages/shexer/shaper.py", line 196, in shex_graph
    self._launch_instance_tracker(verbose=verbose)
  File "/Users/XXX/.pyenv/versions/3.10.1/lib/python3.10/site-packages/shexer/shaper.py", line 227, in _launch_instance_tracker
    self._instance_tracker = self._build_instance_tracker()
  File "/Users/XXX/.pyenv/versions/3.10.1/lib/python3.10/site-packages/shexer/shaper.py", line 289, in _build_instance_tracker
    return get_instance_tracker(instances_file_input=self._instances_file_input,
  File "/Users/XXX/.pyenv/versions/3.10.1/lib/python3.10/site-packages/shexer/utils/factories/instance_tracker_factory.py", line 98, in get_instance_tracker
    instance_yielder = get_triple_yielder(source_file=graph_file_input,
  File "/Users/XXX/.pyenv/versions/3.10.1/lib/python3.10/site-packages/shexer/utils/factories/triple_yielders_factory.py", line 59, in get_triple_yielder
    shape_map = produce_shape_map_according_to_input(sm_format=shape_map_format,
  File "/Users/XXX/.pyenv/versions/3.10.1/lib/python3.10/site-packages/shexer/utils/factories/triple_yielders_factory.py", line 40, in produce_shape_map_according_to_input
    else read_target_classes_from_file(file_target_classes=file_target_classes,
  File "/Users/XXX/.pyenv/versions/3.10.1/lib/python3.10/site-packages/shexer/utils/factories/triple_yielders_factory.py", line 141, in read_target_classes_from_file
    with open(file_target_classes, "r") as in_stream:
TypeError: expected str, bytes or os.PathLike object, not NoneType

The code in shexer.utils.factories.triple_yielders_factory assumes that target classes have been provided in some form (a list or a file). When neither is given, the file path passed to open() is None, which raises this uninformative TypeError. Support for all-classes mode should be added at that point: in this mode, a SPARQL query should retrieve every class that has at least one instance, and those classes should then be used as the target classes.
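A minimal sketch of what that could look like, using only the Python standard library. All function names here (build_all_classes_query, parse_class_bindings, fetch_all_classes, resolve_target_classes) are hypothetical and are not part of shexer's API. It assumes the endpoint accepts SPARQL 1.1 Protocol GET requests and returns SPARQL JSON results, and that a target-classes file holds one IRI per line. Instead of reaching open(None), the dispatcher raises an informative error when no source of target classes is available.

```python
import json
import urllib.parse
import urllib.request

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_all_classes_query(instantiation_property=RDF_TYPE):
    """SPARQL query returning every class that has at least one instance."""
    # The triple pattern only matches classes that are actually instantiated.
    return (
        "SELECT DISTINCT ?class WHERE {\n"
        f"  ?instance <{instantiation_property}> ?class .\n"
        "}"
    )


def parse_class_bindings(sparql_json):
    """Extract class IRIs from a SPARQL 1.1 JSON results document."""
    # Blank-node or literal values cannot serve as target classes, so skip them.
    return [binding["class"]["value"]
            for binding in sparql_json["results"]["bindings"]
            if binding["class"]["type"] == "uri"]


def fetch_all_classes(url_endpoint, instantiation_property=RDF_TYPE):
    """Send the query to the endpoint via HTTP GET and return the class IRIs."""
    query = build_all_classes_query(instantiation_property)
    url = url_endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request) as response:
        return parse_class_bindings(json.load(response))


def resolve_target_classes(target_classes=None, file_target_classes=None,
                           all_classes_mode=False, url_endpoint=None,
                           instantiation_property=RDF_TYPE):
    """Hypothetical dispatcher: list, then file, then all-classes mode."""
    if target_classes is not None:
        return list(target_classes)
    if file_target_classes is not None:
        # Assumption: one class IRI per line.
        with open(file_target_classes, "r") as in_stream:
            return [line.strip() for line in in_stream if line.strip()]
    if all_classes_mode and url_endpoint is not None:
        return fetch_all_classes(url_endpoint, instantiation_property)
    # Fail with a clear message instead of calling open(None).
    raise ValueError(
        "No target classes: provide target_classes or file_target_classes, "
        "or enable all_classes_mode together with url_endpoint.")
```

With the settings from the example above, resolve_target_classes(all_classes_mode=True, url_endpoint="http://localhost:3030/myRDF/sparql") would query the local Fuseki dataset and return the instantiated classes. Checking explicit lists and files first keeps all-classes mode as a fallback rather than silently overriding user input.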