SDM-TIB / SDM-RDFizer

An Efficient RML-Compliant Engine for Knowledge Graph Construction
https://doi.org/10.5281/zenodo.3872103
Apache License 2.0

expected string or bytes-like object #53

Closed by arenas-guerrero-julian 3 years ago

arenas-guerrero-julian commented 3 years ago

Hi! I get the following error when materializing GTFS-Madrid-benchmark with MySQL:

Semantifying output...
Traceback (most recent call last):
  File "run_rdfizer.py", line 3, in <module>
    semantify(str(sys.argv[1]))
  File "/home/julian/GitHub/SDM-RDFizer/rdfizer/rdfizer/semantify.py", line 4277, in semantify
    number_triple += executor.submit(semantify_mysql, row, row_headers, triples_map, triples_map_list, output_file_descriptor, wr, config[dataset_i]["name"], config[dataset_i]["host"], int(config[dataset_i]["port"]), config[dataset_i]["user"], config[dataset_i]["password"],config[dataset_i]["db"]).result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/julian/GitHub/SDM-RDFizer/rdfizer/rdfizer/semantify.py", line 3106, in semantify_mysql
    hash_maker_array(cursor, triples_map_element, predicate_object_map.object_map)
  File "/home/julian/GitHub/SDM-RDFizer/rdfizer/rdfizer/semantify.py", line 279, in hash_maker_array
    hash_table.update({element : {"<" + string_substitution_array(parent_subject.subject_map.value, "{(.+?)}", row, row_headers, "object",ignore) + ">" : "object"}}) 
  File "/home/julian/GitHub/SDM-RDFizer/rdfizer/rdfizer/functions.py", line 380, in string_substitution_array
    if re.search("^[\s|\t]*$", value) is None:
  File "/usr/lib/python3.8/re.py", line 201, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object

I opened a very similar issue (#38) some months ago. The solution is the same (I have tested it). The same problem is likely repeated in other parts of the source code.

Julián
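
The failure can be reproduced outside SDM-RDFizer, because `re.search` only accepts `str` or bytes-like input. The following is a minimal sketch, not the project's code; the `datetime.date` value is only an illustrative guess, since the offending data type was not identified in this thread.

```python
import datetime
import re

# Hypothetical cell value; the real offending type is not known from the issue.
value = datetime.date(2020, 1, 1)

try:
    # Passing a non-string object to re.search raises TypeError.
    re.search(r"^\s*$", value)
except TypeError as exc:
    error_message = str(exc)

# Casting to str first lets the blank-value check run normally.
match = re.search(r"^\s*$", str(value))
```

Here `match` is `None` because the stringified date is not blank, which is the branch the substitution code expects to take.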

eiglesias34 commented 3 years ago

Hi,

What is the data type of the value that caused the problem? I have changed the code so that any value that is not a str is converted into a string. Before, only ints and floats were converted. Please test it and tell me if it works.

arenas-guerrero-julian commented 3 years ago

I cannot tell, but my naive solution has been:

# str(value) guards against non-string column values, which re.search rejects
if re.search(r"^[\s|\t]*$", str(value)) is None:
    value = urllib.parse.quote(str(value))
    new_string = new_string[:start + offset_current_substitution] + value.strip() + new_string[end + offset_current_substitution:]
    offset_current_substitution = offset_current_substitution + len(value) - (end - start)
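
For context, the offset arithmetic in this fragment can be sketched as a standalone function. This is a simplified illustration under stated assumptions, not the actual `string_substitution_array`: the name `fill_template` is hypothetical, the row is a plain dict rather than the `row`/`row_headers` pair used in the project, and returning `None` for a missing or blank value is an assumed behavior.

```python
import re
import urllib.parse


def fill_template(template, row):
    """Replace each {column} placeholder with the URL-quoted row value."""
    result = template
    offset = 0  # how far positions in `result` have drifted from `template`
    for m in re.finditer(r"{(.+?)}", template):
        value = row.get(m.group(1))
        # Assumed behavior: a missing or blank value means no string is built.
        if value is None or re.search(r"^\s*$", str(value)):
            return None
        quoted = urllib.parse.quote(str(value))  # str() handles ints, dates, ...
        start, end = m.start() + offset, m.end() + offset
        result = result[:start] + quoted + result[end:]
        # The replacement may be longer or shorter than the placeholder.
        offset += len(quoted) - (m.end() - m.start())
    return result


iri = fill_template("http://ex.org/stop/{stop_id}/{seq}",
                    {"stop_id": 12, "seq": "a b"})
```

The integer `12` is substituted without error because every value passes through `str()` before both the blank check and the quoting step.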

eiglesias34 commented 3 years ago

OK. What I have added is that SDM-RDFizer applies the string conversion only when the value in question is not already a string. Before, I only took ints and floats into consideration. This avoids applying the conversion to a value that is already a string.
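
The conditional conversion described here might look roughly like the sketch below. The helper name `as_text` is hypothetical and the actual change in SDM-RDFizer may be structured differently; the sample values are illustrative only.

```python
import datetime
from decimal import Decimal


def as_text(value):
    """Return value unchanged if it is already a str, else its str() form."""
    return value if isinstance(value, str) else str(value)


# Illustrative values of the kinds a database driver may return.
samples = ["stop_1", 12, 3.5, Decimal("1.20"), datetime.date(2020, 1, 1)]
converted = [as_text(v) for v in samples]
```

One caveat under this approach: `str(None)` yields the literal text `"None"`, so NULL columns probably need to be handled before the conversion rather than by it.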