SDM-TIB / SDM-RDFizer

An Efficient RML-Compliant Engine for Knowledge Graph Construction
https://doi.org/10.5281/zenodo.3872103
Apache License 2.0

Connection to Postgresql db #71

Closed pabloalarconm closed 2 years ago

pabloalarconm commented 2 years ago

I am having trouble connecting to the Postgres database on my localhost, both with the Dockerfile image and without it. The configuration .ini file does not seem to point to it correctly, but AFAIK all the parameters look correct. The resulting .nt file and output_datasets_stats.csv are created, but they are empty. Here is my config.ini:


[default]
main_directory: ./data

[datasets]
number_of_datasets: 1
output_folder: ${default:main_directory}/graph
all_in_one_file: no
remove_duplicate: no
enrichment: yes
dbType: postgres
name: output
ordered: yes
large_file: false

[dataset1]
user: postgres
password: postgres 
host: localhost
db: postgres
name: resulting
mapping: ${default:main_directory}/template.ttl

This is the error that is raised:

INFO:rdflib:RDFLib Version: 4.2.2
Semantifying resulting...
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/pablo/.local/lib/python3.8/site-packages/rdfizer/__main__.py", line 44, in <module>
    semantify(config_path)
  File "/home/pablo/.local/lib/python3.8/site-packages/rdfizer/__init__.py", line 4063, in semantify
    reader = pd.read_csv(source, dtype = str)
  File "/usr/local/lib/python3.8/dist-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers/c_parser_wrapper.py", line 51, in __init__
    self._open_handles(src, kwds)
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers/base_parser.py", line 222, in _open_handles
    self.handles = get_handle(
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/common.py", line 609, in get_handle
    ioargs = _get_filepath_or_buffer(
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/common.py", line 312, in _get_filepath_or_buffer
    with urlopen(req_info) as req:
  File "/usr/local/lib/python3.8/dist-packages/pandas/io/common.py", line 212, in urlopen
    return urllib.request.urlopen(*args, **kwargs)
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

Thanks :)

eiglesias34 commented 2 years ago

Hello @pabloalarconm,

First of all, thank you for using the SDM-RDFizer. I executed this mapping to see if I could get the same error as you.

@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.com/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@base <http://example.com/base/> .
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .

<TriplesMap1> a rr:TriplesMap;

  rml:logicalSource [
    rml:source <#DB_source>;
    rr:sqlVersion rr:SQL2008;
    rml:query "SELECT CONCAT('Student', ID) AS StudentId, ID, Name FROM student";
    rml:referenceFormulation ql:CSV
  ];

  rr:subjectMap [
      rml:reference "StudentId"; rr:termType rr:BlankNode
    ];

    rr:predicateObjectMap [
      rr:predicate foaf:name ;
      rr:objectMap [ rml:reference "Name" ]
    ].

<#DB_source> a d2rq:Database;
  d2rq:jdbcDSN "CONNECTIONDSN"; # the "jdbc:mysql://" part is ignored
  d2rq:jdbcDriver "com.mysql.cj.jdbc.Driver"; # this is ignored
  d2rq:username "root";
  d2rq:password "" .

I got the same error. The problem is not in the SDM-RDFizer but in the mapping: the clause rml:referenceFormulation ql:CSV makes the SDM-RDFizer treat the data source of this triples map as a CSV file rather than a relational database, which is why the traceback ends in pd.read_csv. Please make sure that your mapping does not contain an rml:referenceFormulation that points to a file format.
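If it helps, here is a minimal sketch of how a mapping could be checked for this conflict before running the engine. It is not part of the SDM-RDFizer: the function name `find_conflicting_sources` and the regex-based scan are assumptions made for illustration, and the scan only handles simple logical sources (no nested blank nodes and no square brackets inside the SQL query).

```python
import re

# File-oriented reference formulations from the ql: vocabulary.
FILE_FORMULATIONS = {"ql:CSV", "ql:JSONPath", "ql:XPath"}

# Hypothetical, simplified patterns: they assume the logical source is a
# single [...] block that contains no nested brackets.
LOGICAL_SOURCE_RE = re.compile(r"rml:logicalSource\s*\[(.*?)\]", re.DOTALL)
FORMULATION_RE = re.compile(r"rml:referenceFormulation\s+(ql:\w+)")


def find_conflicting_sources(mapping_text):
    """Return file-format formulations declared next to an rml:query."""
    conflicts = []
    for block in LOGICAL_SOURCE_RE.findall(mapping_text):
        if "rml:query" not in block:
            continue  # file-based source, a formulation is expected here
        for formulation in FORMULATION_RE.findall(block):
            if formulation in FILE_FORMULATIONS:
                conflicts.append(formulation)
    return conflicts


# Shortened version of the logical source from the mapping above.
EXAMPLE = '''
<TriplesMap1> a rr:TriplesMap;
  rml:logicalSource [
    rml:source <#DB_source>;
    rr:sqlVersion rr:SQL2008;
    rml:query "SELECT ID, Name FROM student";
    rml:referenceFormulation ql:CSV
  ].
'''

if __name__ == "__main__":
    print(find_conflicting_sources(EXAMPLE))
```

On the shortened example, the check reports `ql:CSV`. Once that clause is removed from the logical source, it reports nothing.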

Thank you again for using the SDM-RDFizer. Sincerely, Enrique

pabloalarconm commented 2 years ago

Solved, thank you so much!

Bests, Pablo