Closed Mec-iS closed 2 years ago
Hi @Mec-iS ,
Keep in mind that materialize can also receive the config as a string rather than a path. E.g.:
import morph_kgc

# The config is passed inline as INI text instead of a path to a .ini file.
config = """
[DataSource1]
mappings=/path/to/mapping/mapping_file.rml.ttl
db_url=mysql+pymysql://user:password@localhost:3306/db_name
"""
graph = morph_kgc.materialize(config)
see doc
Yes, thanks. I am just doing it step by step to test the integration.
It looks like the file we currently use as a test, recipes.ttl, has a faulty header according to morph-kgc:
~/drwn/.venv/lib/python3.8/site-packages/morph_kgc/args_parser.py in load_config_from_argument(config_entry)
86 config = Config(interpolation=ExtendedInterpolation())
87 if os.path.isfile(config_entry):
---> 88 config.read(config_entry)
89 else:
90 # it is a string
MissingSectionHeaderError: File contains no section headers.
file: '/home/lorenzo/drwn/kglab/dat/recipes.ttl', line: 1
'@prefix dct: <http://purl.org/dc/terms/> .\n'
Hi @Mec-iS ,
materialize expects a .ini file, not a .ttl. To test morph-kgc with the recipes data, you would need to define an RML mapping from recipes.csv to recipes.ttl and prepare the config.ini.
Another option is to use one of the examples in the morph-kgc repo, e.g. the json-example.
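To illustrate why the .ttl file fails: the traceback shows the argument being handed to a configparser-style Config, which reads it as a file when the path exists and otherwise treats it as a string. Below is a minimal sketch of that dispatch using only the standard library. The function names load_config and explain are hypothetical, and the read_string branch is an assumption about what happens after the "# it is a string" comment in the traceback.

```python
import configparser
import os
import tempfile


def load_config(config_entry):
    """Parse config_entry as a file if it exists on disk, else as INI text."""
    config = configparser.ConfigParser(
        interpolation=configparser.ExtendedInterpolation()
    )
    if os.path.isfile(config_entry):
        config.read(config_entry)
    else:
        # Assumption: the string branch parses the text directly.
        config.read_string(config_entry)
    return config


def explain(config_entry):
    """Return a short verdict instead of raising, for demonstration only."""
    try:
        load_config(config_entry)
    except configparser.MissingSectionHeaderError:
        return "not an INI config"
    return "ok"


INI_TEXT = """
[DataSource1]
mappings=/path/to/mapping/mapping_file.rml.ttl
db_url=mysql+pymysql://user:password@localhost:3306/db_name
"""

with tempfile.TemporaryDirectory() as tmp:
    # A Turtle file starts with @prefix lines, never with an [section] header.
    ttl_path = os.path.join(tmp, "recipes.ttl")
    with open(ttl_path, "w", encoding="utf-8") as handle:
        handle.write("@prefix dct: <http://purl.org/dc/terms/> .\n")
    TTL_RESULT = explain(ttl_path)

INI_RESULT = explain(INI_TEXT)
```

So the error is raised by the INI parser, which was handed the Turtle file as if it were the config. It says nothing about the RDF being invalid.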
Thanks for the feedback. I am trying to establish a baseline.
I have added a default .ini file, taken from here.
What is the expected way to define an RML mapping from recipes.csv to recipes.ttl?
In the kglab repository we already have both formats, do you mean something like this?
I think I am missing the point on how to generate the mapping file, .rml.ttl or .rml.csv.
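For context, an RML mapping is usually a hand-written Turtle file describing how each CSV row becomes triples, rather than something derived from an existing .ttl. Below is a minimal sketch that writes a tiny CSV, a matching mapping, and a config string. The column names id and name, the ex: namespace, the section name RecipesCSV, and the helper names are all hypothetical, and the sketch assumes that a config section for a CSV source only needs the mappings key.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical rows; the real recipes.csv columns may differ.
ROWS = [
    {"id": "1", "name": "pancakes"},
    {"id": "2", "name": "omelette"},
]

# RML mapping: one triples map that turns each CSV row into an ex:Recipe.
# CSV_PATH is a placeholder replaced with the real file location below.
RML_TEMPLATE = """\
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix ex: <http://example.org/> .

ex:RecipeMap a rr:TriplesMap ;
    rml:logicalSource [
        rml:source "CSV_PATH" ;
        rml:referenceFormulation ql:CSV
    ] ;
    rr:subjectMap [
        rr:template "http://example.org/recipe/{id}" ;
        rr:class ex:Recipe
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:name ;
        rr:objectMap [ rml:reference "name" ]
    ] .
"""


def write_inputs(workdir):
    """Write recipes.csv and recipes.rml.ttl; return (config text, mapping path)."""
    workdir = Path(workdir)
    csv_path = workdir / "recipes.csv"
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "name"])
        writer.writeheader()
        writer.writerows(ROWS)

    mapping_path = workdir / "recipes.rml.ttl"
    mapping_path.write_text(
        RML_TEMPLATE.replace("CSV_PATH", csv_path.as_posix()), encoding="utf-8"
    )

    # Assumption: for file sources, the section only needs the mappings key.
    config = f"[RecipesCSV]\nmappings={mapping_path.as_posix()}\n"
    return config, mapping_path


def materialize_recipes(config):
    """Run morph-kgc on the config; requires morph-kgc to be installed."""
    import morph_kgc  # third-party, imported lazily so the sketch loads without it

    return morph_kgc.materialize(config)


with tempfile.TemporaryDirectory() as tmp:
    CONFIG, MAPPING_PATH = write_inputs(tmp)
    MAPPING_TEXT = MAPPING_PATH.read_text(encoding="utf-8")
```

Calling materialize_recipes(config) while the files still exist should return the materialized graph. The call is kept in a separate function here because it depends on the third-party package.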
MissingSectionHeaderError: File contains no section headers. file: '/home/lorenzo/drwn/kglab/dat/recipes.ttl', line: 1 '@prefix dct: http://purl.org/dc/terms/ .\n'
Does that "File contains no section headers" message refer to the input .ini file, and not the recipes.ttl file?
There's no particular definition of "section headers" in RDF files.
We've used recipes.ttl with a number of different platforms and validators, with no errors.
It does go into @prefix definitions without specifying the optional @base, if that may have triggered a warning?
In the kglab repository we already have both formats, do you mean something like this?
For this integration, the biggest use cases will be to formalize how sources from SQL, CSV, etc., can be ingested into an RDF graph. We're not looking at means to translate between serialization formats, e.g., go between CSV and TTL.
While there are means of importing CSV already in kglab, through the csvwlib integration, this integration with morph-kgc would provide superior means, and also ways to parallelize and make these kinds of inputs much more efficient.
In the most immediate use cases (for our colleagues in Madrid and Murcia) they have many smaller SQL databases and thousands of CSV files, so there are performance issues at scale for ingest, and Morph can really help! :)
@ceteri
I don't think this is finished, because the RML mapping file is missing (the one that has the extension .rml.ttl in the morph-kgc documentation). The ttl file is not enough to make it work; there should be a way to generate the RML mapping from a ttl or CSV.
@Mec-iS Thank you, I've reverted this merge.
Was just trying examples within my own branch and ran into problems with the csv-examples in Morph, when running locally.
Instead of applying an RML mapping to a TTL file as input, how about if we show an example in the tutorial notebook that takes 2 simple CSV files? This is a general pattern among users, where they already have node+edge files as CSV.
@ArenasGuerreroJulian do you have a simple RML example for CSV? Something like the proverbial minimum viable example that would show nodes and edges for a simple graph? That would help our community, where people aren't familiar with RML yet. For example, how about something like this? https://rml.io/yarrrml/tutorial/getting-started/#example
@Mec-iS here's a branch with preparations for a new release, for the morph-kgc integration:
https://github.com/DerwenAI/kglab/tree/morph-kgc
Please try
pytest --nbmake examples/ex6_2.ipynb
New expected behaviour
see https://github.com/DerwenAI/kglab/issues/108#issuecomment-1048445124
Change logs
Add the materialized() method to kglab.py. Add an example for it at ex6_2.ipynb
Add docstring at the top of kglab.py