DerwenAI / kglab

Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, NetworkX, RAPIDS, RDFlib, pySHACL, PyVis, morph-kgc, pslpython, pyarrow, etc.
https://derwen.ai/docs/kgl/
MIT License

Add morph-kgc materialize #220

Closed Mec-iS closed 2 years ago

Mec-iS commented 2 years ago

Please try pytest --nbmake examples/ex6_2.ipynb

New expected behaviour

see https://github.com/DerwenAI/kglab/issues/108#issuecomment-1048445124

Change logs

Add the materialized() method to kglab.py. Add an example for it at ex6_2.ipynb

Add docstring at the top of kglab.py

arenas-guerrero-julian commented 2 years ago

Hi @Mec-iS ,

Keep in mind that materialize can also receive a config in the form of a string and not a path. E.g.:

import morph_kgc

# The config may be passed as an INI-formatted string instead of a file path.
config = """
    [DataSource1]
    mappings=/path/to/mapping/mapping_file.rml.ttl
    db_url=mysql+pymysql://user:password@localhost:3306/db_name
"""

graph = morph_kgc.materialize(config)

see doc
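
One way a single argument can serve as either a file path or inline config text is to check whether it names an existing file. The following is a minimal sketch using only the standard library; `load_config` is a hypothetical name for illustration and is not morph-kgc's API.

```python
import configparser
import os
import tempfile


def load_config(config_entry: str) -> configparser.ConfigParser:
    """Hypothetical helper: accept a path to an INI file or the INI text itself."""
    config = configparser.ConfigParser(
        interpolation=configparser.ExtendedInterpolation()
    )
    if os.path.isfile(config_entry):
        # The argument names an existing file: read it from disk.
        config.read(config_entry)
    else:
        # Otherwise treat the argument as the config text.
        config.read_string(config_entry)
    return config


INI_TEXT = """
[DataSource1]
mappings=/path/to/mapping/mapping_file.rml.ttl
"""

# Load once from the string and once from a file holding the same text.
from_string = load_config(INI_TEXT)

with tempfile.TemporaryDirectory() as tmp:
    ini_path = os.path.join(tmp, "config.ini")
    with open(ini_path, "w") as f:
        f.write(INI_TEXT)
    from_path = load_config(ini_path)

# Both routes yield the same parsed value.
print(from_string["DataSource1"]["mappings"] == from_path["DataSource1"]["mappings"])
```

A consequence of this kind of dispatch is that any existing file passed in is parsed as INI, whatever its actual format.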

Mec-iS commented 2 years ago

Yes, thanks. I am just doing it step by step to test the integration.

Mec-iS commented 2 years ago

It looks like the file we currently use as a test, recipes.ttl, has a faulty header according to morph-kgc:

~/drwn/.venv/lib/python3.8/site-packages/morph_kgc/args_parser.py in load_config_from_argument(config_entry)
     86     config = Config(interpolation=ExtendedInterpolation())
     87     if os.path.isfile(config_entry):
---> 88         config.read(config_entry)
     89     else:
     90         # it is a string

MissingSectionHeaderError: File contains no section headers.
file: '/home/lorenzo/drwn/kglab/dat/recipes.ttl', line: 1
'@prefix dct:  <http://purl.org/dc/terms/> .\n'

arenas-guerrero-julian commented 2 years ago

Hi @Mec-iS ,

materialize expects an .ini config file, not a .ttl file. To test morph-kgc with the recipes data, you would need to define an RML mapping from recipes.csv to recipes.ttl and prepare the config.ini.

Another option is to use one of the examples in the morph-kgc repo, e.g. the json-example.
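
The MissingSectionHeaderError in the traceback can be reproduced with Python's configparser alone, which suggests the message is about the Turtle file being parsed as INI rather than about any defect in the RDF. This is a small sketch; `has_section_headers` is a hypothetical helper name.

```python
import configparser

# First line of a Turtle file, as quoted in the traceback.
TURTLE_TEXT = "@prefix dct:  <http://purl.org/dc/terms/> .\n"

# A config in the INI layout that materialize expects.
INI_TEXT = """
[DataSource1]
mappings=/path/to/mapping/mapping_file.rml.ttl
"""


def has_section_headers(text: str) -> bool:
    """Return True if the text parses as INI with at least one [section]."""
    parser = configparser.ConfigParser(
        interpolation=configparser.ExtendedInterpolation()
    )
    try:
        parser.read_string(text)
    except configparser.MissingSectionHeaderError:
        # Raised when a non-comment line appears before any [section] header.
        return False
    return bool(parser.sections())


print(has_section_headers(TURTLE_TEXT))  # False
print(has_section_headers(INI_TEXT))     # True
```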

Mec-iS commented 2 years ago

Thanks for the feedback. I am trying to establish a baseline. I have added a default ini as taken from here.

What is the expected way to "define an RML mapping from recipes.csv to recipes.ttl"?

In the kglab repository we already have both formats, do you mean something like this? I think I am missing the point on how to generate the mapping file, .rml.ttl or .rml.csv.

ceteri commented 2 years ago

MissingSectionHeaderError: File contains no section headers. file: '/home/lorenzo/drwn/kglab/dat/recipes.ttl', line: 1 '@prefix dct:  <http://purl.org/dc/terms/> .\n'

Does that "File contains no section headers" message refer to the input .ini file, and not the recipes.ttl file? There's no particular definition of "section headers" in RDF files.

We've used recipes.ttl with a number of different platforms and validators, with no errors.

The file does go straight into @prefix definitions without specifying the optional @base. Could that have triggered a warning?

ceteri commented 2 years ago

In the kglab repository we already have both formats, do you mean something like this?

For this integration, the biggest use cases will be to formalize how sources from SQL, CSV, etc., can be ingested into an RDF graph. We're not looking at means to translate between serialization formats, e.g., go between CSV and TTL.

While there are means of importing CSV already in kglab, through the csvwlib integration, this integration with morph-kgc would provide superior means, and also ways to parallelize and make these kinds of inputs much more efficient.

In the most immediate use cases (for our colleagues in Madrid and Murcia) they have many smaller SQL databases and thousands of CSV files, so there are performance issues at scale for ingest, and Morph can really help! :)

Mec-iS commented 2 years ago

@ceteri

I don't think this is finished, because the RML mapping file is missing (the one that has the extension .rml.ttl in the morph-kgc documentation). The ttl file is not enough to make it work; there should be a way to generate the RML mapping from a ttl or CSV.

ceteri commented 2 years ago

@Mec-iS Thank you, I've reverted this merge.

Was just trying examples within my own branch and ran into problems with the csv-examples in Morph, when running locally.

Instead of applying an RML mapping to a TTL file as input, how about if we show an example in the tutorial notebook that takes 2 simple CSV files? This is a general pattern among users, where they already have node+edge files as CSV.

@ArenasGuerreroJulian do you have a simple RML mapping for CSV examples? Something like the proverbial minimum viable example that would show nodes and edges for a simple graph? That would help our community, where people aren't familiar with RML yet. For example, how about something like this? https://rml.io/yarrrml/tutorial/getting-started/#example
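
As a rough idea of what such a node+edge example might look like, here is a sketch that writes two small CSV files, an RML mapping over them, and a matching config string. Everything in it is an assumption made for illustration: the file names, the column names, the `http://example.com/` vocabulary, and the `write_example` helper. It has not been run against morph-kgc.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical RML mapping: one triples map for nodes, one for edges.
# NODES_CSV and EDGES_CSV are placeholders filled in by write_example().
MAPPING_TEMPLATE = """\
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix ex: <http://example.com/> .

<#NodeMap>
    rml:logicalSource [ rml:source "NODES_CSV" ; rml:referenceFormulation ql:CSV ] ;
    rr:subjectMap [ rr:template "http://example.com/node/{id}" ; rr:class ex:Node ] ;
    rr:predicateObjectMap [
        rr:predicate ex:label ;
        rr:objectMap [ rml:reference "label" ]
    ] .

<#EdgeMap>
    rml:logicalSource [ rml:source "EDGES_CSV" ; rml:referenceFormulation ql:CSV ] ;
    rr:subjectMap [ rr:template "http://example.com/node/{src}" ] ;
    rr:predicateObjectMap [
        rr:predicate ex:linksTo ;
        rr:objectMap [ rr:template "http://example.com/node/{dst}" ]
    ] .
"""


def write_example(workdir) -> str:
    """Write nodes.csv, edges.csv and graph.rml.ttl; return an INI config string."""
    workdir = Path(workdir)
    nodes = workdir / "nodes.csv"
    edges = workdir / "edges.csv"
    mapping = workdir / "graph.rml.ttl"

    with nodes.open("w", newline="") as f:
        csv.writer(f).writerows([["id", "label"], ["1", "alpha"], ["2", "beta"]])
    with edges.open("w", newline="") as f:
        csv.writer(f).writerows([["src", "dst"], ["1", "2"]])

    # Point the mapping at the CSV files just written.
    mapping.write_text(
        MAPPING_TEMPLATE
        .replace("NODES_CSV", nodes.as_posix())
        .replace("EDGES_CSV", edges.as_posix())
    )
    return f"[DataSource1]\nmappings={mapping.as_posix()}\n"


with tempfile.TemporaryDirectory() as tmp:
    config = write_example(tmp)
    print(config)
```

The returned string has the same shape as the config shown earlier in the thread, so in principle it could be handed to morph_kgc.materialize(config) while the files still exist.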

ceteri commented 2 years ago

@Mec-iS here's a branch with preparations for a new release, for the morph-kgc integration: https://github.com/DerwenAI/kglab/tree/morph-kgc