RMLio / yarrrml-parser

A YARRRML parser library and CLI in JavaScript
MIT License

Generic Tabular transformation #184

Open lucasalbertins opened 1 year ago

lucasalbertins commented 1 year ago

Hi,

I am trying to build a generic tabular structure in RDF that I can use to load CSV and Excel files. I've created triples to represent rows, columns, and cells, and triples to relate them to one another. I am trying to use YARRRML to map a CSV file to this structure, but for that I would need to keep track of the column indexes. Is there any way to get the index of a given column, or to maintain a variable during the transformation for that purpose? Below is a simplified description of the RDF I would expect for such a CSV:

ex:row-0
        rdf:type ex:Row ;
        ex:tabular#hasCell   ex:cell-00, ex:cell-01, ex:cell-02;
        ex:tabular#hasRowId  0 .
... (several rows increasing index)
ex:col-0
        rdf:type ex:Column ;
        ex:tabular#hasCell ex:cell-00, ex:cell-10, ex:cell-20 ;
        ex:tabular#hasColumnId 0 .
... (several columns increasing index)
ex:cell-00
        rdf:type ex:Cell ;
        ex:tabular#hasRowId 0 ;
        ex:tabular#hasColumnId 0 ;
        ex:tabular#hasValue "0" .
ex:cell-01 
        rdf:type ex:Cell ;
        ex:tabular#hasRowId 0 ;
        ex:tabular#hasColumnId 1 ;
        ex:tabular#hasValue "100" .
... (several cells increasing row and column indexes)

bjdmeest commented 1 year ago

I'm afraid this is currently not supported by the RML specification, and thus also not by YARRRML.

We're currently working on new versions of the specifications so I included your question in our process (see the linked issues).

However, given that this is a very specific type of mapping that probably won't ever change, I'm not sure YARRRML/RML is the most efficient way forward here: you probably won't need to update that mapping very often.

Can you explain your use case further, so we can understand why you're doing this and how YARRRML currently helps you reach that goal?

lucasalbertins commented 1 year ago

Thank you for your answer. Indeed, the idea is to transform several kinds of tabular files (CSV, xls, etc.) into a generalized structure in RDF, with the aim of querying and reasoning over it. So the mapping is not expected to change much. We thought of using RML/YARRRML for that, but implementing our own transformation may be the best way to go.
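
For illustration only, a custom transformation of this kind could be sketched in Python with just the standard library. This is a rough sketch under several assumptions, not a finished implementation: the function name `csv_to_turtle`, the namespace IRIs, and the `tab:` prefix (standing in for the `ex:tabular#` terms above, which are not valid Turtle prefixed names as written) are all hypothetical. Cell identifiers are written as `cell-<row>-<col>` rather than `cell-<row><col>` so that they stay unambiguous once an index reaches 10. Every CSV line, including a header line if present, is treated as a data row, and Excel input would need a third-party reader that is not shown here.

```python
import csv
import io

# Hypothetical namespaces; replace with the real vocabulary IRIs.
PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix ex: <http://example.org/> .\n"
    "@prefix tab: <http://example.org/tabular#> .\n"
)


def escape_literal(value):
    """Escape a string for use inside a double-quoted Turtle literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_turtle(text):
    """Turn CSV text into Turtle describing rows, columns, and cells."""
    # Blank lines come back from csv.reader as empty lists; skip them.
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    n_cols = max((len(row) for row in rows), default=0)
    blocks = [PREFIXES]

    # One resource per row, linked to the cells it contains.
    for r, row in enumerate(rows):
        cells = ", ".join(f"ex:cell-{r}-{c}" for c in range(len(row)))
        blocks.append(
            f"ex:row-{r} rdf:type ex:Row ;\n"
            f"    tab:hasCell {cells} ;\n"
            f"    tab:hasRowId {r} ."
        )

    # One resource per column; ragged rows contribute only the cells they have.
    for c in range(n_cols):
        cells = ", ".join(
            f"ex:cell-{r}-{c}" for r, row in enumerate(rows) if c < len(row)
        )
        blocks.append(
            f"ex:col-{c} rdf:type ex:Column ;\n"
            f"    tab:hasCell {cells} ;\n"
            f"    tab:hasColumnId {c} ."
        )

    # One resource per cell, carrying both indexes and the raw string value.
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            blocks.append(
                f"ex:cell-{r}-{c} rdf:type ex:Cell ;\n"
                f"    tab:hasRowId {r} ;\n"
                f"    tab:hasColumnId {c} ;\n"
                f'    tab:hasValue "{escape_literal(value)}" .'
            )

    return "\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(csv_to_turtle("0,100\n5,7\n"))
```

The row and column indexes here come straight from `enumerate`, which is the piece that a declarative mapping cannot currently express. All values are emitted as plain string literals; datatype detection is left out on purpose.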

namedgraph commented 1 year ago

@lucasalbertins https://github.com/AtomGraph/CSV2RDF might do what you need