SDM-TIB / SDM-RDFizer

An Efficient RML-Compliant Engine for Knowledge Graph Construction
https://doi.org/10.5281/zenodo.3872103
Apache License 2.0

Proposition to improve the RDFizer packaging #70

Closed vemonet closed 2 years ago

vemonet commented 2 years ago

Hi, I noticed that the RDFizer takes a lot of roundabout steps before it finally gets to the actual RDF conversion

For example a typical deployment is:

That is a lot of complex steps for something that could be done much more easily, with fewer breaking points

Normally you should be able to do it directly via the command line without needing to start an endpoint. Here are a few examples of what the different commands to run the rdfizer could look like:

# Without config file: 
rdfizer -m mapping.ttl -o output
# With config file
rdfizer -c config.ini
# With Docker
docker run -it -v $(pwd):/data rdfizer -c config.ini
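The commands above could be parsed with Python's standard `argparse` module. The following is only a minimal sketch of the proposal, not the RDFizer's actual entry point: the long option names, the `output` default, and the `main` function that returns the parsed choice as a dict are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the hypothetical CLI proposed in this issue."""
    parser = argparse.ArgumentParser(
        prog="rdfizer",
        description="Sketch of the proposed command line interface.",
    )
    # Exactly one of -c or -m must be given (assumed behaviour).
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-c", "--config", help="path to a config file")
    source.add_argument("-m", "--mapping", help="path to a mapping file")
    # Only meaningful together with -m; the default is an assumption.
    parser.add_argument("-o", "--output", default="output",
                        help="output location")
    return parser


def main(argv=None) -> dict:
    """Parse argv and report which mode was requested (no conversion is run)."""
    args = build_parser().parse_args(argv)
    if args.config:
        return {"mode": "config", "config": args.config}
    return {"mode": "mapping", "mapping": args.mapping, "output": args.output}
```

Calling `main(["-m", "mapping.ttl", "-o", "output"])` or `main(["-c", "config.ini"])` mirrors the first two commands; a real entry point would hand the parsed values to the conversion code instead of returning them.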

Alongside this, you could still easily have an API following OpenAPI standards that allows the user to run it the way it is currently run

I am planning to look into it and make the packaging a bit tighter and more intuitive for users (adding a direct CLI option, adding OpenAPI/Swagger docs to the API)

Is that something that could be interesting to you, or is the process set in stone? @eiglesias34 @samiscoding @dachafra

If you are interested in contributions, is there anything I need to know to make the change in a way that works for you?

dachafra commented 2 years ago

@vemonet the engine is already available on PyPI (https://pypi.org/project/rdfizer/) so you can easily run it as you mentioned:

python3 -m pip install rdfizer
python3 -m rdfizer -c config.ini

The docker API was included in the very first versions of the engine for ensuring the reproducibility of our research experiments

eiglesias34 commented 2 years ago

Hello @vemonet,

First of all, thank you for your interest in the SDM-RDFizer. The execution of the SDM-RDFizer is not set in stone for the time being. Currently, we do have some improvements planned for the SDM-RDFizer. We can consider the simplification of the execution as an improvement.

Sincerely, Enrique

vemonet commented 2 years ago

Thanks for the detail @dachafra and @eiglesias34 !

I tried to use it through Docker first, so I got a bit confused by the complex workflow that needs to be deployed to run it

I think it could easily be improved to run as a CLI through Docker instead of deploying an API, without even

Is having an API really necessary? Are people deploying RDFizer as a service to be queried over HTTP? If I add a Dockerfile to run the RDFizer as a CLI should I replace the Dockerfile for the API, or create a second one?

To be able to run it this way:

docker run -it -v $(pwd):/data rdfizer -c config.ini

This is useful because more and more workflow systems rely on running Docker containers one after the other with specific parameters. Using the RDFizer API would add more complexity to writing the workflow. It also exposes a port, which is not ideal for security: it creates a potential entry point to your network without really needing one.
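To illustrate the workflow argument, here is a rough sketch of how a workflow step could build the container command from plain parameters. The `rdfizer_step` helper is hypothetical, the `rdfizer` image name is the one assumed in the command above, and `-it` is swapped for `--rm` on the assumption that workflow steps run without a terminal.

```python
import os
import shlex


def rdfizer_step(config="config.ini", workdir=None, image="rdfizer"):
    """Return the argv for one containerized step (hypothetical helper)."""
    # Mount the working directory at /data, as in the command above.
    workdir = os.path.abspath(workdir or os.getcwd())
    return [
        "docker", "run", "--rm",      # --rm instead of -it: no TTY assumed
        "-v", f"{workdir}:/data",
        image,                         # assumed image name
        "-c", config,
    ]


def as_shell(cmd):
    """Render an argv list as a shell-safe string, e.g. for logging."""
    return shlex.join(cmd)
```

A workflow runner could pass the returned list to `subprocess.run(cmd, check=True)`; no port is published and no HTTP call is needed, which is the point made above.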