INCATools / semantic-sql

SQL and SQLite builds of OWL ontologies
https://incatools.github.io/semantic-sql/
BSD 3-Clause "New" or "Revised" License

Make everything installable via PyPI #41

Open cmungall opened 2 years ago

cmungall commented 2 years ago

An alternative to wrapping rdftab is to load the statements table directly in Python. This will be slower, but it should be very straightforward if we skip loading the stanza field, which we don't use. It also has the advantage that we won't need to transform to RDF/XML using riot or robot.
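Here is a minimal sketch of the direct-loading idea. It assumes an rdftab-style `statements` table with the columns `stanza, subject, predicate, object, value, datatype, language` and N-Triples input. To keep it dependency-free, a simplified regex stands in for a real RDF parser. In practice an RDF library such as rdflib would do the parsing. The regex does not decode escape sequences, and the sketch stores full IRIs rather than CURIEs. `load_ntriples` is a hypothetical helper, not part of semsql.

```python
import re
import sqlite3

# Assumed rdftab-style schema; stanza is kept as a column but left NULL
# because, as noted above, it is not used.
SCHEMA = """
CREATE TABLE IF NOT EXISTS statements (
    stanza TEXT, subject TEXT, predicate TEXT, object TEXT,
    value TEXT, datatype TEXT, language TEXT
)
"""

# Simplified N-Triples terms: IRIs, blank nodes, and literals with an
# optional ^^<datatype> or @lang tag. Escapes inside literals are not decoded.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)'
TRIPLE = re.compile(r"^\s*" + TERM + r"\s+" + TERM + r"\s+" + TERM + r"\s*\.\s*$")
LITERAL = re.compile(r'^"((?:[^"\\]|\\.)*)"(?:\^\^<([^>]*)>|@([A-Za-z0-9-]+))?$')


def node(term):
    """Strip angle brackets from an IRI; blank node labels are kept as-is."""
    return term[1:-1] if term.startswith("<") else term


def load_ntriples(lines, conn):
    """Parse N-Triples lines and bulk-insert them; returns the row count."""
    conn.execute(SCHEMA)
    rows = []
    for line in lines:
        # Skip blank lines and comments.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        m = TRIPLE.match(line)
        if m is None:
            raise ValueError(f"unparsed line: {line!r}")
        s, p, o = m.groups()
        lit = LITERAL.match(o)
        if lit:
            # Literal objects go into value/datatype/language.
            value, datatype, lang = lit.groups()
            rows.append((None, node(s), node(p), None, value, datatype, lang))
        else:
            # IRI or blank-node objects go into the object column.
            rows.append((None, node(s), node(p), node(o), None, None, None))
    # One executemany call plus a single commit keeps the insert cheap.
    conn.executemany("INSERT INTO statements VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Splitting literals into `value` and `datatype`/`language` while IRIs go into `object` mirrors how the statements table separates the two kinds of object. Skipping `stanza` avoids having to group triples by their owning subject.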

cmungall commented 2 years ago

See https://github.com/cmungall/relation-graph-py

Note that it may not be necessary to wrap rdftab using PyO3; we can use any RDF library (we don't use the stanza field from rdftab).

cmungall commented 2 years ago

Consider instead: https://github.com/balhoff/whelk-rs

joeflack4 commented 2 years ago

@hrshdhgd @cmungall I was trying to get semsql working today to troubleshoot some issues I'm having using SqlImplementation in OAK.

I had a lot of problems with version 0.1.7 of semsql, so I installed the latest version, 0.2.0, but now I'm getting this error: /bin/sh: relation-graph: command not found

For now, should I continue using semsql==0.1.* (resolves to 0.1.7)?

Error message

/bin/sh: relation-graph: command not found
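The error means the shell could not find a `relation-graph` executable on `PATH`. Below is a minimal pre-flight check, assuming the build invokes external tools by name through the shell. `missing_executables` is a hypothetical helper, not part of semsql or OAK, and the tool list is illustrative.

```python
import shutil

# Illustrative list: relation-graph is the tool named in the error above;
# add any other external tools your build invokes.
REQUIRED_TOOLS = ["relation-graph"]


def missing_executables(names):
    """Return the subset of `names` that shutil.which cannot locate on PATH."""
    return [name for name in names if shutil.which(name) is None]
```

Calling `missing_executables(REQUIRED_TOOLS)` before starting a build surfaces a missing tool up front rather than partway through.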

Related

I think this is intentional on OAK's end because of the above error, but I just wanted to let you know that this came up as well:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
oaklib 0.1.34 requires semsql<0.2.0,>=0.1.6, but you have semsql 0.2.0 which is incompatible.

cmungall commented 2 years ago

Hi @joeflack4 - always use the latest version. If you are having issues with RG (relation-graph), file an issue here: https://github.com/balhoff/relation-graph/issues.

Did you get your issue resolved?

jamesaoverton commented 2 years ago

Using PyO3 for RDFTab is certainly possible, but I wasn't planning to do it because I'll be using LDTab going forward. We've used PyO3 for valve.py and wiring.py and are working on using it for LDTab (with horned-owl). We're happy to share our experience.

For this purpose, I think you're probably better off just porting RDFTab to Python.

cmungall commented 2 years ago

@jamesaoverton - that makes sense.

The speed of rdflib is the main issue. Even though we get very fast access once we have built the SQLite db, there are still cases where build latency is a problem. But having this as an option certainly seems reasonable.

I figure that, in the medium term, Python bindings to horned-owl will solve a lot of use cases.

LucaCappelletti94 commented 2 years ago

Please do be advised that you will encounter the following complex issues:

Do take these things into account while designing your build and deploy process. It took quite a while for us to figure out how to do this for Ensmallen.

joeflack4 commented 2 years ago

Just linking the Slack thread that Chris opened: https://obo-communitygroup.slack.com/archives/C03D93DEALA/p1661527315827469

jamesaoverton commented 2 years ago

I agree with @LucaCappelletti94: getting PyO3 to work has been the easy part, and cross-compiling binaries for packaging has been much trickier. With a lot of effort we have a workflow that compiles for major architectures and pushes to PyPI using GitHub Actions. This has been tested but is not yet in production: https://github.com/ontodev/valve.py/blob/valve_rs_python_bindings/.github/workflows/build-and-publish-wheels.yml

Suggestions for improvements are welcome.

cmungall commented 2 years ago

I have an experimental replacement for rdftab.rs:

https://github.com/INCATools/rdf-sql-bulkloader

This doesn't do any Rust binding itself; it relies on https://github.com/ozekik/lightrdf for that part. If this proves fruitful, we may want to coordinate with the lightrdf developers to make sure they follow best practices for releasing wheels, etc.

I am still doing performance tests (https://github.com/INCATools/rdf-sql-bulkloader/issues/1).

UPDATE: the bulkloader now uses pyoxigraph, which seems better supported.

cmungall commented 2 years ago

I started a general discussion about Rust dependencies in OAK here:

https://github.com/INCATools/ontology-access-kit/discussions/247