DeepGraphLearning / ULTRA

A foundation model for knowledge graph reasoning
MIT License
437 stars 58 forks

How to query with code? #25

Open marimeireles opened 3 weeks ago

marimeireles commented 3 weeks ago

Hey @migalkin, thank you for making this project open source! It's really awesome :)

I'm trying to use UltraQuery on my ontologies, but I'm new to the ecosystem and I'm struggling to understand how to write queries in Python. I understand the overall idea of performing search with FOL and how it translates to this: https://github.com/DeepGraphLearning/ULTRA/blob/561f2c79abaf51901702d1712554c35e3739bb8d/ultra/datasets_query.py#L23-L40 But I haven't found a piece of code that queries a database. I've unpickled the query files that were used to train ULTRA, but that also didn't prove very enlightening.

I think I'm going in the completely wrong direction. That's why I'm asking for help!

Once I've figured this stuff out I'm happy to write a little tutorial or docs for newbies like me to get started with the project, if you think it'd be beneficial.

Best and thanks again,

migalkin commented 3 weeks ago

Hey hey, thanks for looking into this.

Indeed, the current query notation is not really SPARQL- or database-ready - it is inherited from how Query2Box defined the queries in this parenthesis notation. There is, however, a direct mapping between this notation and SPARQL, for example,
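To make the idea of such a mapping concrete, here is a minimal sketch (not the example from the original comment) that turns a grounded query in the nested-tuple style into a SPARQL string. It assumes entities and relations are plain strings, whereas the benchmark query files typically store integer ids, and it only covers projection chains and intersections, not negation or union. The function name `to_sparql` and the variable names `?answer` / `?v0` are hypothetical.

```python
from itertools import count


def to_sparql(query, answer_var="?answer"):
    """Translate a projection/intersection query given as nested tuples into SPARQL.

    Assumed shapes (strings stand in for entity and relation ids):
      projection:   (anchor_or_subquery, (rel_1, ..., rel_k))
      intersection: (branch_1, branch_2, ...)
    """
    fresh = count()   # source of fresh intermediate variable indices
    patterns = []     # collected triple patterns

    def is_chain(x):
        # A relation chain is a non-empty tuple made only of relation names.
        return isinstance(x, tuple) and len(x) > 0 and all(isinstance(r, str) for r in x)

    def walk(q, target):
        if len(q) == 2 and is_chain(q[1]):
            source, relations = q
            if isinstance(source, str):
                current = f"<{source}>"           # anchor entity
            else:
                current = f"?v{next(fresh)}"      # result of a nested sub-query
                walk(source, current)
            for i, rel in enumerate(relations):
                is_last = i == len(relations) - 1
                nxt = target if is_last else f"?v{next(fresh)}"
                patterns.append(f"{current} <{rel}> {nxt} .")
                current = nxt
        else:
            # Intersection: every branch must reach the same target variable.
            for branch in q:
                walk(branch, target)

    walk(query, answer_var)
    return f"SELECT DISTINCT {answer_var} WHERE {{ " + " ".join(patterns) + " }"


# 2p: follow r1 from e1, then r2
print(to_sparql(("e1", ("r1", "r2"))))
# 2i: entities reachable from e1 via r1 AND from e2 via r2
print(to_sparql((("e1", ("r1",)), ("e2", ("r2",)))))
```

The nesting of the tuples is what fixes the execution order: inner tuples are resolved first and become intermediate variables in the SPARQL pattern.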

The "query engine" in UltraQuery expects the parenthesis notation to derive the execution order, so if you want to answer SPARQL queries with the model, there should be some external module (smth rule-based or even LLM, hehe) that translates SPARQL queries into this notation.
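A rule-based module of the kind mentioned here could start out as small as the sketch below. It is an illustration under strong assumptions, not part of ULTRA: it only understands a flat `WHERE { s p o . ... }` block of simple triple patterns with IRIs in angle brackets, it assumes the pattern is acyclic, and it only produces projections and intersections. The name `sparql_to_tuples` is hypothetical.

```python
import re

# One simple triple pattern: three whitespace-separated terms followed by a dot.
TRIPLE = re.compile(r"(\S+)\s+(\S+)\s+(\S+)\s*\.")


def sparql_to_tuples(sparql, answer_var="?answer"):
    """Convert a flat, acyclic SPARQL basic graph pattern into nested tuples."""
    body = sparql[sparql.index("{") + 1 : sparql.rindex("}")]
    triples = TRIPLE.findall(body)

    def strip(term):
        return term.strip("<>")

    def branches_into(var):
        # Each triple whose object is `var` contributes one branch.
        out = []
        for s, p, o in triples:
            if o != var:
                continue
            rel = strip(p)
            if not s.startswith("?"):
                out.append((strip(s), (rel,)))           # anchor entity, one hop
                continue
            inner = branches_into(s)
            if len(inner) == 1:
                source, rels = inner[0]
                out.append((source, rels + (rel,)))      # extend an existing chain
            else:
                out.append((tuple(inner), (rel,)))       # project from an intersection
        if not out:
            raise ValueError(f"no triple pattern leads to {var}")
        return out

    branches = branches_into(answer_var)
    return branches[0] if len(branches) == 1 else tuple(branches)


print(sparql_to_tuples("SELECT ?answer WHERE { <e1> <r1> ?v0 . ?v0 <r2> ?answer . }"))
print(sparql_to_tuples("SELECT ?answer WHERE { <e1> <r1> ?answer . <e2> <r2> ?answer . }"))
```

Real SPARQL (prefixes, `OPTIONAL`, `FILTER`, property paths) would need a proper parser; this only shows how the tuple nesting can be recovered by walking backwards from the answer variable.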

The graphs are expected to be PyG Data objects residing in memory (RAM or GPU memory). PyG itself does have some bindings to call external graph databases but there is no specific piece of code to query external databases. Might be a useful PR though!
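As a rough sketch of the preprocessing this implies, the snippet below maps string triples from an ontology to the integer edge lists that an in-memory graph object would hold. It is plain Python so it runs without PyG installed. The function name `triples_to_edge_lists` and the optional `add_inverse` flag (inverse edges with relation ids shifted by the number of relations, a common knowledge-graph convention) are assumptions for illustration, not a description of ULTRA's loader.

```python
def triples_to_edge_lists(triples, add_inverse=False):
    """Map (head, relation, tail) string triples to integer edge lists."""
    entity2id, relation2id = {}, {}
    heads, tails, types = [], [], []

    for h, r, t in triples:
        # Assign ids in order of first appearance.
        h_id = entity2id.setdefault(h, len(entity2id))
        t_id = entity2id.setdefault(t, len(entity2id))
        r_id = relation2id.setdefault(r, len(relation2id))
        heads.append(h_id)
        tails.append(t_id)
        types.append(r_id)

    if add_inverse:
        # Assumed convention: inverse of relation r gets id r + num_relations.
        num_relations = len(relation2id)
        inv_heads, inv_tails = list(tails), list(heads)
        inv_types = [r + num_relations for r in types]
        heads += inv_heads
        tails += inv_tails
        types += inv_types

    return {
        "edge_index": [heads, tails],   # shape [2, num_edges] once turned into a tensor
        "edge_type": types,
        "entity2id": entity2id,
        "relation2id": relation2id,
    }


graph = triples_to_edge_lists([
    ("alice", "knows", "bob"),
    ("bob", "works_at", "acme"),
])
print(graph["edge_index"], graph["edge_type"])
```

From here, the lists can be converted with `torch.tensor(...)` and wrapped as `torch_geometric.data.Data(edge_index=..., edge_type=..., num_nodes=len(entity2id))`. Whether the model needs further attributes on that object is not covered by this sketch.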

marimeireles commented 2 weeks ago

Very cool, that was a great explanation @migalkin.

> smth rule-based or even LLM, hehe

That's exactly my end-goal! 😅 I'm struggling so others don't have to.

marimeireles commented 2 weeks ago

Sorry, so you're saying that the right way, or the best way, of going about this is creating a torch_geometric.data.GraphStore with your dataset and then model(graph, query)?

migalkin commented 2 weeks ago

Something like this, yes. Generally, it depends on the graph size: you probably don't need all the hassle for graphs with < 100K nodes or < 5M edges, since those can easily fit into most modern GPUs.
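For a back-of-the-envelope feel for those numbers, the sketch below estimates the raw storage of the graph tensors alone. It assumes 64-bit integer indices (the usual dtype for `edge_index` in PyG) and deliberately ignores model weights and intermediate activations, which are likely to dominate actual GPU memory use. The helper name `graph_tensor_bytes` is hypothetical.

```python
def graph_tensor_bytes(num_edges, index_bytes=8):
    """Bytes needed for edge_index (2 x num_edges) plus edge_type (num_edges)."""
    edge_index = 2 * num_edges * index_bytes
    edge_type = num_edges * index_bytes
    return edge_index + edge_type


size = graph_tensor_bytes(5_000_000)
print(f"{size / 1e6:.0f} MB")  # prints "120 MB"
```

So at the 5M-edge mark the graph itself is on the order of a hundred megabytes, which is consistent with keeping everything in a single in-memory object rather than an external store.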