agnesoft / agdb

Agnesoft Graph Database
Apache License 2.0
51 stars 2 forks source link

[db] Support InsertOrUpdate #1092

Closed pinghe closed 3 weeks ago

pinghe commented 1 month ago

[db] Support InsertOrUpdate feature

michaelvlach commented 1 month ago

Hi, thanks for the issue. Can you please elaborate on what the feature should be? Current insert functionality is already "insert or update". Since there cannot be duplicate keys upon insert the query either creates a new key if it does not exist or update existing value if it does. I could see a use case for just "update", i.e. fail the query if the key does not exist.

pinghe commented 1 month ago

I intend to perform a full data synchronization from the client to the database, without the client having knowledge of whether each node or the corresponding alias already exists. My current data model strategy involves using the uid field of the node data for alias naming. Nodes are created with the alias method. However, the current implementation of agdb requires that the existence of an alias be verified first; if the alias exists, it results in an immediate error.

QueryBuilder.insert().nodes().aliases(["users"]).query()

michaelvlach commented 1 month ago

I understand now. That is definitely doable and I think inserts with existing aliases should not fail in order to uphold the insertOrUpdate.

A suggestion what to use instead until this is implemented you could also use an index but I suspect you are doing something similar already using an alias. The one reason I would reommend index is that it is not limited to just one "string" value per node:

let mut db = Db::new();
db.exec_mut(&QueryBuilder::insert().index("id").query());
let uid = Uuid::new_v4().to_string();
db.exec_mut(&QueryBuilder::insert().nodes().values(vec![vec![("uid", &uid).into()]]).query());
//...
db.transaction_mut(|t| {
db.transaction_mut(|t| {
    let node = t.exec(&QueryBuilder::search().index("uid").value(&uid).query())?;
    if node.result == 0 {
        t.exec_mut(&QueryBuilder::insert().nodes().values(vec![vec![]]).query())
    } else {
        t.exec_mut(
            &QueryBuilder::insert()
                .values(vec![vec![]])
                .ids(node)
                .query(),
        )
    }
});
michaelvlach commented 1 month ago

Anyway, I will update the insert with aliases so it does not fail on existing ones.

pinghe commented 1 month ago

Thx. The backend I'm using directly is agdb_server, and the client is a web application in JavaScript. If the client has 10,000 nodes, and each node requires a check for existence before insertion, this would result in 20,000 interactions between the client and the server, which is too resource-intensive. The current agdb_server does not support the construction of such queries by the client's QueryBuilder, which would allow for only one interaction between the client and the server to provide this capability.

pinghe commented 1 month ago

When inserting edges, there is a similar issue. For example, if there is an edge of type 'out' between two nodes, when I attempt to insert another edge of type 'out', it will no longer create a new edge but will simply update the 'values' instead."

when attempting to insert an edge of the same type between two nodes that already have such an edge, the system will not create a duplicate edge but will update the existing edge's values.

michaelvlach commented 1 month ago

Yes with the server one cannot batch the "if". I might implement more general insertOrUpdate accounting for topology as well including edges. Thanks again for bringing this up, good use case. :-)

michaelvlach commented 1 month ago

I think I have a solution that is robust enough to support any use case in the "insertOrUpdate" area. Motivational example how it would look like:

QueryBuilder()
    .insert()
    .edges()
    .ids(QueryBuilder()
        .search()
        .from(1)
        .where_()
        .key("key".into())
        .query())
    .from(1)
    .to(2)
    .values(vec![vec![("key", 1).into()]])
    .query() 

Basically adding ids() to all insert queries and not just insert().values() because it can take a search query. If the set in the ids would be empty the insert would happen. If it yielded a result (was not empty) it would update. It covers more cases than if it was based solely on inputs to the insert so should be more flexible - imagine your case where you want to update nodes when they contain some UID but not necessarily all the other values you could be inserting.

I will separately update the insert nodes if the alias exists so that does not to fail because that should not be failing regardless I think. The above however will not support aliases as they are not searchable, they are more topology anchors akin to table names in RDBMS/SQL. Still I find it neat you figured you could use them to simulate "insertOrUpdate"!

pinghe commented 1 month ago

I believe this is a great solution.