fluree / ledger

Fluree ledger server source
GNU Affero General Public License v3.0

Analytical queries Error with filters on `rdf:type` #214

Closed Domenic-MZS closed 11 months ago

Domenic-MZS commented 1 year ago

Hey y'all, good morning 😁, I recently noticed the new release, so I went to give it a shot 🦅,

and i found something odd...

When i execute the following query:

{
  "select": {"?s": ["*"]},
  "where": [
    [
      "?s",
      "rdf:type",
      "test"
    ],
    {
      "filter": [
        "(or (= ?s 351843720899262) (= ?s 351843720899261))"
      ]
    }
  ]
}

I receive the following response/error:

"Filter function uses variable: ?s however that variable is not used in a where statement or was already used in another filter function."

The expected behaviour is to get two results, matching the subjects 351843720899262 and 351843720899261.


The query above ran without problems on the previous version, so maybe there's a migration step or workaround that I'm missing?

Domenic-MZS commented 1 year ago

I know it could be written like:

{
  "select": ["*"],
  "from": [
    351843720899262,
    351843720899261
  ]
}

But if we go to a more complex and realistic scenario like applying a filter on a certain group of ids, that solution doesn't work at all.


EDIT: The Complete Scenario

{
    "select": {"?s": ["*"]},
    "where": [
        ["?s", "rdf:type", "test"],
        ["?s", "test/a", "?a"],
        {
            "filter": [
                // user selection to search
                "(or (= ?s xxxxxxx) (= ?s yyyyyy))",
                // where the search filter (user - input) applies 
                "(re-find (re-pattern \"example\") ?a)"
            ]
        }
    ]
}

Jackamus29 commented 1 year ago

Hey @Domenic-MZS If you're trying to work with a subset of subjects, you could try using the vars binding to bind the subset to a variable, then use that variable in the where clauses just as you are. Here's the doc on the vars binding: https://developers.flur.ee/docs/overview/query/analytical_query/#vars-key

And here's what I'm picturing for the query:

{
    "select": {"?s": ["*"]},
    "where": [
        ["?s", "rdf:type", "test"],
        ["?s", "test/a", "?a"]
    ],
    "vars": {
        "?s": [xxxxxxx, yyyyyy]
    }
}

Jackamus29 commented 1 year ago

As for the regular expression use case, we usually see searching for values based on user input with the fullText functionality, as this is what users normally expect the behavior to be rather than the user providing a regular expression. Here's the doc on fullText https://developers.flur.ee/docs/concepts/analytical-queries/full-text-search/

And the query might look like:

{
    "select": {"?s": ["*"]},
    "where": [
        ["?s", "rdf:type", "test"],
        ["?s", "fullText:test/a", "example"]
    ]
}

Jackamus29 commented 1 year ago

Give me a little more time to see if I can get an answer for you on the error message you're seeing:

"Filter function uses variable: ?s however that variable is not used in a where statement or was already used in another filter function."

Domenic-MZS commented 1 year ago

Hi @Jackamus29, thanks for the feedback 👾 🪅 ,

1.

Hey @Domenic-MZS If you're trying to work with a subset of subjects, you could try using the vars binding to bind the subset to a variable, then use that variable in the where clauses just as you are. Here's the doc on the vars binding: https://developers.flur.ee/docs/overview/query/analytical_query/#vars-key ....

I did try the vars and bind alternatives earlier, but I ran into some duplication issues 🐛 and some inconsistent/unexpected behaviour regardless of what I did (in certain scenarios, like the one you provided)...

As far as I know (from what I was testing), the ["?subjectVar", "object", "?valueVar"] clause works pretty much like ["?subjectVar", "rdf:type", "collection"], except for the ?valueVar as a filter condition... but in both of them, the subject variable was unset in filter map clauses. In addition to that, if I leave the subject variables out of my where clause, I get each var duplicated (like the rdf + var execution).

2.

...

Yup, that's a very clean implementation you got there 🌵, and I'm actually using it a lot ♻️. However, in some scenarios I have an Advanced Search 🔍 that works with RegExp (until we implement some AI/vector-powered search), relationship 🧬 searches (and yada, yada).

I think maybe it's a signal to refactor and think more about the code organization on my part 😖


3.

Give me a little more time to see if I can get an answer for you on the error message you're seeing: ...

Absolutely, don't even worry about that, and seriously, thanks a lot for the help 🤝, I'm looking forward to using Fluree.

Jackamus29 commented 1 year ago

yeah sure thing! If you'd like, we can dive more into your #1 there. Maybe settle on a query you like that seems like it should handle your use case well and then we can troubleshoot if/where necessary. Happy to help! Seems like you have an interesting application you're building!

Domenic-MZS commented 1 year ago

Sure, i think the main goal that solves this issue is:

  1. Have/Work on a limited set of entities from the whole collection (like 2 or more entities (subset))
  2. Filter some value only on that limited range/subset of entities (not the whole collection, just a small part of, based on IDs if possible)

About the #1 thing, I'll be sending the input/output I'm getting.

Domenic-MZS commented 1 year ago

Running the provided query 'From Subset A and B, bring me all the entities that have predicate a (from that subset)'

```json
{
  "select": {"?s": ["*"]},
  "where": [
    ["?s", "rdf:type", "test"],
    ["?s", "test/a", "?a"]
  ],
  "vars": {"?s": [351843720888323, 351843720888322]}
}
```

Then I get my subset + (plus) all the other values that have the predicate a

> NOTE: All Collection Values X2 Times is the result

```json
[
  { "_id": 351843720888323, "test/a": "pegsi" },
  { "_id": 351843720888322, "test/a": "nikko" },
  { "_id": 351843720888321, "test/a": "soda" },
  { "_id": 351843720888320, "test/a": "agua" },
  { "_id": 351843720888323, "test/a": "pegsi" },
  { "_id": 351843720888322, "test/a": "nikko" },
  { "_id": 351843720888321, "test/a": "soda" },
  { "_id": 351843720888320, "test/a": "agua" }
]
```

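
For anyone hitting the same duplicated output, here is a minimal client-side sketch in Python that post-processes the rows after they come back. It does not fix the query itself; `dedupe_by_id` and `keep_subset` are hypothetical helper names, and the rows are the ones from the result above.

```python
def dedupe_by_id(rows):
    """Keep the first row seen for each `_id`, preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        if row["_id"] not in seen:
            seen.add(row["_id"])
            unique.append(row)
    return unique


def keep_subset(rows, subject_ids):
    """Keep only the rows whose `_id` is one of the requested subject ids."""
    wanted = set(subject_ids)
    return [row for row in rows if row["_id"] in wanted]


# The duplicated result reported above: the whole collection, returned twice.
rows = [
    {"_id": 351843720888323, "test/a": "pegsi"},
    {"_id": 351843720888322, "test/a": "nikko"},
    {"_id": 351843720888321, "test/a": "soda"},
    {"_id": 351843720888320, "test/a": "agua"},
] * 2

subset = keep_subset(dedupe_by_id(rows), [351843720888323, 351843720888322])
print(subset)
```

This only hides the symptom on the client; the ledger still scans and returns the whole collection, so it is a stopgap rather than a replacement for a query that binds the subset correctly.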
Domenic-MZS commented 1 year ago

Eureka!

Hi @Jackamus29! I'm glad to share with you the solution/alternative I found to the subset problem, and I also have some new things to report on the matter (maybe?), so let's start with the alternative.

After trying some variations on your solutions, I ended up with the following query:

{
  "select": { "?s": [ "*" ] },
  "where": [
    [ "?s", "_id", "?s" ], # this fixes the `?s, rdf:type, test` duplication issue
    # ... [more filters using the '?s' as source/subject]
    # NOTE: Subjects can not be used within filters, only values
  ],
  "vars": { "?s": [ xxxxxx, yyyyyy ] } # your subset
}

This version replaces the ['?s', 'rdf:type', 'test'] clause, which for some reason was returning the subjects bound in vars and then the other subjects (from the collection/type).

Note: For some reason, neither subjects nor vars can be filtered or used unless they appear as a subject placeholder (like [subjectplaceholder, predicate, value]).
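
To make the workaround concrete, here is a minimal Python sketch that only assembles the query as a plain dict and prints it as JSON. `build_subset_query` is a hypothetical helper, not part of any Fluree client. Combining the `_id` clause with a `re-find` filter on `?a` is pieced together from the queries earlier in this thread and has not been verified against a ledger, so treat it as a starting point.

```python
import json


def build_subset_query(subject_ids, pattern, predicate="test/a"):
    """Assemble an analytical query restricted to a subset of subject ids.

    Hypothetical helper: it only builds the dict, it does not talk to Fluree.
    """
    # Assumption: patterns containing quotes or backslashes would need escaping
    # rules for the filter string that this sketch does not attempt to handle.
    if '"' in pattern or "\\" in pattern:
        raise ValueError("pattern with quotes or backslashes is not handled here")
    return {
        "select": {"?s": ["*"]},
        "where": [
            # The `_id` clause from the workaround, used instead of rdf:type.
            ["?s", "_id", "?s"],
            ["?s", predicate, "?a"],
            # Filter on the value variable only, not on the subject variable.
            {"filter": [f'(re-find (re-pattern "{pattern}") ?a)']},
        ],
        # The subset of subjects to search in.
        "vars": {"?s": list(subject_ids)},
    }


query = build_subset_query([351843720888323, 351843720888322], "example")
print(json.dumps(query, indent=2))
```

The filter deliberately touches only `?a`, in line with the note that subjects cannot be used within filters, while the subset itself is carried by `vars`.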


Last but not least, the OR command is not working in simple filters. I may open another issue for this, but I did find some over-engineered alternatives to it (however, that behaviour is not intended).

Domenic-MZS commented 1 year ago

I think the issue should be renamed to cover the generic ID/value issue instead. What do you think, @Jackamus29?

Jackamus29 commented 1 year ago

Awesome! Glad you found a workaround! I still plan to get all of this info in front of our Core team, but unfortunately it looks like I won't be able to do that until at least the end of this week. Stay tuned!

Also, if you want to rename this issue or make another - it's up to you, I don't have a preference.

Domenic-MZS commented 11 months ago

Closed due to inactivity.