dgraph-io / dgraph

The high-performance database for modern applications
https://dgraph.io
Other
20.33k stars 1.49k forks source link

[BUG]: Count cannot be assigned to a value variable #8688

Open Poolshark opened 1 year ago

Poolshark commented 1 year ago

What version of Dgraph are you using?

Dgraph Cloud v21.03.0-92-g0c9f60156

Tell us a little more about your go-environment?

No response

Have you tried reproducing the issue with the latest release?

Yes

What is the hardware spec (RAM, CPU, OS)?

Tier: 2R Storage: 32 GB Dgraph Hosted: Yes High Availability: No SLA: 99.5% Schema Mode: Flexible

This error can also be reproduced on a Shared Cluster (Free Tier) with a minimal schema.

What steps will reproduce the bug?

1. Deploy minimal schema

type Course {
  id: ID!
  title: String!
  chapters: [Chapter!]!
}

type Chapter {
  id: ID!
  title: String!
  sequence: Int
}

2. Perform initial DQL mutation

{
  "set": {
    "dgraph.type": "Course",
    "Course.title": "Course 1",
    "Course.chapters": [
      {
        "dgraph.type": "Chapter",
        "Chapter.title": "Chapter 1",
        "Chapter.sequence": 0
      },
      {
        "dgraph.type": "Chapter",
        "Chapter.title": "Chapter 2",
        "Chapter.sequence": 1
      }
    ]
  } 
}

3. Perform upsert DQL mutation

{
  "query": "{ qTest(func: uid(0x1)) { u as uid c as count(Course.chapters) } }",
  "mutations": [
    {
      "set": {
        "uid": "uid(u)",
        "Course.chapters": [
          {
            "dgraph.type": "Chapter",
            "Chapter.title": "Test Chapter",
            "Chapter.sequence": "val(c)"
          }
        ]
      }
    }
  ]
}
Response
{
  data: {
    code: "Success"
    message: "Done"
    queries: {
      qTest: [
        0: {
          uid:"0x1"
          count(Course.chapters): 2
        }
      ]
    }
    uids: {
      dg.3162278161.22051:"0x1"
    }
}

Expected behavior and actual result.

We would expect that Chapter.sequence of the new Chapter Test Chapter would equal 2 since count(Course.chapters) validates 2.

This behaviour is also documented in the official docs.

Additional information

Chapter.sequence of the newly created Chapter Test Chapter remains undefined.

mrwunderbar666 commented 1 year ago

Quick question/clarification: do you use the JSON mutation format for this upsert:

{
  "query": "{ qTest(func: uid(0x1)) { u as uid c as count(Course.chapters) } }",
  "mutations": [
    {
      "set": {
        "uid": "uid(u)",
        "Course.chapters": [
          {
            "dgraph.type": "Chapter",
            "Chapter.title": "Test Chapter",
            "Chapter.sequence": "val(c)"
          }
        ]
      }
    }
  ]
}

Because the JSON mutation format still does not support val() (see: https://dgraph.io/docs/dql-syntax/json-mutation-format/#forbidden-values)

Poolshark commented 1 year ago

Well, that's new! All my mutations are in JSON and uid as well as val works for all other tasks except count. Also the example above will produce a valid DB entry, just without count.

Plus I don't think that this is what forbidden values means. I understand that you are not supposed to have string values containing uid and val. So something like this

{
  "uid": "uid(someValue)"
}

is valid, whereas

{
  "uid": "uid(someValue)",
  "SomeType.someField": "uid(some string)"
}

is not valid.

mrwunderbar666 commented 1 year ago

Interesting! Then I find the documentation a bit unclear in this section.

MichelDiz commented 1 year ago

Try something like

{
  "query": "{ qTest(func: uid(0x1)) { u as uid c as count(Course.chapters) }  me() { c2 as max(val(c)) } }",
  "mutations": [
    {
      "set": {
        "uid": "uid(u)",
        "Course.chapters": [
          {
            "dgraph.type": "Chapter",
            "Chapter.title": "Test Chapter",
            "Chapter.sequence": "val(c2)"
          }
        ]
      }
    }
  ]
}

Related https://github.com/dgraph-io/dgraph/issues/4779

Poolshark commented 1 year ago

Hi @MichelDiz!

Thanks for the reply. I've tried and now I get the error:

"line 1 column 98: Only aggregation/math functions allowed inside empty blocks. Got: mas"

However, I think now that the problem is not really a bug but rather a problem related to how value variables are mapped. From what I understand from the documentation, value variables get mapped to the UID of the corresponding block. This means that if I try to generate a new DB entry within my mutation, the UID of this entry does not exist yet and thus there is no way of ever getting a value other than undefined from val(c).

In my opinion, this contradicts with the user intention. I guess the intention of this mapping is that we can use the same alias twice in two different blocks but I do not think that this is the user intention. A user can alway pre/postfix an alias to make sense for his use case (eg. by adding the corresponding typename, the query name, etc.). This way value variables could simply be stored as a key/value pair and used everywhere in an upsert mutation.

Therefore, since this is not a bug, we can safely resolve this issue. Maybe you could update the documentation slightly so people don't get confused here.

Would it be possible that you show me where the actual mapping takes place in the Dgraph source? If I have time I could check if it would be worth a feature request / pull request.

Cheers!

MichelDiz commented 1 year ago

@Poolshark that was a typo of mine. Instead of "mas" it should be "max".

Poolshark commented 1 year ago

Hi @MichelDiz

It works! 🙌 Could you elaborate what it does?

MichelDiz commented 1 year ago

This is a "Hack".

In my opinion(Opinion, because I didn't get to study the query system), queries are linked to a "map" at the query runtime. This hack of mine aims to exploit aggregation in an empty block (which has no link/map with any uid). And thereby capture the value as it were in a "global" way. As it is outside this mapping, it can be used in any other field. But it's not perfect. There are times when it doesn't work. How bulk upserts can be catastrophic. And how is using an aggregation function. Maybe not ideal for other cases. It's not an elegant solution, but it works for 1-to-1 queries.

Who could explain in detail is the creator of the Upsert Block @mangalaman93

mangalaman93 commented 1 year ago

I think count function is special and it would make sense for us to fix this for upsert. If you are using count, you don't need to have to do max like Michel showed. The query engine should realize that count is not a typical val. I would keep this issue around. I think the issue is well explained and shows a real use case. It will help us improve the query engine. Thanks

MichelDiz commented 1 year ago

That's not just about count. If you see the examples I made here #4779. I'm not using count. BTW, you are there in that conversation @mangalaman93 .

Poolshark commented 1 year ago

Hey @MichelDiz & @mangalaman93!

Thanks for keeping this issue alive and sorry for not being active recently. However, I've played around with the query syntax a bit and found some of the behaviour really confusing. E.g. in #4779 @mangalaman93 states

I recommend that we keep the current behaviour for val as it is because it provides one to one mapping of nodes to their values which is similar to single row update that other databases provide.

So, if I understood right, this means that all value variables are unique in their definition space since they are all mapped to a unique UID. What I don't understand is, how this applies to upsert? Value variables in an upsert block need to be unique and thus mapping them to a UID does not really have any advantages in my opinion (but maybe I lack some understanding here).

To put this into an example regarding the schema from above

{
  "query": "{ qTest(func: uid(0x1)) { title as Course.title Course.chapters @filter(uid(0x2)) { title as Chapter.title } } }",
  "mutations": [
    "set": {
      "dgraph.type": "Course",
      "Course.title": "val(title)"
    }
  ]
}

does result in the error

"Some variables are declared multiple times."

since indeed test is declared twice, once for Course.title and again for Chapter.title. Moreover, the documentation states that value variables scalar values, which is also confusing since in this example test can be a single string value OR any array of strings. Therefore, scalar seems not to be the right terminology.

Since we cannot redeclare value variables inside the query block, why can't we have those variables available globally then? Even more so since the conditional statement inside the mutation block, does not care about UID mapping and value entities (although not their exact values) are available globally.

@mangalaman93 can you explain why count is different? Also, how would this example work without using max as @MichelDiz showed? In my opinion there is no other way than (mis)using the aggregate query. I've tried

{
  "query": "{ qTest(func: uid(0x1)) { Course.chapters { c as uid } } }",
  "mutations": [
    {
      "cond": "@if(ge(len(c), 1))",
      "set": {
        "uid": "0x1",
        "dgraph.type": "Course",
        "Course.chapters": {
          "uid": "_:NewC",
          "dgraph.type": ["Chapter", "Sortable"],
          "Chapter.title": "This is a new Chapter",
          "Sortable.sortings": {
            "uid": "_:NewS",
            "dgraph.type": "Sorting",
            "Sorting.sequence": "count(c)"
          }
        }
      }
    }
  ]
}

which results in the error

"strconv.ParseInt: parsing "count(c)": invalid syntax"

github-actions[bot] commented 1 month ago

This issue has been stale for 60 days and will be closed automatically in 7 days. Comment to keep it open.

Poolshark commented 1 month ago

Not stale