dgraph-io / dgraph

The high-performance database for modern applications
https://dgraph.io
Other
20.31k stars 1.49k forks source link

Issues with k shortest path handling #3504

Closed campoy closed 4 years ago

campoy commented 5 years ago

There are three different issues that could be improved on calls to shortest.

Variables should be accepted (#1243)

A quick example of why this should be accepted.

{
  sp as var (func: eq(name@en, "Steven Spielberg")) @filter(has(director.film))
  jg as var (func: eq(name@en, "Jeff Goldblum")) @filter(has(director.film))

  p as shortest(from: uid(sp), to: uid(jq)) {
     ...predicates...
  }

  path(func: uid(p)) {
    uid
    name@.
  }              
}

Currently we need to specify which predicates to consider

This is most probably due for performance issues, but it makes it really hard to find random connections from A to B.

For instance, consider the query above and try to guess which predicates should go where ...predicates....

After a while, and with @danielmai's help, we ended up figuring out it was:

    director.film
    starring
    performance.actor

Multiple shortest queries overwrite the resulting path

Calls to shortest produce an output under _path_. Unfortunately, this means that if you have multiple calls to shortest the results will overwrite each other and only one will eventually be sent back in the response.

See this example:

{
  p as shortest(from: 0x22f1e0, to: 0x82ec65) {
    director.film
    starring
    performance.actor
  }

  q as shortest(from: 0x22f1e0, to: 0x5092b) {
    director.film
    starring
    performance.actor
  }

  path1(func: uid(p)) {
    uid
    name@.
  }    

  path2(func: uid(q)) {
    uid
    name@.
  }    
}

The output will contain a single _path_ corresponding to the last call to shortest.

{
  "extensions": {
    "server_latency": {
      "parsing_ns": 45165,
      "processing_ns": 114340665,
      "encoding_ns": 935863
    },
    "txn": {
      "start_ts": 22728120
    }
  },
  "data": {
    "path1": [
      {
        "uid": "0x22f1e0",
        "name@.": "Steven Spielberg"
      },
      {
        "uid": "0x22ca64",
        "name@.": "The Lost World: Jurassic Park"
      },
      {
        "uid": "0xf50c"
      },
      {
        "uid": "0x82ec65",
        "name@.": "Jeff Goldblum"
      }
    ],
    "path2": [
      {
        "uid": "0x22f1e0",
        "name@.": "Steven Spielberg"
      },
      {
        "uid": "0x51cd20",
        "name@.": "Bridge of Spies"
      },
      {
        "uid": "0x398f09"
      },
      {
        "uid": "0x5092b",
        "name@.": "Tom Hanks"
      }
    ],
    "_path_": [
      {
        "director.film": [
          {
            "starring": [
              {
                "performance.actor": [
                  {
                    "uid": "0x5092b"
                  }
                ],
                "uid": "0x398f09"
              }
            ],
            "uid": "0x51cd20"
          }
        ],
        "uid": "0x22f1e0"
      }
    ]
  }
}
campoy commented 5 years ago

One more issue we found is how when using a variable to store the result of calling shortest, only the first path will be stored in that variable!

So for instance, if I want to find 2 the movies directed by Steven Spielberg where Jeff Goldblum appears as an actor I could write the following:

{
  p as shortest(from: 0x22f1e0, to: 0x82ec65, numpaths:2) {
    director.film
    starring
    performance.actor
  }

  path(func: uid(p)) {
    uid
    name@.
  }    
}

While the _path_ data element will show all the paths we found, the path data will only show the first one.

{
  "extensions": {
    "server_latency": {
      "parsing_ns": 18057,
      "processing_ns": 63379742,
      "encoding_ns": 733893
    },
    "txn": {
      "start_ts": 22790001
    }
  },
  "data": {
    "path": [
      {
        "uid": "0x22f1e0",
        "name@.": "Steven Spielberg"
      },
      {
        "uid": "0x414ee0",
        "name@.": "Jurassic Park"
      },
      {
        "uid": "0x14f22a"
      },
      {
        "uid": "0x82ec65",
        "name@.": "Jeff Goldblum"
      }
    ],
    "_path_": [
      {
        "director.film": [
          {
            "starring": [
              {
                "performance.actor": [
                  {
                    "uid": "0x82ec65"
                  }
                ],
                "uid": "0x14f22a"
              }
            ],
            "uid": "0x414ee0"
          }
        ],
        "uid": "0x22f1e0"
      },
      {
        "director.film": [
          {
            "starring": [
              {
                "performance.actor": [
                  {
                    "uid": "0x82ec65"
                  }
                ],
                "uid": "0xf50c"
              }
            ],
            "uid": "0x22ca64"
          }
        ],
        "uid": "0x22f1e0"
      }
    ]
  }
}

The second movie with UID 0x22ca64 is "The Lost World: Jurassic Park"

minhaj-shakeel commented 4 years ago

Github issues have been deprecated. This issue has been moved to discuss. You can follow the conversation there and also subscribe to updates by changing your notification preferences.

drawing