BlueBrain / nexus-forge

Building and Using Knowledge Graphs made easy
https://nexus-forge.readthedocs.io
GNU Lesser General Public License v3.0

No way to get unlimited results from `forge.elastic` #241

Closed eugeniashurko closed 1 year ago

eugeniashurko commented 2 years ago

Currently, when querying resources through forge.elastic, there is no way to get all the documents.

For example, when running:

resources = forge.elastic("""
    {
        "query": {
            "term": {"_deprecated": false}
        }
    }
""", limit=None, debug=True)

Only 100 resources are returned. Here is the output with debug=True:

Submitted query: {'query': {'term': {'_deprecated': False}}, 'size': 100, 'from': 0}

To retrieve more than 100 resources, some hacks are needed. For example, setting an unrealistically large limit:

resources = forge.elastic("""
    {
        "query": {
            "term": {"_deprecated": false}
        }
    }
""", limit=100000, debug=True)

It would be great to have a way to get all the documents by specifying limit=None, as with forge.sparql.
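
In the meantime, a less brittle workaround than a huge limit might be to page through the results in fixed-size chunks. This is only a minimal sketch: `iter_all` and the `search_page(limit, offset)` callable are hypothetical names, not part of forge, and wrapping forge.elastic this way assumes it accepts an offset alongside limit. `max_hits` is an assumed safety cap, not a value taken from forge.

```python
from typing import Any, Callable, Iterator, List


def iter_all(
    search_page: Callable[[int, int], List[Any]],
    page_size: int = 100,
    max_hits: int = 10_000,  # assumed cap; adjust to what the index allows
) -> Iterator[Any]:
    """Yield results page by page until a short page or the cap is reached.

    `search_page(limit, offset)` is a hypothetical callable that runs the
    query for one page and returns a list of results.
    """
    offset = 0
    while offset < max_hits:
        # Never ask for more than the remaining room under the cap.
        size = min(page_size, max_hits - offset)
        page = search_page(size, offset)
        yield from page
        # A page shorter than requested means there is nothing left.
        if len(page) < size:
            break
        offset += len(page)
```

With a wrapper around the query in place, `list(iter_all(search_page))` would collect every result up to the cap.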

MFSY commented 2 years ago

Hi @eugeniashurko ,

There is a (configured) limit of 10,000 hits when querying ES, which makes it difficult to get all docs.

MFSY commented 2 years ago

If this is to be addressed, it has to be done at the index settings level.
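
For reference, in stock Elasticsearch the cap on from + size is controlled by the `index.max_result_window` index setting, which defaults to 10000. The sketch below only builds the settings body that would be sent to an index's `_settings` endpoint. The helper name is hypothetical, and whether a given Nexus deployment lets you change this setting is an assumption that would need checking.

```python
import json


def max_result_window_settings(window: int) -> str:
    """Return a JSON settings body raising `index.max_result_window`.

    Hypothetical helper: it only serialises the payload and sends nothing.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return json.dumps({"index": {"max_result_window": window}})


# Example payload for a 50,000-hit window.
payload = max_result_window_settings(50_000)
```

Raising the window increases memory use for deep pages, so a larger value is a trade-off rather than a free fix.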

eugeniashurko commented 2 years ago

I see, that's a good point! But I think that on the forge side we should not apply this limit of 100 for no reason (at least when limit is explicitly None). We can just make 'unlimited' queries, and then, depending on the index, no more than 10k resources will be returned. This behaviour was very unexpected for me and introduced a weird, hard-to-debug side effect.
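
To make the suggestion concrete, here is a rough sketch of how the query body could be built so that limit=None means "as many as the index allows". It is not forge's actual implementation: `build_query` and `ASSUMED_MAX_WINDOW` are hypothetical names. One detail worth noting is that simply omitting `size` would make Elasticsearch fall back to its default of 10 hits, so the sketch sets `size` explicitly to the assumed 10k window.

```python
import json
from typing import Any, Dict, Optional

# Assumed index window (the 10k hit limit mentioned above); not read from the index.
ASSUMED_MAX_WINDOW = 10_000


def build_query(query: str, limit: Optional[int] = 100, offset: int = 0) -> Dict[str, Any]:
    """Parse a JSON query string and add `size` and `from`.

    Hypothetical helper: limit=None requests everything the assumed
    window still allows after `offset`.
    """
    body = json.loads(query)
    body["size"] = ASSUMED_MAX_WINDOW - offset if limit is None else limit
    body["from"] = offset
    return body


unlimited = build_query('{"query": {"term": {"_deprecated": false}}}', limit=None)
```

An explicit integer limit keeps today's behaviour, so only callers who pass None would see a change.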