Improving performance querying a larger set of documents.

Happy-Ferret commented 8 years ago

Using RESTHeart as API server for a custom Windows 8.1 product search app, I ran into the following issue:

The landing page of the app is supposed to show all products at once, represented by gridview items (until the user either filters for more refined results or uses the searchbar).

This works rather well with a small dataset of 100 documents. However, the full product catalog is currently at 873 documents.

Loading the full catalog using the following query, it takes 8.1 seconds to load the landing page.

/DB/COLL?filter={"name":{$regex:"" , $ne:null}}&keys={"name":1,"ID":1}&pagesize=1000

Is there any way to speed this up? Any configuration switch I might want to try (I'm using the default yaml that shipped with v0.10.4) ? I'm hoping for 3 seconds, at the most.

mkjsix commented 8 years ago

Hi, Are you running v. 0.10.4? There was a known performance issue, so pagination has improved a lot with the recent 1.0.0 version of RESTHeart, could you please give it a try and let us know? I'm however moving the question to my colleague @ujibang as he worked a lot on this.

https://softinstigate.atlassian.net/wiki/display/RH/Speedup+Requests+with+Cursor+Pools

Happy-Ferret commented 8 years ago

Hi there. Thanks for the quick reply. It's very much appreciated.

I was indeed running 0.10.4.

On a first look though, 1.0.0 isn't much faster for my particular use case. I'm merely going by the HAL Browser performance here, since the current state of my code doesn't play ball with the data served by 1.0.0.

For some reason, the app quits with a NullReferenceException.

Will have to check in what way the stream diverges from 0.10.4 and modify my app accordingly.

EDIT: I managed to fix the regression in my code rather easily. Unfortunately, there doesn't appear to be a measurable performance difference between 0.10.4 and 1.0.0. Both take around 8 seconds for the above query.

ujibang commented 8 years ago

How long does the query takes using the mongo shell?

Try also to create an index on name.

You can do it via RESTHeart itself:

http://restheart.org/curies/1.0/cindex.html

ujibang commented 8 years ago

Can you also test how long it takes removing the filter condition?

and with filter={"name":{"$ne":null}}

(Btw why are u using the empty regex?)

Happy-Ferret commented 8 years ago

Thanks. Will try your suggestions when I get back to work.

As for the empty regex. That's sort of a "hack". Our test database has some unsanitised entries that rest in name [ text { cdata.

When I run it with an $exists filter instead, it will crash since there's no "name" string in those documents. The regex method works where $exists and $ne:null fail.

Happy-Ferret commented 8 years ago

Back at the office.

Using MongoDB's own shell and setting DBQuery.shellBatchSize to 1000, the query takes around 3 seconds (so pretty much what I'm aiming for inside my app).

Removing the filter condition, the query takes atrociously long. 15 seconds. When I run it from inside the HALBrowser, it puts my browser to a standstill.

I'm trying to index right now, but using HALBrowser's indexing feature am a tad unsure as to what I have to enter into the body.

{
keys:{"name":1}
ops:{"unique": true, "sparse": true}
}

yields an 406 (invalid data).

What would be the correct syntax here?

ujibang commented 8 years ago

I think you are missing the quotes on keys and opts properties:

{ "keys":{"name":1} "ops":{"unique": true, "sparse": true} }

Happy-Ferret commented 8 years ago

Unfortunately, this still results in an error 406.

Here's a screenshot from the index console, in case there's something else I'm doing wrong.

put

ujibang commented 8 years ago

Still missing a comma :)

{ "keys":{"name":1}, <-- "ops":{"unique": true, "sparse": true} }

Happy-Ferret commented 8 years ago

Nope. Still not working.

{
  "http status code": 406,
  "http status description": "Not Acceptable",
  "message": "error creating the index"
}

ujibang commented 8 years ago

Maybe name is not unique? In case remove that option.

Happy-Ferret commented 8 years ago

Doesn't work either.

In fact, the only thing that works somehow is removing the whole line and leaving only the name key. But then the response is empty (obviously).

The syntax really confuses me, btw.

Even

{
"keys":{"name":1, "ID":1}
}

Returns 406.

Shouldn't one be able to provide more than one key for indexing?

ujibang commented 8 years ago

Ok, I made some tests....

Let's create an index with id index1

PUT /DB/COLL/_indexes/index1
{
 "keys":{"name":1},
 "ops":{"unique": true, "sparse": true}
}
HTTP/1.1 201 Created

Note the options must of course be acceptable, for instance if the property name is not actually unique you get error 406, and the response includes the following error message:

"exception": "com.mongodb.MongoException$DuplicateKey", 
"exception message": "{ "exception: E11000 duplicate key error index: test.issue51.$index2 dup key: {...}, ..}"

If the index with id index1 has been created, and you try to create an index using the same id (the last part of the URI), you get 406 Not Acceptable. The response also includes the description of the error from MongoDB, that in this case is:

Trying to create an index with same name index1 with different key spec { id: 1, name: 1 } vs existing spec { name: 1 }

To change it, you need to delete it first and recreate:

DELETE /DB/COLL/_indexes/index1
HTTP/1.1 204 No Content

PUT /DB/COLL/_indexes/index1
{
 "keys":{"name":1, "id":1},
 "ops":{"unique": true, "sparse": true}
}
HTTP/1.1 201 Created

ujibang commented 8 years ago

FYI

From our discussion, I created a documentation page about indexes.

https://softinstigate.atlassian.net/wiki/x/SQC9

It is still work in progress...

Happy-Ferret commented 8 years ago

Very nice! Thank you.

I've solved my performance issue now. Turns out, our development server was in serious need of a reboot. After rebooting and indexing, the query is now within the bounds of 3 seconds (not precisely three seconds, but I'll fix that later or try and shave off a second on the app-side).

SoftInstigate / restheart

Improving performance querying a larger set of documents. #51