Open cblair opened 10 years ago
Observation: The lag in search may be partly caused by Rails accessing large files to build the results list. Below is the output from a test. When searching through small documents, ES responded within 24 seconds (although 24 seconds is still too slow). However, when the search gets to doc 17 it hangs (doc 17 is 6MB). I’m going to try to make this a priority for the rest of January.
This was tested; see comments below.
INFO: Elasticsearch query completed in 24.342495639 seconds.
User Load (771.3ms) SELECT "users".* FROM "users" WHERE "users"."id" = 1 LIMIT 1
Document Load (83.9ms) SELECT "documents".* FROM "documents" WHERE "documents"."id" = $1 LIMIT 1 [["id", 13]]
Document Load (66.1ms) SELECT "documents".* FROM "documents" WHERE "documents"."user_id" = 1
Document Load (5.4ms) SELECT "documents".* FROM "documents" WHERE "documents"."id" = $1 LIMIT 1 [["id", 12]]
Document Load (1.1ms) SELECT "documents".* FROM "documents" WHERE "documents"."id" = $1 LIMIT 1 [["id", 5]]
Document Load (1.7ms) SELECT "documents".* FROM "documents" WHERE "documents"."id" = $1 LIMIT 1 [["id", 3]]
Document Load (1.0ms) SELECT "documents".* FROM "documents" WHERE "documents"."id" = $1 LIMIT 1 [["id", 18]]
Document Load (3.2ms) SELECT "documents".* FROM "documents" WHERE "documents"."id" = $1 LIMIT 1 [["id", 17]]
Accessing the metadata for each document (for the popup) may be a big cause of the slowdown. Accessing any large document in Couch, even if the key/value is small, has been slow in the past. This should be tested. Update: tested and confirmed as a cause of the slowdown. Suggestion: instead of collecting the metadata as part of the search, change the metadata button so that it fetches the metadata on click (i.e. don’t get metadata until it is called for).
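A rough sketch of the fetch-on-click idea in plain Ruby, in case it helps the discussion. The `SearchResult` class and the `slow_fetcher` lambda are hypothetical stand-ins, not code from the app; in the real thing the fetcher would wrap the slow document-store read and would be called from whatever action the metadata button hits.

```ruby
# Hypothetical sketch: defer the expensive metadata lookup until it is asked for.
class SearchResult
  attr_reader :doc_id

  # fetcher is any callable that takes a document ID and returns its metadata.
  def initialize(doc_id, fetcher)
    @doc_id = doc_id
    @fetcher = fetcher
    @loaded = false
  end

  # Runs the fetch on the first call only, then reuses the cached value.
  def metadata
    unless @loaded
      @metadata = @fetcher.call(@doc_id)
      @loaded = true
    end
    @metadata
  end
end

# Stand-in for the slow read; it counts how often it is invoked.
fetch_count = 0
slow_fetcher = lambda do |id|
  fetch_count += 1
  { "id" => id, "title" => "doc #{id}" }
end

# Building the results list triggers no metadata reads.
results = [13, 12, 5, 3, 18, 17].map { |id| SearchResult.new(id, slow_fetcher) }
puts "fetches after building list: #{fetch_count}"

# Only the result whose button is clicked pays the cost, and only once.
results.last.metadata
results.last.metadata
puts "fetches after two clicks on doc 17: #{fetch_count}"
```

The point of the design is that the search path never touches the large documents at all; the cost moves to the one document the user actually asks about.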
Note that the initial HTTP request to ES returns the entire document for each document found. This gets large fast. The request/response in straight Ruby was actually not bad in benchmark testing (about 0.3s for 30MB in ES). However, JSON parsing the response takes 5 or more times longer than the request (about 1.9s for 30MB), which is not very scalable. Maybe try to have ES return just the file names and IDs?
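For anyone who wants to reproduce the parse-cost comparison locally, here is a minimal sketch using Ruby's standard `Benchmark` and `JSON` libraries. The payload is synthetic (20 made-up hits of about 100KB each, far smaller than the 30MB case above) and the hit structure is only an approximation of an ES response, so treat the timings as relative, not as the numbers from the app.

```ruby
require "benchmark"
require "json"

# Synthetic stand-in for an ES response that embeds full document bodies.
# Sizes are invented for illustration, not taken from the app.
hits = (1..20).map do |i|
  { "_id" => i.to_s, "_source" => { "name" => "doc#{i}", "body" => "x" * 100_000 } }
end
full_response = JSON.generate({ "hits" => { "hits" => hits } })

# The same hits with the document bodies stripped out.
ids_only = hits.map { |h| { "_id" => h["_id"] } }
slim_response = JSON.generate({ "hits" => { "hits" => ids_only } })

# Time only the parsing step, which was the expensive part in the test above.
full_time = Benchmark.realtime { JSON.parse(full_response) }
slim_time = Benchmark.realtime { JSON.parse(slim_response) }

puts "full: #{full_response.bytesize} bytes parsed in #{full_time.round(4)}s"
puts "slim: #{slim_response.bytesize} bytes parsed in #{slim_time.round(4)}s"
```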
Updated search to return just document IDs; search is much faster now. We will lose some of the new search features, but we really need search working fast for now.
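One way an IDs-only search could look, sketched in plain Ruby. This is not necessarily what the commit does: the `content` field name, the helper names, and the sample response are all made up for illustration. The only ES feature relied on is the `"_source": false` search-body option, which tells ES to leave the stored document out of each hit.

```ruby
require "json"

# Build a search body that asks ES to omit document bodies from each hit.
# The "content" field name is a hypothetical placeholder.
def ids_only_query(term)
  {
    "query" => { "match" => { "content" => term } },
    "_source" => false
  }
end

# Pull the document IDs out of a parsed ES search response.
# Returns an empty array if the response has no hits section.
def extract_ids(response)
  response.fetch("hits", {}).fetch("hits", []).map { |hit| hit["_id"] }
end

# Hand-written sample response for illustration, not real output from the app.
sample = JSON.parse('{"hits":{"total":2,"hits":[{"_id":"13","_score":1.2},{"_id":"17","_score":0.9}]}}')

puts JSON.generate(ids_only_query("salmon"))
p extract_ids(sample)
```

With only IDs coming back, the response stays small no matter how large the matched documents are, and the full records can be loaded from the database for just the page of results being shown.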
Current state of search:
Thanks Shane, I'll verify the state of things and close/finish if necessary.
It sucks. It's slow. Find the slowest parts (indexing, displaying, autocomplete), and make it not suck.
Also, Shane will play with this too, but whoever finds the best solutions wins.