cblair / portal

A web interface for collecting and analyzing research data

Make Search not suck #59

Open cblair opened 10 years ago

cblair commented 10 years ago

It sucks. It's slow. Find the slowest parts (indexing, displaying, autocomplete), and make it not suck.

Also, Shane will play with this too, but whoever finds the best solutions wins.

shanesuofi commented 9 years ago

Observation: The lag in search may partly occur when Rails reads large files to build the results list. Below is the output from a test. When searching through small documents, ES responded in about 24 seconds (although 24 seconds is still too slow). However, when the search gets to doc 17, it hangs (doc 17 is 6MB). I'm going to try to make this a priority for the rest of January.

This was tested; see the comments below.

    INFO: Elasticsearch query completed in 24.342495639 seconds.
    User Load (771.3ms) SELECT "users".* FROM "users" WHERE "users"."id" = 1 LIMIT 1
    Document Load (83.9ms) SELECT "documents".* FROM "documents" WHERE "documents"."id" = $1 LIMIT 1 [["id", 13]]
    Document Load (66.1ms) SELECT "documents".* FROM "documents" WHERE "documents"."user_id" = 1
    Document Load (5.4ms) SELECT "documents".* FROM "documents" WHERE "documents"."id" = $1 LIMIT 1 [["id", 12]]
    Document Load (1.1ms) SELECT "documents".* FROM "documents" WHERE "documents"."id" = $1 LIMIT 1 [["id", 5]]
    Document Load (1.7ms) SELECT "documents".* FROM "documents" WHERE "documents"."id" = $1 LIMIT 1 [["id", 3]]
    Document Load (1.0ms) SELECT "documents".* FROM "documents" WHERE "documents"."id" = $1 LIMIT 1 [["id", 18]]
    Document Load (3.2ms) SELECT "documents".* FROM "documents" WHERE "documents"."id" = $1 LIMIT 1 [["id", 17]]
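To narrow down which stage is slow, each step of a search can be timed separately. The following is only a sketch using Ruby's standard `Benchmark` module. The helper `time_stages` and the three stage names are hypothetical, and the `sleep` calls stand in for the real ES request, JSON parse, and document loads.

```ruby
require "benchmark"

# Hypothetical helper: runs each named stage and records its wall-clock time.
def time_stages(stages)
  stages.each_with_object({}) do |(name, work), timings|
    timings[name] = Benchmark.realtime { work.call }
  end
end

# Stand-in stages (assumptions). In the app these would wrap the ES request,
# the JSON parse of the response, and the per-document loads.
timings = time_stages(
  es_query:   -> { sleep 0.02 },
  json_parse: -> { sleep 0.01 },
  doc_loads:  -> { sleep 0.005 }
)

# Print the stages, slowest first.
timings.sort_by { |_, secs| -secs }.each do |name, secs|
  puts format("%-10s %.3fs", name, secs)
end
```

Swapping the lambdas for the real calls would show whether the time goes to ES itself, to parsing, or to loading large documents.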

shanesuofi commented 9 years ago

Accessing the metadata for each document (for the popup) may be a big cause of the slowdown. Accessing any large document in Couch, even if the key/value is small, has been slow in the past. This should be tested. Update: tested and confirmed as a cause of the slowdown. Suggestion: instead of collecting the metadata as part of the search, change the metadata button so that it fetches the metadata on click (i.e. don't get metadata until it is called for).
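The fetch-on-click idea could look roughly like the sketch below. Everything here is illustrative: `SearchResult`, `slow_fetch`, and the metadata hash are hypothetical stand-ins, not code from the app, and caching the value after the first fetch is an extra assumption beyond the suggestion above.

```ruby
# Hypothetical search result that defers the metadata lookup.
class SearchResult
  attr_reader :doc_id

  # fetcher is any callable that takes a document ID and returns metadata.
  # In the app it would wrap the (slow) Couch read.
  def initialize(doc_id, fetcher)
    @doc_id = doc_id
    @fetcher = fetcher
  end

  # Runs the fetch on the first call only, then reuses the stored value.
  def metadata
    @metadata ||= @fetcher.call(@doc_id)
  end
end

fetch_count = 0
slow_fetch = lambda do |id|
  fetch_count += 1
  { "id" => id, "title" => "doc #{id}" } # stand-in for real metadata
end

results = [13, 12, 17].map { |id| SearchResult.new(id, slow_fetch) }
puts fetch_count       # 0: building the results list fetches nothing
results.last.metadata  # user clicks "Metadata" on doc 17
results.last.metadata  # second click reuses the stored value
puts fetch_count       # 1
```

With this shape, a search that returns many documents pays for a metadata read only on the documents a user actually inspects.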

shanesuofi commented 9 years ago

Note that the initial HTTP request to ES returns the entire document for each document found. This gets large fast. The request/response in straight Ruby was actually not bad in benchmark testing (about 0.3s for 30MB in ES). However, JSON parsing the response takes 5 or more times longer than the request (about 1.9s for 30MB), which is not very scalable. Maybe try to have ES return just the file names and IDs?

Updated search to return just document IDs; search is much faster now. We will lose some of the new search features, but we really need search working fast for now.
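One way to get IDs only is to ask Elasticsearch to leave the stored document out of each hit. The sketch below is not the change that was actually made: the helper names are hypothetical, `"_source" => false` assumes an ES version that supports source filtering, and the sample response is hand-written for illustration.

```ruby
require "json"

# Request body asking ES to omit the stored document from each hit.
# "_source" => false is an assumption about the ES version in use.
def ids_only_query(text)
  { "query" => { "query_string" => { "query" => text } }, "_source" => false }
end

# Pull just the document IDs out of a search response body.
def extract_ids(response_json)
  JSON.parse(response_json).fetch("hits").fetch("hits").map { |hit| hit["_id"] }
end

# Hand-written sample response (not real output from the app).
sample = <<~JSON
  {"hits": {"total": 2, "hits": [
    {"_index": "documents", "_id": "13", "_score": 1.2},
    {"_index": "documents", "_id": "17", "_score": 0.9}
  ]}}
JSON

puts JSON.generate(ids_only_query("soil moisture"))
p extract_ids(sample) # ["13", "17"]
```

Because each hit then carries only a few small fields, the response no longer grows with document size, which addresses the JSON parsing cost measured above.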

shanesuofi commented 9 years ago

Current state of search:

  1. Search now returns only document IDs instead of full documents (major performance improvement).
  2. Document merging has been disabled.
  3. Search recommendations have been disabled.
  4. Metadata info has been added back in as a dialog that is only loaded when a user clicks the "Metadata" button (major performance improvement). Retrieving metadata can be slow for large documents (usually no more than a few seconds), but it is much faster than prefetching metadata.

cblair commented 9 years ago

Thanks Shane, I'll verify the state of things and close/finish if necessary.