cravattlab / cravattdb

Automated RAW conversion and upload to IP2
0 stars 0 forks source link

Searches still take a long time to return lots of results #33

Closed radusuciu closed 8 years ago

radusuciu commented 8 years ago

Need to examine SQLAlchemy generated queries and try to optimize or write SQL by hand.

The issue is presumably not with the initial lookup which just dumps a list of experiment ids that match the data. The lookup of each of those individual experiments is my prime suspect.

It is also possible that we need to paginate results... With 150 experiments uploaded, searching for kinase yields on the order of 10000 results.

Finally, we could forego pagination but stream or batch-return results to the client.

radusuciu commented 8 years ago

Sooo, I've solved the DB side issues with better indexing. Commit to come once I figure out how to express these index types in sqlalchemy. A combination of including lower, text_pattern_ops, pg_trgm, and GIN for full text-search worked wonders to reduce query execution speed from seconds to fractions of a second.

For large sets of results we now have to deal with serialization being the bottleneck..

The simple solution is pagination. Serialization using something other than marshmallow is also an option.

radusuciu commented 8 years ago

Closing since we're running well enough for most queries.. Will deal with serialization separately.