LMFDB / lmfdb

L-Functions and Modular Forms Database

Multikey number field searches very slow #2336

Closed: AndrewVSutherland closed this issue 6 years ago

AndrewVSutherland commented 6 years ago

Some multi-key searches on the number field database on www.lmfdb.org seem to be timing out, even when the keys are all indexed (single-key searches on the same keys are quite fast).

For example, searching individually on any of degree = 2, class number = 4, class group = [4] is very quick, but searching on any 2 in combination or all 3 at once times out.

Things are not as bad on beta.lmfdb.org (but still noticeably slow). Presumably this is due to the difference between the MMAPv1 and WiredTiger storage engines.

I think we either need to add compound indices on key combinations we think are likely to come up frequently (certainly degree+anything is a common use case), or we need to consider switching the production server back to MMAPv1 (either option will involve some downtime).

jwj61 commented 6 years ago

It would be easy to add more indexes, but isn't there a limit on the total number a collection can have?

AndrewVSutherland commented 6 years ago

Yes, I believe the limit is 64 indexes per collection; we currently have 17. We could certainly afford to add every pair of the first 5 search fields (degree, signature, Galois group, class number, class group), or even every triple. But this is overkill, since there is no reason to specify both signature and degree, or both class group and class number, assuming the query processing is smart about this. Currently, setting degree=4 and signature=[2,1] seems to take about 10 times longer than just using signature=[2,1]. It might be worth having the query code drop degree whenever signature is set, and drop class number whenever class group is set.
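The index budget above can be checked with a few lines of arithmetic. This is a sketch, not anything in the LMFDB code: the field names follow the database keys in the index listing later in the thread, and the only pruning rule encoded is the one stated here (signature determines degree, class group determines class number). Each combination is counted as one compound index; choosing a key order within it is a separate question.

```python
from itertools import combinations

# The first five search fields, using the database key names.
FIELDS = ["degree", "signature", "galois", "class_number", "class_group"]

# Key pairs that add no information: signature determines degree,
# and class group determines class number.
REDUNDANT = [{"degree", "signature"}, {"class_number", "class_group"}]

INDEX_LIMIT = 64       # per-collection index limit quoted in the thread
EXISTING_INDEXES = 17  # current count quoted in the thread


def candidate_indexes(size, skip_redundant=True):
    """Return every size-k combination of FIELDS, optionally skipping any
    combination that contains one of the redundant pairs."""
    combos = []
    for combo in combinations(FIELDS, size):
        keys = set(combo)
        if skip_redundant and any(pair <= keys for pair in REDUNDANT):
            continue
        combos.append(combo)
    return combos


all_pairs = candidate_indexes(2, skip_redundant=False)
all_triples = candidate_indexes(3, skip_redundant=False)
useful_pairs = candidate_indexes(2)
useful_triples = candidate_indexes(3)

total_all = EXISTING_INDEXES + len(all_pairs) + len(all_triples)
print(len(all_pairs), len(all_triples), total_all, total_all <= INDEX_LIMIT)  # 10 10 37 True
print(len(useful_pairs), len(useful_triples))  # 8 4
```

Even adding every pair and every triple (20 indexes) stays well under the limit, and dropping the redundant combinations leaves only 8 pairs and 4 triples.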

AndrewVSutherland commented 6 years ago

I'm adding an index on degree and class number now as a test. The number fields database may be unavailable for a few minutes while it builds.

jwj61 commented 6 years ago

I am building lots of compound indexes on beta (on the collection newfields; I will rename it when it is done). It has been going for a couple of hours so far.

AndrewVSutherland commented 6 years ago

The index is up now on www.lmfdb.org, and searches that used to time out now take less than 400ms, for example: http://www.lmfdb.org/NumberField/?degree=2&class_number=4

Just let me know when everything is good to go on beta.

jwj61 commented 6 years ago

New indexes are now available on beta. It is not the prettiest, but here is the current index information (so you can let me know if you think something should be added):

[[(u'signature', 1), (u'galois', 1), (u'degree', 1)],
 [(u'signature', 1), (u'degree', 1)],
 [(u'ramps_all', 1), (u'degree', 1)],
 [(u'class_number', 1), (u'ramps_all', 1), (u'degree', 1)],
 [(u'class_number', 1), (u'class_group', 1), (u'degree', 1)],
 [(u'galois', 1), (u'ramps_all', 1), (u'degree', 1)],
 [(u'disc_abs_key', 1), (u'disc_sign', 1), (u'signature', 1), (u'degree', 1)],
 [(u'signature', 1), (u'ramps', 1), (u'degree', 1)],
 [(u'signature', 1), (u'class_group', 1), (u'degree', 1)],
 [(u'class_group', 1), (u'ramps', 1), (u'degree', 1)],
 [(u'galois', 1), (u'class_group', 1), (u'degree', 1)],
 [(u'_id', 1)],
 [(u'oldpolredabscoeffhash', 1)],
 [(u'galois', 1), (u'degree', 1)],
 [(u'galois', 1), (u'class_number', 1), (u'degree', 1)],
 [(u'degree', 1), (u'disc_abs_key', 1), (u'disc_sign', 1)],
 [(u'class_number', 1), (u'degree', 1)],
 [(u'signature', 1), (u'class_number', 1), (u'degree', 1)],
 [(u'coeffhash', 1)],
 [(u'class_number', 1), (u'ramps', 1), (u'degree', 1)],
 [(u'class_group', 1), (u'ramps_all', 1), (u'degree', 1)],
 [(u'class_group', 1), (u'degree', 1)],
 [(u'galois', 1), (u'ramps', 1), (u'degree', 1)],
 [(u'ramps', 1), (u'degree', 1)],
 [(u'signature', 1), (u'ramps_all', 1), (u'degree', 1)],
 [(u'label', 1)]]

Edit: Deleted index with both ramps and ramps_all
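One way to answer "should something be added" is to check which combinations of the main search fields are the leading keys of some index in this list. The sketch below is a simplified model of MongoDB's index-prefix rule for equality-only queries: it treats a query as fully served when its fields, in any order, are exactly the first keys of some index. The real query planner is more nuanced (it can also use a shorter prefix and filter the remaining keys), and the helper names are hypothetical.

```python
from itertools import combinations

# Key order of each index in the listing above, with the u'' prefixes and
# the direction (always 1, i.e. ascending) dropped.
INDEXES = [
    ("signature", "galois", "degree"),
    ("signature", "degree"),
    ("ramps_all", "degree"),
    ("class_number", "ramps_all", "degree"),
    ("class_number", "class_group", "degree"),
    ("galois", "ramps_all", "degree"),
    ("disc_abs_key", "disc_sign", "signature", "degree"),
    ("signature", "ramps", "degree"),
    ("signature", "class_group", "degree"),
    ("class_group", "ramps", "degree"),
    ("galois", "class_group", "degree"),
    ("_id",),
    ("oldpolredabscoeffhash",),
    ("galois", "degree"),
    ("galois", "class_number", "degree"),
    ("degree", "disc_abs_key", "disc_sign"),
    ("class_number", "degree"),
    ("signature", "class_number", "degree"),
    ("coeffhash",),
    ("class_number", "ramps", "degree"),
    ("class_group", "ramps_all", "degree"),
    ("class_group", "degree"),
    ("galois", "ramps", "degree"),
    ("ramps", "degree"),
    ("signature", "ramps_all", "degree"),
    ("label",),
]

# The five most common search fields from the discussion.
MAIN_FIELDS = ["degree", "signature", "galois", "class_number", "class_group"]


def prefix_index(query_fields, indexes=INDEXES):
    """Return the first index whose leading keys are exactly query_fields
    (in any order), or None if no index starts with that set of keys."""
    wanted = set(query_fields)
    for index in indexes:
        if len(index) >= len(wanted) and set(index[:len(wanted)]) == wanted:
            return index
    return None


def uncovered(size):
    """Combinations of MAIN_FIELDS of the given size with no matching prefix."""
    return [combo for combo in combinations(MAIN_FIELDS, size)
            if prefix_index(combo) is None]


print(prefix_index(["degree", "class_number"]))  # ('class_number', 'degree')
print(uncovered(2))  # []
print(uncovered(3))
```

Under this model every pair of the five fields has a matching index, and the only unmatched triples are the four that omit degree.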

jwj61 commented 6 years ago

Once the number field database is copied to the cloud, can we consider this fixed?

I tried looking at log files for beta and did not see timing information, but maybe I missed it.

Since I made a copy of the number field database before building the indexes and then renamed it back, I regenerated the fields.rand database as well.

AndrewVSutherland commented 6 years ago

I played around with this a bit on beta and things are definitely better (although still a lot slower than I think they should be). But we won't know for sure until we try it on the cloud, as query performance can vary pretty dramatically between MMAPv1 and WiredTiger.

I expect that copying this over to the cloud and building all these indexes will take several hours and will likely make the database unusable during that time. We should probably switch www.lmfdb.org to point to a mirror while we do this (I'll coordinate this with Edgar).

jwj61 commented 6 years ago

Thanks.

By the way, I have not done anything about altering user queries for the moment, since I think it is a little more complicated than it looks. Suppose a user (accidentally) enters incompatible class number and class group information. If we blindly drop the class number part of the query, then the results returned would be wrong (they should be empty).

To avoid that, we have to use the parsed version of the class number search entry and test the class group against it first. That is certainly not impossible, but it would probably add another layer of complexity to the parsing, since it would be better to get this information from the parsing code in an intermediate format rather than from the original string or the Mongo query.
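A minimal sketch of the check described above, assuming the parsing code can hand back an intermediate form. The dictionary layout and the function name are hypothetical, and only exact values are handled, not ranges. It drops degree when signature is given, and class number when class group is given, but only after confirming that the dropped value agrees; otherwise it returns None so the search can return no results. It relies on two standard facts: a field with signature [r1, r2] has degree r1 + 2*r2, and the class number is the order of the class group, i.e. the product of its invariants.

```python
from math import prod  # math.prod requires Python 3.8+


def simplify_query(parsed):
    """Drop search keys implied by other keys, or return None if the keys
    contradict each other (the search should then return no results).

    `parsed` is a hypothetical intermediate form holding exact values only:
      degree: int, signature: [r1, r2], class_number: int,
      class_group: list of invariants, e.g. [2, 2] (trivial group: []).
    """
    query = dict(parsed)  # shallow copy; the caller's dict is not modified

    if "signature" in query and "degree" in query:
        r1, r2 = query["signature"]
        # A field with signature [r1, r2] has degree r1 + 2*r2.
        if query.pop("degree") != r1 + 2 * r2:
            return None

    if "class_group" in query and "class_number" in query:
        # The class number is the product of the class group invariants
        # (prod([]) == 1 covers the trivial class group).
        if query.pop("class_number") != prod(query["class_group"]):
            return None

    return query


print(simplify_query({"degree": 4, "signature": [2, 1]}))
print(simplify_query({"class_number": 3, "class_group": [4]}))
```

The first call keeps only the signature, and the second returns None because a class group [4] has class number 4, not 3. Returning None instead of a trimmed query is what keeps the incompatible case from silently returning results.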

AndrewVSutherland commented 6 years ago

The web servers at www.lmfdb.org are now reading from the lmfdb0 replica set and the copy is in progress (this does mean that any database changes on beta that have not been copied to the cloud will temporarily be visible on www.lmfdb.org, but this should be harmless). I'll let you know when it is done.

I agree with not altering the user queries for the moment. Let's see what this does first.

AndrewVSutherland commented 6 years ago

The new indices are now up on www.lmfdb.org. Definitely an improvement. Some searches are still pretty slow (e.g. http://www.lmfdb.org/NumberField/?start=0&degree=8&class_group=%5B2%2C2%5D&count=20), but very few time out. I'm going to close this.