LMFDB / lmfdb

L-Functions and Modular Forms Database

Ability to change sort order of search results #3109

Closed jwj61 closed 2 years ago

jwj61 commented 5 years ago

For CMF searches one can choose the sort order, see

http://www.lmfdb.org/ModularForm/GL2/Q/holomorphic/?search_type=List

It would be nice if the same option were available on other search results pages. We received a feedback page request to do this specifically for higher genus curves with automorphisms, but there are plenty of other candidates. Specific examples that have come up before

Note that unless the search results all fit on the first page, this is not a matter of simply sorting the results that appear on the screen (which could be done using JavaScript); it requires reissuing the query with a different sort order, which is both more useful and much easier to implement.
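To illustrate the difference between sorting the visible page and reissuing the query, here is a minimal sketch using Python's built-in sqlite3 module. The table, its columns, and the helper `fetch_page` are hypothetical and are not taken from the LMFDB code base.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE curves (label TEXT, genus INTEGER, dim INTEGER)")
conn.executemany(
    "INSERT INTO curves VALUES (?, ?, ?)",
    [("a", 2, 0), ("b", 2, 1), ("c", 3, 0), ("d", 3, 2), ("e", 4, 1), ("f", 4, 3)],
)

PAGE_SIZE = 3
ALLOWED = {"label", "genus", "dim"}  # column names are interpolated, so whitelist them


def fetch_page(sort, offset=0):
    """Return one page of (label, dim) rows; sort is a list of (column, descending) pairs."""
    if any(col not in ALLOWED for col, _ in sort):
        raise ValueError("unknown sort column")
    order = ", ".join(f"{col} {'DESC' if desc else 'ASC'}" for col, desc in sort)
    sql = f"SELECT label, dim FROM curves ORDER BY {order} LIMIT ? OFFSET ?"
    return conn.execute(sql, (PAGE_SIZE, offset)).fetchall()


# First page in the default order (genus, then dim).
default_page = fetch_page([("genus", False), ("dim", False)])
# Sorting only what is on screen just reorders those three rows.
client_side = sorted(default_page, key=lambda row: -row[1])
# Reissuing the query returns the top rows of the whole table instead.
requeried = fetch_page([("dim", True), ("label", False)])
```

With six rows and a page size of three, `client_side` never sees the rows with the largest `dim`, whereas `requeried` starts with them.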

One other point worth noting: while it is standard UI practice to let users sort by a column by clicking on its heading, we don't want to do that because it would make the knowls attached to the column headings inaccessible (and these are particularly important because column headings are often abbreviations or even just symbols whose meaning is not immediately obvious). This is why we used a drop-down option list in the CMF implementation.

From the feedback page:

This is a feature request, not a problem. In the searchable database of curve automorphism groups, results are returned sorted by ascending dimension. Or maybe more accurately (I think, and depending on what is being searched for), results are returned sorted first by genus, then by dimension, then by group id, and then by signature.

It would be a great and very useful feature if search results could be toggled to be sorted by any one of the columns.

Thanks! Justin Lanier

Reply to jlanier8@gatech.edu

AndrewVSutherland commented 5 years ago

This is a duplicate of #2765, but given that there is more content here (and a request from the feedback page) I will close #2765.

jenpaulhus commented 4 years ago

This is implemented in Higher genus so could we get rid of that label on this issue, please?

AndrewVSutherland commented 4 years ago

Done!

AndrewVSutherland commented 2 years ago

Like the output column selection, this should be done generically across the entire LMFDB.

roed314 commented 2 years ago

I've made progress on this, which is visible on purple. Suggestions on specific sort orders to include are welcome; I'm just making things up for areas of the LMFDB I'm less familiar with. I should be able to finish this today.

AndrewVSutherland commented 2 years ago

Can you give examples of some pages where the progress is visible?

roed314 commented 2 years ago

p-adic fields and above on the sidebar have a controllable sort order (well, number fields and p-adic fields will take 3 more minutes).

AndrewVSutherland commented 2 years ago

Ah, I was looking at genus 2 curves (which had a sort order option before). Did you make any changes to genus 2 curves? Also, what are the criteria for deciding when to provide asc/desc options?

The one comment I have so far is that I think the column being sorted on should be displayed by default.

AndrewVSutherland commented 2 years ago

Here are some suggestions for sort order options to add (or remove). One suggestion relevant to all cases is that there should always be an option that takes you back to the default sort order, whatever that is. If you go to HMFs, for example, none of the provided options is the default sort order, so once you choose one you have no way to get back to where you were without leaving the page. For sort options that involve labels, the sort should be on the tuple of components of the label (e.g. degree, real places, abs disc, tie breaker for number fields), not on the label as a string. But feel free to ignore any suggestions that are hard to implement or likely to negatively impact performance. I've included some columns that have ranges that are so limited that it might not seem useful to sort on them (e.g. rank), but I have a feeling people will ask for them if we don't include them, and I can imagine scenarios where the search criteria are narrow enough to make it useful.

Note that I only included options that you did not already include (so for anyone else reading this, if there is an obvious sort option that I didn't list it is likely because David already included it).
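The point about sorting labels by their components rather than as strings can be seen in a small sketch. It assumes number field labels whose dot-separated parts are all plain integers; `nf_label_key` is a hypothetical helper, and the snippet only illustrates the ordering, not how the database would perform the sort.

```python
def nf_label_key(label):
    """Split a label such as '2.0.11.1' into a tuple of integers (assumes integer parts)."""
    return tuple(int(part) for part in label.split("."))


labels = ["2.0.3.1", "2.0.11.1", "3.1.23.1", "2.2.5.1"]

# As strings, '2.0.11.1' sorts before '2.0.3.1' because '1' < '3'.
as_strings = sorted(labels)
# As tuples, discriminant 3 correctly precedes discriminant 11.
as_tuples = sorted(labels, key=nf_label_key)
```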

JohnCremona commented 2 years ago

I agree with having a default which one can go back to -- though the default is surely based on multiple columns, not just one? Would it be very hard to have a secondary sort column to break ties in the first? If easy, that would be nice.

EC/Q I enjoyed reverse-sorting by rank, showing what the max is (5) and all curves with that rank, and similarly with torsion order. To Drew's list I would add modular degree and #integral points. There's no harm in having sort options which look silly when applied to the entire list of curves (e.g. isogeny class size, with a max of 8), as it's more interesting if you have some list of curves with a 3-isogeny and then want to see the class sizes. So, if in doubt, add to the list.

EC/NF: call the sort order "conductor norm" not just conductor. Drew's list looks good.

Bianchi: looks OK. Seeing the "sign" column there made me realise that we don't have a sign (of functional equation) column for either ECQ or ECNF, and we probably should. Alternatively (and perhaps better) make the rank search box a drop-down menu showing all available ranks and also having odd and even as options. [I should make this a new issue.]

AndrewVSutherland commented 2 years ago

@JohnCremona My guess is that it is easy to add hard-wired sorted options that are as complicated as you like (this will often include the default sort order). Letting the users pick secondary keys seems hard without complicating the UI significantly.

roed314 commented 2 years ago

It's currently possible to go back to the default: it's the first option in all cases. But I only list the first column that is being sorted on, rather than all of them. In particular, the default is often the same as the label; is it better to display this as sorting by "label" or by "level" (with weight, character and hecke orbit implicit)?

I'll work on adding @AndrewVSutherland's suggestions.

roed314 commented 2 years ago

For CMFs, sorting by CM/RM is tough because those columns are lists of integers....

roed314 commented 2 years ago

For HMFs, I've relabeled "degree" as "label", and since the label of the base field is an initial segment of the label of the form, there's no point in adding a separate sort by base field.

roed314 commented 2 years ago

For ECQ, sorting by Cremona label will be a bit wonky when there are more than 26 curves in the isogeny class, since the only database field is a string like 25200cv. Also, what column is the cyclic isogeny degree in? I don't see it in ec_curvedata.

roed314 commented 2 years ago

For ECNF, we encode CM vs potential CM using a sign on the cm column. Is it okay to just sort by that column, and if so, should it be in increasing or decreasing order?

roed314 commented 2 years ago

For G2C, what's the difference in sorting by label and by conductor, since the conductor is the first part of the label? Sato-Tate groups may be a bit weirdly ordered since it will be as a string, but I think that's fine.

AndrewVSutherland commented 2 years ago

Yes, sorting by genus 2 curve label will sort by conductor, so strictly speaking there is no need to sort by conductor, but maybe it makes sense to offer the option anyway? One could say the same for number fields and degree.

roed314 commented 2 years ago

I guess I don't know what the difference is. In every case we're sorting by more columns than are shown to the user. If I were to implement sorting by conductor, it would look precisely the same as sorting by label: in each case we're actually ordering by ["conductor", "class", "abs_disc", "disc_sign", "label"]. I don't think we ever actually want to sort by label as a string.
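As a rough sketch of why the two options coincide: if each named sort option expands to its leading column followed by the remaining columns of the default order, then "conductor" expands to exactly the default list quoted above. The helper `with_tiebreakers`, the dictionary `SORT_OPTIONS`, and the "abs_disc" option are hypothetical names for illustration, not the LMFDB implementation.

```python
# Default column list for genus 2 curves, as quoted in the comment above.
G2C_DEFAULT_SORT = ["conductor", "class", "abs_disc", "disc_sign", "label"]


def with_tiebreakers(primary, default=G2C_DEFAULT_SORT):
    """Put the chosen columns first, then the default columns not already used."""
    primary = list(primary)
    return primary + [col for col in default if col not in primary]


# Hypothetical hard-wired options: each name maps to a full list of columns.
SORT_OPTIONS = {
    "conductor": with_tiebreakers(["conductor"]),
    "abs_disc": with_tiebreakers(["abs_disc"]),
}
```

Here `SORT_OPTIONS["conductor"]` is identical to the default order, so a separate "label" entry would add nothing.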

roed314 commented 2 years ago

If you think that conductor feels more natural to a user than label does, I'm happy with using conductor instead of label (in fact, that's what I did initially), but I don't think it makes sense to have both.

AndrewVSutherland commented 2 years ago

Hmm, I'd be curious what others think. The same issue arises for number fields, where there is a sort option for degree, which I guess is actually sorting by label if I understand you correctly? I'm not sure which is more intuitive, but I think we should be consistent and on each search results page either (1) always have a sort by label option (but no option to sort by the first part of the label), (2) always have sort by whatever is the first part of the label (which actually sorts by label) but no option to sort by label, or (3) provide both options (and have them do the same thing). It looks like you have gone with (2) in most places, so maybe it makes sense to do that everywhere for the moment and we can wait to get feedback from more people before finalizing things (or do a separate PR to change it if we decide to).

EDIT: of course I meant to write "sort" not "search" above (now changed).

roed314 commented 2 years ago

Sounds good. I'll go with (2) for now; it's easy to change later.

roed314 commented 2 years ago

For HGCWA, I just updated the "group order" sorts to also include the full group id, though it will be as a string (since the group column is text in that table).

roed314 commented 2 years ago

Related to the discussion above, the initial segment on the default sort of belyi is degree and group, and since the first part of the transitive id on the group is the degree, both are repeats of the label. I'm going to leave it just as "degree" for now, and we can revisit the issue.

roed314 commented 2 years ago

For ARTIN, in the case of projective image I've sorted first by Proj_nTj and then Proj_Polynomial (and then the standard list of columns).

roed314 commented 2 years ago

For Sato-Tate, we don't use search_wrap since we want to mix in the mu(n). This is kind of painful when changing the sort order. For now, I've always just left these first, but take a look and let me know how this should be modified. I think it's only broken for "component group" ordering....

roed314 commented 2 years ago

For abstract groups, the sort orders based on labels are going to be weird since they'll be sorted as strings (for example, 40.4 comes first when sorting by commutator since it has commutator C_10...). Eventually we can add appropriate columns with orders.

JohnCremona commented 2 years ago

> For ECNF, we encode CM vs potential CM using a sign on the cm column. Is it okay to just sort by that column, and if so, should it be in increasing or decreasing order?

I would sort by the absolute value of the column (the sign just encodes whether the CM is potential or actual).

JohnCremona commented 2 years ago

> For ECQ, sorting by Cremona label will be a bit wonky when there are more than 26 curves in the isogeny class, since the only database field is a string like 25200cv. Also, what column is the cyclic isogeny degree in? I don't see it in ec_curvedata.

In elliptic_curves/elliptic_curve.py there is a 2-line utility function giving a sort key for Clabels. No need to reinvent!
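For readers without the source at hand, a sort key of this kind might look like the sketch below. This is not the utility function referred to above: `class_code_to_int` and `cremona_sort_key` are hypothetical names, and the sketch assumes that the class letters are base-26 digits with a = 0.

```python
import re
import string

# Conductor digits, class letters, optional curve number (e.g. '25200cv' or '25200cv1').
LABEL_RE = re.compile(r"^(\d+)([a-z]+)(\d*)$")


def class_code_to_int(code):
    """Read a letter code as a base-26 number with a = 0 (assumed convention)."""
    n = 0
    for ch in code:
        n = 26 * n + string.ascii_lowercase.index(ch)
    return n


def cremona_sort_key(label):
    """Return (conductor, class number, curve number); a missing curve number counts as 0."""
    m = LABEL_RE.match(label)
    if m is None:
        raise ValueError(f"not a Cremona-style label: {label!r}")
    conductor, code, number = m.groups()
    return (int(conductor), class_code_to_int(code), int(number or 0))


labels = ["25200cv1", "25200ba1", "25200z1"]
as_strings = sorted(labels)
as_keys = sorted(labels, key=cremona_sort_key)
```

Under this convention class z comes before class ba, which a plain string sort gets wrong.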

JohnCremona commented 2 years ago

In most cases where labels are strings consisting of a sequence of integers separated by some punctuation we can, in each case, have a sort key which splits up the subfields and returns a list of integers. When there are letter codes another layer is needed as with Clabels, but that surely covers all types of label?

AndrewVSutherland commented 2 years ago

For Sato-Tate groups, we could just add all the mu/nu groups for n <= 10^6 and not show the Sato-Tate groups on dirichlet character pages of modulus above that if you think that would simplify matters. It would be less than 1GB of data.

roed314 commented 2 years ago

> > For ECNF, we encode CM vs potential CM using a sign on the cm column. Is it okay to just sort by that column, and if so, should it be in increasing or decreasing order?

> I would sort by the absolute value of the column (the sign just encodes whether the CM is potential or actual).

Currently, we can only sort by columns already in the data, so we can't apply any functions like absolute value (or utility functions on strings).

Postgres is capable of sorting using a value derived from the columns, but there are several issues with doing so for us (performance and the current capabilities of the backend interface).
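For illustration, sorting on a value derived from a column looks like this in SQL. The sketch uses Python's built-in sqlite3 module as a stand-in for Postgres, with a hypothetical table `ecnf_demo`; it says nothing about the performance or backend interface concerns mentioned above.

```python
import sqlite3

# Hypothetical table: cm holds a signed value, as described in the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ecnf_demo (label TEXT, cm INTEGER)")
conn.executemany(
    "INSERT INTO ecnf_demo VALUES (?, ?)",
    [("A", 0), ("B", -3), ("C", 3), ("D", -4), ("E", 4), ("F", -7)],
)

# Sorting on the stored column groups values by sign.
by_column = [r[0] for r in conn.execute("SELECT cm FROM ecnf_demo ORDER BY cm")]
# Sorting on abs(cm) groups values of equal absolute value; cm breaks ties.
by_abs = [r[0] for r in conn.execute("SELECT cm FROM ecnf_demo ORDER BY abs(cm), cm")]
```

Postgres accepts the same `ORDER BY abs(cm)` expression; whether it can be served efficiently there depends on indexing, which this sketch does not address.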

roed314 commented 2 years ago

> For Sato-Tate groups, we could just add all the mu/nu groups for n <= 10^6 and not show the Sato-Tate groups on dirichlet character pages of modulus above that if you think that would simplify matters. It would be less than 1GB of data.

That would certainly simplify the code, and allow us to use search_wrap for Sato-Tate. Another intermediate option would be to just add mu(N) for N that are small enough that they can be mixed in with some sort order (the component group order comes to mind, but there are possibly others). When irrational ST-groups are omitted, there will be no special case handling required; when they are included, either the mus all come first (in which case we display the ones from the database and then endlessly include the others), or the extra mus all come at the end.

JohnCremona commented 2 years ago

> > > For ECNF, we encode CM vs potential CM using a sign on the cm column. Is it okay to just sort by that column, and if so, should it be in increasing or decreasing order?

> > I would sort by the absolute value of the column (the sign just encodes whether the CM is potential or actual).

> Currently, we can only sort by columns already in the data, so we can't apply any functions like absolute value (or utility functions on strings).

> Postgres is capable of sorting using a value derived from the columns, but there are several issues with doing so for us (performance and the current capabilities of the backend interface).

OK I see. So if we really wanted to make sorting on labels possible we would need to add new columns for this purpose. (ec_curvedata did just this, in fact: the column iso_nlabel is the numerical equivalent of lmfdb_iso which is an alphabetic string: so you could sort these by lmfdb_label by using the key (conductor, iso_nlabel, lmfdb_number). But as far as I can see this use has been discarded, and the column iso_nlabel is no longer referred to anywhere in the code apart from one commented out line in backend/searchtable.py?)

roed314 commented 2 years ago

Yep, the easiest way to make correct sorting by label possible is to add numerical columns for each part of the label. As for iso_nlabel, it's definitely in use: it's part of the default sort order on ec_curvedata

sage: db.ec_curvedata._sort_orig
['conductor', 'iso_nlabel', 'lmfdb_number']

and it's part of every sort in #4991.

roed314 commented 2 years ago

#4991 has been merged, with some followup issues created.