LMFDB / lmfdb

L-Functions and Modular Forms Database

Add columns to support sort ordering on label columns #5013

Open roed314 opened 2 years ago

roed314 commented 2 years ago

Labels are stored as text, but we'd like them to be sorted numerically, component by component (that is, lexicographically on the numeric components rather than on the characters). There's probably some fancy thing we could do with custom PostgreSQL types, but the simpler solution is to just add some more columns with the numerical components. The following sort orders were suggested by @AndrewVSutherland in #3109, and were initially added in #4991 but were disabled because they sorted incorrectly.
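To make the problem concrete, here is a minimal Python sketch of how text ordering and component-wise numeric ordering can disagree. The labels and the `label_sort_key` helper are made up for illustration; they are not actual LMFDB labels or LMFDB code.

```python
import re

def label_sort_key(label):
    """Hypothetical helper: turn a dotted label into a numerically sortable key."""
    key = []
    for part in label.split("."):
        # Split each part into runs of digits and non-digits, e.g. "a10" -> ["a", "10"].
        for chunk in re.findall(r"\d+|\D+", part):
            if chunk.isdecimal():
                key.append((0, int(chunk)))   # numeric chunk, compared as an integer
            else:
                key.append((1, chunk))        # text chunk, compared as a string
    return key

labels = ["11.a2", "2.b1", "2.a10", "2.a9"]   # made-up labels

text_order = sorted(labels)                       # what sorting a text column gives
numeric_order = sorted(labels, key=label_sort_key)

print(text_order)
print(numeric_order)
```

Sorting as text places "11.a2" before "2.a9" and "2.a10" before "2.a9", which is the kind of ordering that extra numerical columns would avoid.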

Here's a list of text columns that we would need to split up into numerical columns (or possibly single columns consisting of a list of integers) in order to have proper sorting.

There are also several cases where we have enough parts of the label that it sorts sensibly, but the text label is used as a tiebreaker. These mostly look okay.

AndrewVSutherland commented 2 years ago

I'm going to add a column st_label_components to g2c_curves of type integer[] that is a list of the 6 integers that make up the Sato-Tate group label. Question for @roed314: would it make sense to also add a label_components column to the gps_st table? Currently we sort with an index on the six columns that make up the components, but we could instead sort on a single column. The same question applies to all of the object tables: should any table that has a label column also have a label_components column of type integer[] (or numeric[] if there is any possibility of values greater than 2^31, as with number fields and Artin reps)? Conceivably this could save on indexes, but it might not, because we may also be using the same index for queries that involve only some of the columns in the label...
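As a rough illustration of the integer[] versus numeric[] choice, the Python sketch below checks whether every component fits in PostgreSQL's 4-byte integer type (maximum 2^31 - 1). The function name `array_column_type` and the sample rows are assumptions made for this example, not part of the LMFDB codebase. PostgreSQL compares arrays element by element, which is also how Python compares lists, so the last lines mimic the ordering a single array column would give.

```python
INT4_MAX = 2**31 - 1  # largest value of PostgreSQL's 4-byte integer type

def array_column_type(rows):
    """Hypothetical helper: pick 'integer[]' if all components fit in int4, else 'numeric[]'."""
    fits = all(-INT4_MAX - 1 <= c <= INT4_MAX for row in rows for c in row)
    return "integer[]" if fits else "numeric[]"

# Made-up component lists; real label components may look different.
small = [[1, 4, 0, 1, 1, 0], [1, 4, 5, 48, 48, 0]]
large = [[2, 2**31, 1]]

print(array_column_type(small))
print(array_column_type(large))

# Lists compare element by element, so 9 sorts before 10 as intended.
print(sorted([[1, 10], [1, 9], [1, 2, 0]]))
```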

roed314 commented 2 years ago

I'm not 100% sure what is more efficient. I suspect that it's better to keep them as separate columns when they might occur in search queries, since otherwise postgres doesn't know that they're connected (unless you use more advanced statistics for the query planner than we have enabled). Of course, for gps_st there are no efficiency concerns since it's so small.

In general, I think a lot of the component columns already exist since the labels have mathematically meaningful parts. The issues usually arise with the "tiebreaker" parts of the labels. I think my inclination would be to make the changes as minimally intrusive as possible, and just add numerical versions of these tiebreaker parts.

AndrewVSutherland commented 2 years ago

@roed314 But for tables that are using the labels as references (e.g. g2c_curves referring to a Sato-Tate group), surely we don't want a column for each component of the ST group label; I think an array is better.

For the label column itself, I'm not immediately convinced that it is less invasive to add numeric tie breaker columns. For example, in the gps_st table I would need to add two columns (one for the identity component letter and another for the tie breaker at the end), or I could just add one column whose values I need to compute anyway. None of these columns is going to be used by the code for anything other than sorting, and it's less code to just specify a single sort key than a compound one. Am I missing something here?
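The trade-off between two scalar tie breaker columns and one array column can be sketched in Python as follows. Everything here is hypothetical: the `letters_to_int` encoding (a=1, ..., z=26, aa=27), the field names, and the sample rows are chosen only to show that a compound sort key and a single precomputed list give the same order.

```python
def letters_to_int(s):
    """Hypothetical encoding: a=1, ..., z=26, aa=27, ab=28, ... (case-insensitive)."""
    n = 0
    for ch in s.lower():
        n = 26 * n + (ord(ch) - ord("a") + 1)
    return n

# Made-up rows with a letter part and a numeric tie breaker.
rows = [
    {"label": "A.10", "letter": "A", "tiebreak": 10},
    {"label": "B.1", "letter": "B", "tiebreak": 1},
    {"label": "A.2", "letter": "A", "tiebreak": 2},
]

# Option 1: two scalar values, sorted with a compound key.
compound = sorted(rows, key=lambda r: (letters_to_int(r["letter"]), r["tiebreak"]))

# Option 2: one precomputed list per row, sorted with a single key.
for r in rows:
    r["components"] = [letters_to_int(r["letter"]), r["tiebreak"]]
single = sorted(rows, key=lambda r: r["components"])

print([r["label"] for r in compound])
print([r["label"] for r in single])
```

Both options order the rows identically; the difference is only in how many columns and sort keys have to be maintained.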

AndrewVSutherland commented 2 years ago

I've added the column st_label_components to g2c_curves.

roed314 commented 2 years ago

I agree: for columns referring to an external label, just having an array is fine.

There's some data duplication (which is negligible in most cases). I think if there is only one part that needs to be numeric, I'd go with a scalar column, but in the gps_st table where there are two I'm fine with an array column.

roed314 commented 2 years ago

I'm using the st_label_components for the sort in genus 2 curves now. But I discovered that the Sato-Tate knowls are broken: click on one here.

AndrewVSutherland commented 2 years ago

This must have broken when I was addressing a merge conflict a few days ago; it is fixed in #5016 (the change is just two characters).

AndrewVSutherland commented 2 years ago

I've added a label_components column to gps_st0

jenpaulhus commented 1 year ago

Is this something we are still hoping to do with finite groups?