Open roed314 opened 2 years ago
I'm going to add a column st_label_components to g2c_curves of type integer[] that is a list of the 6 integers that make up the Sato-Tate group label.
Question for @roed314: would it make sense to also add a label_components column to the gps_st table? Currently we sort with an index on the six columns that make up the components, but we could instead sort on a single column. The same question applies to all of the object tables: should any table that has a label column also have a label_components column of type integer[] (or numeric[] if there is any possibility of values greater than 2^31, as with number fields and Artin reps)? Conceivably this could save on indexes, but it might not, because we may also be using the same index for queries that involve only some of the columns in the label...
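For concreteness, here is a rough Python sketch of how such a components list might be computed from a text label. The label "1.4.F.48.48a", the letter encoding (bijective base 26) and the helper names are all assumptions made for illustration, not necessarily what the LMFDB code does.

```python
import re

INT4_MAX = 2**31 - 1  # largest value a 32-bit signed integer column can hold


def letters_to_int(s):
    """Bijective base 26, case-insensitive: a -> 1, ..., z -> 26, aa -> 27."""
    n = 0
    for ch in s.lower():
        n = 26 * n + (ord(ch) - ord("a") + 1)
    return n


def label_components(label):
    """Turn each run of digits or letters in the label into one integer."""
    parts = []
    for token in re.findall(r"[0-9]+|[A-Za-z]+", label):
        parts.append(int(token) if token.isdigit() else letters_to_int(token))
    return parts


def array_type(components):
    """Suggest integer[] if every entry fits in 32 bits, otherwise numeric[]."""
    fits = all(abs(c) <= INT4_MAX for c in components)
    return "integer[]" if fits else "numeric[]"


example = label_components("1.4.F.48.48a")  # hypothetical label
print(example, array_type(example))
```

Under these assumptions a trailing piece such as "48a" contributes two entries (the number and the letter), which is how a five-part dotted label ends up with six components.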
I'm not 100% sure which is more efficient. I suspect that it's better to keep them as separate columns when they might occur in search queries, since otherwise PostgreSQL doesn't know that they're connected (unless you use more advanced statistics for the query planner than we have enabled). Of course, for gps_st there are no efficiency concerns since it's so small.
In general, I think a lot of the component columns already exist since the labels have mathematically meaningful parts. The issues usually arise with the "tiebreaker" parts of the labels. I think my inclination would be to make the changes as minimally intrusive as possible, and just add numerical versions of these tiebreaker parts.
@roed314 But for tables that are using the labels as references (e.g. g2c_curves referring to a Sato-Tate group), surely we don't want a column for each component of the ST group label; I think an array is better.
For the label column itself, I'm not immediately convinced that it is less invasive to add numeric tiebreaker columns. For example, in the gps_st table I would need to add two columns (one for the identity component letter and another for the tiebreaker at the end), or I could just add one column whose values I need to compute anyway. None of these columns is going to be used by the code for anything other than sorting, and it's less code to just specify a single sort key than a compound one. Am I missing something here?
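To illustrate the single-key versus compound-key point, here is a small Python sketch. The column names and labels are hypothetical. Python compares lists element by element, which as far as I know is also how PostgreSQL orders arrays, so the two sorts should agree.

```python
from operator import itemgetter

# Hypothetical names for the six separate component columns.
COMPONENT_COLUMNS = ["weight", "degree", "component0", "order", "index", "tiebreak"]


def make_row(label, components):
    """Build a row holding both the six scalar columns and one array column."""
    row = dict(zip(COMPONENT_COLUMNS, components))
    row["label"] = label
    row["label_components"] = list(components)
    return row


rows = [
    make_row("1.4.F.48.48a", [1, 4, 6, 48, 48, 1]),
    make_row("1.4.B.2.1a", [1, 4, 2, 2, 1, 1]),
    make_row("1.4.F.6.2a", [1, 4, 6, 6, 2, 1]),
    make_row("1.2.A.1.1a", [1, 2, 1, 1, 1, 1]),
]

# Compound key: one entry per component column.
compound = sorted(rows, key=itemgetter(*COMPONENT_COLUMNS))
# Single key: the array column alone.
single = sorted(rows, key=itemgetter("label_components"))

print([r["label"] for r in single])
```

Both sorts give the same order here; the single-key version only has to name one column.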
I've added the column st_label_components to g2c_curves.
I agree: for columns referring to an external label, just having an array is fine.
There's some data duplication (which is negligible in most cases). I think if there is only one part that needs to be numeric, I'd go with a scalar column, but in the gps_st table where there are two I'm fine with an array column.
I'm using the st_label_components for the sort in genus 2 curves now. But I discovered that the Sato-Tate knowls are broken: click on one here.
This must have broken when I was addressing a merge conflict a few days ago; fixed in #5016 (the change is just two characters).
I've added a label_components column to gps_st0.
Is this something we are still hoping to do with finite groups?
Labels are stored as text, but we'd like them to be sorted numerically (lexicographically). There's probably some fancy thing we could do with custom PostgreSQL types, but the simpler solution is to just add some more columns with the numerical components. The following sort orders were suggested by @AndrewVSutherland in #3109, and were initially added in #4991 but were disabled because they sorted incorrectly.
Here's a list of text columns that we would need to split up into numerical columns (or possibly single columns consisting of a list of integers) in order to have proper sorting.
artin_reps: Container
belyi_galmaps_fixed: base_field_label
ec_curvedata: Ciso
g2c_curves: st_label
gps_groups: center_label, commutator_label, central_quotient, abelian_quotient
gps_st: st0_label
There are also several cases where we have enough parts of the label that it sorts sensibly, but the text label is used as a tiebreaker. These mostly look okay.
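As a sanity check on the tiebreaker cases, here is a minimal Python sketch of where a text tiebreaker can still go wrong. The rows and the N.i label shape are made up for illustration. The order is fine until the final counter reaches 10.

```python
# Hypothetical rows: a numeric "order" column plus a text label of the form N.i.
rows = [
    {"order": 8, "label": "8.10"},
    {"order": 16, "label": "16.2"},
    {"order": 8, "label": "8.2"},
    {"order": 8, "label": "8.1"},
]


def counter(label):
    """Numeric value of the last dot-separated part of the label."""
    return int(label.split(".")[-1])


# Numeric column first, text label as the tiebreaker.
by_text = sorted(rows, key=lambda r: (r["order"], r["label"]))
# Numeric column first, numeric counter as the tiebreaker.
by_number = sorted(rows, key=lambda r: (r["order"], counter(r["label"])))

print([r["label"] for r in by_text])
print([r["label"] for r in by_number])
```

With the text tiebreaker, "8.10" lands between "8.1" and "8.2"; with the numeric counter it comes after "8.2".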