Open fingolfin opened 6 months ago
To stay with the `QuimpGroup(4080,1)` example: in each entry, three groups are stored:

- `QUIMP_4080[1][1]` is the group itself;
- `QUIMP_4080[1][3]` is the socle;
- `QUIMP_4080[1][4]` is... perhaps the group `T` if the socle is `T^k`? But I didn't see any references to this in the code.

@DominikBernhardt is the format of the data files documented somewhere?
Anyway, in this specific example all three groups are the same. I think the socle should always be expressed in terms of the generators of the full group, perhaps via words in the generators. Doing so, I think this > 500 kB entry could be shrunk by a factor of 500.
It won't be as dramatic everywhere, but I am hopeful we can reduce by at least an order of magnitude.
For the socle, we can in fact just store (information about) a normal generating set, to be fed into `NormalClosure`. If the socle is $T^k$ then often it will suffice to store generators for $T$.
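A minimal GAP sketch of the normal-closure idea, using $S_5$ as a toy stand-in for a QUIMP group (its socle is $A_5$). The variable `normalGens` is hypothetical and only illustrates what a data file might store; it is not the package's actual format.

```gap
# Toy stand-in for a QUIMP group: S_5, whose socle is A_5.
G := SymmetricGroup(5);

# Hypothetical stored data: a single normal generator of the socle,
# instead of a full generating set.
normalGens := [ (1,2,3) ];

# Rebuild the socle as the normal closure of the stored elements in G.
soc := NormalClosure(G, Subgroup(G, normalGens));

# Sanity check against GAP's own socle computation.
Assert(0, soc = Socle(G));
```

Here one 3-cycle is enough, since its conjugates under $S_5$ generate $A_5$.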
Also, for the `name` field, at least for the `AS` cases, it seems the content is just what `IsomorphismTypeInfoFiniteSimpleGroup` gives us. In that case I don't see a point in storing that, I'd just compute it on the fly.
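For illustration, computing the name on the fly could look roughly like this; whether the resulting string matches the stored `name` field exactly is an assumption that would need checking against the data files.

```gap
# A_17 is the simple group behind the QuimpGroup(4080,1) example.
T := AlternatingGroup(17);

# Returns a record describing the isomorphism type of a finite simple group.
info := IsomorphismTypeInfoFiniteSimpleGroup(T);

# The `name` component is a human-readable string for the type.
Print(info.name, "\n");
```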
Currently there are 5 compressed data files (`QUIMP[1-5].tar.bz2`) in the repository, which take up 9-69 MB each for a total of 190 MB. The user has to extract them, for a total of 770 MB.

This should be reduced. Here are several ideas for this, which can be combined.
First off, GAP can transparently access `.gz` files. This suggests storing not e.g. `lib/QUIMP_336.g` but rather `lib/QUIMP_336.g.gz` in the archives, so that disk space usage is reduced for the end user. The result is "only" 270 MB.

This would in fact allow shipping the files "directly" to the user, without a need for `.tar.bz2` files. These could then also be removed from the repository, which would be better anyway; we could instead keep the `lib/QUIMP_*.g` files in the repository directly (and compress them on the fly for releases, which we already do for multiple other packages).

Next, the content of the
`lib/QUIMP_*.g` files could be optimized further.

@aniemeyer suggests that for many groups a good way to compress them is to store them via generators in a different, minimal degree representation, and then store generators of a subgroup such that the action on the cosets of that subgroup gives the actual QUIMP permutations. Indeed, take for example `QuimpGroup(4080,1)`. In the file `lib/QUIMP_4080.g` it takes up more than 0.5 MB of space. But it is $A_{17}$ in disguise. So one could replace the generators by the information "this is A17" plus generators for the point stabilizer: