Compress data files further

The data files bundled in primgrp take up 39 MB. The new data files for degrees up to 8191 take up ~1GB while .gz compressed.

I think we can reduce this by quite a bit, perhaps even by an order of magnitude or two, by changing how we store the groups: what takes up a lot of space is to write down permutations of degree, say, 5000 -- they are long.

But most of these groups admit faithful permutation representations of much smaller degree. So one could instead store generators for such a small degree representation. To recover the perm rep we actually want, we just need to encode a point stabilizer, by describing generators for it as words in the small degree generator.

I'll illustrate this with an example:

gap> G:=PrimitiveGroup(3600,2);
<permutation group of size 1296000 with 2 generators>
gap> Length(GeneratorsOfGroup(G));
2
gap> Length(String(G));
34503

So storing this group by printing its generators (as we currently do) it takes ~34 kilobytes.

gap> iso:=SmallerDegreePermutationRepresentation(G);
<action epimorphism>
gap> NrMovedPoints(Image(iso));
30
gap> H:=Image(iso);;
gap> Length(String(H));
192

So the group has a degree 30 perm rep which can be described in just 192 bytes. But we also need a point stabilizer:

gap> S:=Image(iso,Stabilizer(G,1));;
gap> Length(String(S));
277

So that's another 277 bytes, for a total of 469, or 1.4% of the original size; a reducation by a factor 73. But we can do better by expressing S via words:

gap> hom:=EpimorphismFromFreeGroup(H:names:=["x","y"]);;
gap> PreImagesRepresentative(hom, S.1);
x^-4*y^-1*x*y^3*x^-1*y^-1*x^4*y^-2*(x*y)^2*y*x*y^-2*x*y*x^-1*y^-1
gap> PreImagesRepresentative(hom, S.2);
x^-2*(x^-1*y)^2*x*y^-1*x^-1*y^-4*x*y*x^3*y^-2*(x*y)^2*y*x*y^-2*x^-1*y
gap> PreImagesRepresentative(hom, S.3);
x^5*y*x^-1*y*x*y^-1*x^-1*y^-4*x*y*x^3*y^-2*(x*y)^2*y^3*x^-1*y^-1*x^-1
gap> Length(String(List(GeneratorsOfGroup(S), x -> PreImagesRepresentative(hom, x))));
211

But those words can be encoded much better... We probably have a function for that already int he meantime, but it's also not hard to code one, e.g. this:

EncodeWord:=function(w)
  local genexp, i, res, c, e;
  genexp := ExtRepOfObj(w);
  res:="";
  for i in [1, 3 .. Length(genexp)-1] do
    c := CharInt(IntChar('a') + genexp[i] - 1);
    e := genexp[i+1];
    if e < 0 then
      c := UppercaseChar(c);
      e := -e;
    fi;
    Append(res, ListWithIdenticalEntries(e, c));
  od;
  return res;
end;

And then:

gap> EncodeWord(PreImagesRepresentative(hom, S.1));
"AAAABabbbABaaaaBBababbaBBabAB"
gap> Length(String(List(GeneratorsOfGroup(S), x -> EncodeWord(PreImagesRepresentative(hom, x)))));
105

And with that we save a factor $34503 / (192+105) \approx 116$.

That said, such dramatic savings are certainly not always possible. E.g. for PrimitiveGroup(4095, 1) I only get down to a rep of degree 2016, which is still too big.

gap> G:=PrimitiveGroup(4095, 1);
<permutation group of size 208114637736580743168000 with 2 generators>
gap> Length(String(GeneratorsOfGroup(G)));
38923

# for the next one, it takes a while and I had to repeat it to get even to 2016
gap> iso:=SmallerDegreePermutationRepresentation(G);
<action epimorphism>
gap> H:=Image(iso);
gap> NrMovedPoints(H);
2016
gap> Length(String(GeneratorsOfGroup(H)));
18039

So that "only" saves a factor ~2

But this group is actually well-known, it is SO(13,2):

gap> IsSimpleGroup(G);
true
gap> IsomorphismTypeInfoFiniteSimpleGroup(G);
rec( name := "B(6,2) = O(13,2) ~ C(6,2) = S(12,2)", parameter := [ 6, 2 ],
  series := "B", shortname := "S12(2)" )

This is prime candidate for group recognition! Alas for now we can just brute force (slowly) an isomorphism:

gap> iso:=IsomorphismGroups(SO(13,2), H);; time;
7927

With that we can find words for the generators of a point stabilizer. The group itself can be stored as SO(13,2) so with just 8 bytes...

I omitted details about how to go the other way: of course that's done by computing the action of the "compressed" group H on cosets of the (image of the) point stabilizer S. This is not unique and potentially a bottleneck when decompressing.

Both issues (uniqueness and speed) could be mitigated by storing a transversal (as words in the generators of the full group). Or we ignore it / rely on the GAP algorithm to be consistent and fast enough.

So the simplest case is to just use FactorCosetAction(H,S).

Here is a full example (showing that it can be worthwhile to search multiple times for smaller degree reps...)

gap> G:=PrimitiveGroup(3600,2);
<permutation group of size 1296000 with 2 generators>
gap> iso:=IdentityMapping(G);; for i in [1..5] do iso:=iso*SmallerDegreePermutationRepresentation(Image(iso)); od; H:=Image(iso);; NrMovedPoints(H);
15
gap> S:=Image(iso,Stabilizer(G,1));;
gap> S:=Group(MinimalGeneratingSet(S));
Group([ (1,4,2)(3,5,6)(7,13,12), (1,12,6)(2,11,10,4,13,8,15,7,5,14,9,3) ])
gap> hom:=EpimorphismFromFreeGroup(H:names:=["x","y"]);;
gap> Length(String(H));
90
gap> Length(String(List(GeneratorsOfGroup(S), x -> PreImagesRepresentative(hom, x))));
128

To go back:

gap> G0:=Image(FactorCosetAction(H,S)); NrMovedPoints(G0);
<permutation group of size 1296000 with 2 generators>
3600
gap> PrimitiveIdentification(G0);
2

However the restored group is not equal to the original (though they are conjugate in the symmetric group):

gap> G0 = G;
false

Regarding finding minimal words: one can use the function Factorization to find shorter words (it is much slower but for compression that shouldn't matter as much).

gap> Length(String(List(GeneratorsOfGroup(S), x -> Factorization(H, x))));
68

(with PreImagesRepresentative it was 128).

gap-packages / primgrp

Compress data files further #56