Integrate the primitive groups of degree 4096 to 8191 into PrimGrp

jesselansdown commented 11 months ago

The primitive groups have been made available on Zenodo, and additional properties have been computed to make them compatible with PrimGrp. The ability to load these groups has been added to PrimGrp, and the other functions have been modified as needed to accommodate the new groups.

fingolfin commented 11 months ago

The archive on Zenodo is unnecessarily large. It contains many .gz files which are not actually compressed. At the very least, all PrimitiveGroups_6561_*.g.gz files are uncompressed. But there are more, e.g. PrimitiveGroups_5329_181.g.gz.
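One way to spot such files is to check for the gzip magic bytes at the start of each file. The following is only a minimal sketch, not the script that was actually used; the helper names `is_gzip` and `find_uncompressed` are made up for illustration, and it assumes all data files sit in one flat directory.

```python
from pathlib import Path

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def find_uncompressed(directory):
    """List *.gz files in `directory` (hypothetical flat layout) that are not gzip streams."""
    return sorted(p for p in Path(directory).glob("*.gz") if not is_gzip(p))
```

Calling `len(find_uncompressed(some_dir))` on the unpacked archive would then give the number of mislabelled files.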

There is also a 5MB .DS_Store file.

Just fixing the above reduces the data size to 1.7GB (I used options -9 and -n for gzip; the latter removes time stamps and filenames, which saves a few more bytes but also helps with reproducibility of the data).
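For illustration, roughly the same effect as `gzip -9 -n` can be had from Python's standard `gzip` module. This is a sketch under assumptions rather than what was run here: `normalize` and `recompress` are hypothetical helper names, and the `mtime` argument of `gzip.compress` requires Python 3.8 or later.

```python
import gzip


def recompress(raw: bytes) -> bytes:
    # compresslevel=9 mirrors `gzip -9`; mtime=0 zeroes the header timestamp.
    # gzip.compress never stores a file name, so this approximates `gzip -n`.
    return gzip.compress(raw, compresslevel=9, mtime=0)


def normalize(data: bytes) -> bytes:
    """Accept a gzip stream or plain bytes; return a reproducible gzip stream."""
    if data[:2] == b"\x1f\x8b":  # already gzip: unpack before recompressing
        data = gzip.decompress(data)
    return recompress(data)
```

Because neither a timestamp nor a file name ends up in the header, running `normalize` twice on the same input gives byte-identical output, which is the reproducibility benefit mentioned above.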

It is a bit annoying to clean this up with a script because there is a single directory containing ~29777 files. Various filesystems have trouble with such large directories. A natural way to avoid this would be to add subdirectories for each degree.
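A sketch of what such a per-degree split could look like, assuming the `PrimitiveGroups_<degree>_<index>.g.gz` naming seen in the file names above. The function name `split_by_degree` is hypothetical, and whether PrimGrp's loader could cope with this layout without changes is not addressed here.

```python
import re
import shutil
from pathlib import Path

# Pattern inferred from names such as PrimitiveGroups_5329_181.g.gz (degree 5329).
NAME_RE = re.compile(r"^PrimitiveGroups_(\d+)_\d+\.g\.gz$")


def split_by_degree(directory):
    """Move each data file into a subdirectory named after its degree."""
    root = Path(directory)
    moved = 0
    # Take a snapshot of the listing so the directory is not modified while iterating.
    for path in list(root.iterdir()):
        match = NAME_RE.match(path.name)
        if not path.is_file() or match is None:
            continue  # leave unrelated files (e.g. .DS_Store) untouched
        target = root / match.group(1)
        target.mkdir(exist_ok=True)
        shutil.move(str(path), str(target / path.name))
        moved += 1
    return moved
```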

fingolfin commented 11 months ago

Turns out 20040 out of the whole 29776 are not actually compressed.

After uncompressing them all, the result actually only takes 3.0G on my disk according to du -h.

Recompressing with gzip -n produces a directory of size 1.3G (1,315,672 k -- though this will depend on the filesystem).

I'll try recompressing all files using zopfli (in gzip mode, so that it can still be read by GAP) which should save even more space, but is slow, so it'll take some time.

fingolfin commented 11 months ago

With zopfli it goes down to ~1.0G

jesselansdown commented 11 months ago

Hi Max, thanks for the suggested changes. I have accepted most of them already. There is one more that I need to address but it will require me to make some modifications. I'll do so shortly. Also, the files are supposed to be compressed... So I will make sure they are compressed and update the Zenodo repository once I have done so.

jesselansdown commented 11 months ago

I have made sure the files are properly compressed (using zopfli) and updated the Zenodo archive. It is now just over 1GB. I have addressed each of the other suggested edits. Is everything ok now?

codecov[bot] commented 11 months ago

Codecov Report

Merging #52 (699951e) into master (746f67a) will decrease coverage by 0.01%. The diff coverage is 90.18%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master      #52      +/-   ##
==========================================
- Coverage   99.92%   99.91%   -0.01%
==========================================
  Files          46       46
  Lines      382928   383085     +157
==========================================
+ Hits       382623   382764     +141
- Misses        305      321      +16
```

| [Files](https://app.codecov.io/gh/gap-packages/primgrp/pull/52?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=gap-packages) | Coverage Δ | |
|---|---|---|
| [lib/primitiv.grp](https://app.codecov.io/gh/gap-packages/primgrp/pull/52?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=gap-packages#diff-bGliL3ByaW1pdGl2LmdycA==) | `100.00% <100.00%> (ø)` | |
| [lib/primitiv.gi](https://app.codecov.io/gh/gap-packages/primgrp/pull/52?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=gap-packages#diff-bGliL3ByaW1pdGl2Lmdp) | `52.99% <44.82%> (-0.96%)` | :arrow_down: |

fingolfin commented 9 months ago

Thanks @jesselansdown. I was busy elsewhere and forgot about this PR. It looks fine now!

In the long run I'd like to transition those data files to a new file format that maybe does not depend on parsing GAP code (i.e. so that other systems could import it). Perhaps it would even be possible to find a common format for old and new data. There are more things I have in mind (making the data available with a finer granularity so that only the parts which are needed can be fetched online on demand; perhaps even integrating this into a "real" DB, a website, etc.).

All of that is not meant to block this PR, just saying what I have in mind for future work.

jesselansdown commented 9 months ago

No worries, glad everything is ok now! My main concern was to make the data available and compatible with the current library so that people can begin to use it already, but I agree that the data format could be improved in the future.
