gap-packages / io

GAP package IO to do input and output
https://gap-packages.github.io/io/

IO_Unpickle removes data from pickled file #112

Closed kiryph closed 2 years ago

kiryph commented 2 years ago

The Python pickle module can serialize data as follows:

import pickle

# An arbitrary collection of objects supported by pickle.
data = {
    'a': [1, 2.0, 3+4j],
    'b': ("character string", b"byte string"),
    'c': {None, True, False}
}

with open('data.pickle', 'wb') as f:
    # Pickle the 'data' dictionary using the highest protocol available.
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

❯ ls -lh data.pickle
Permissions Size User   Date Modified Name
.rw-r--r--   130 kiryph  7 Sep 12:08  data.pickle

When I want to read the data back in, I can do so as follows:

import pickle

with open('data.pickle', 'rb') as f:
    # The protocol version used is detected automatically, so we do not
    # have to specify it.
    data = pickle.load(f)

This leaves the file data.pickle untouched:

❯ ls -lh data.pickle
Permissions Size User   Date Modified Name
.rw-r--r--   130 kiryph  7 Sep 12:08  data.pickle

So I can do this again and again.
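
To make "untouched" checkable rather than relying on the ls output alone, here is a minimal sketch that compares the file's size and checksum before and after several loads. The helper name file_digest and the temporary path are only illustrative, not part of the pickle module.

```python
import hashlib
import os
import pickle
import tempfile


def file_digest(path):
    """Return the SHA-256 hex digest of the file's bytes (illustrative helper)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


data = {"a": [1, 2.0, 3 + 4j], "b": ("character string", b"byte string")}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pickle")
    with open(path, "wb") as f:
        pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)

    digest_before = file_digest(path)
    size_before = os.path.getsize(path)

    # Load the same file several times; each load opens it read-only.
    loaded = []
    for _ in range(3):
        with open(path, "rb") as f:
            loaded.append(pickle.load(f))

    digest_after = file_digest(path)
    size_after = os.path.getsize(path)

print(digest_before == digest_after, size_before == size_after)
```

If the two comparisons hold, reading did not modify the file, which is the behaviour I expected from IO_Unpickle as well.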

I expected IO_Unpickle(f) from the GAP package IO to behave similarly. However, when I unpickle GAP data, the data is removed from the pickled file (even though f := IO_File("data.guck"); opens the file in mode "r", i.e. read-only).

I did not find a comment about this in the documentation.

As a workaround, I could duplicate the file before reading it in. However, this adds extra steps and is less convenient in interactive GAP sessions than gap> data1 := IO_Unpickle(IO_File("data.guck"));.

Could there be a flag that says "leave the pickle file unchanged"?

ChrisJefferson commented 2 years ago

Can you give a complete example of GAP code that shows your problem? I can't reproduce it. Here is what I see when unpickling a file:


# Make the file
gap> o := IO_File("data.guck", "w");
<file fd=4 wbufsize=65536 wdata=0>
gap> IO_Pickle(o, [1,2,3,4,5]);
IO_OK
gap> IO_Close(o);
true

# Check what it looks like on disc
xyz/ $ ls -l
.rw-r--r-- caj caj 41 B Wed Sep  7 11:51:09 2022  data.guck
xyz/ $ md5sum data.guck
caaf33b1b5d823fabd2047cb6de816de  data.guck

# Load GAP back in and unpickle
xyz/ $ gap
gap> o := IO_File("data.guck", "r");
<file fd=4 rbufsize=65536 rpos=1 rdata=0>
gap> IO_Unpickle(o);
[ 1, 2, 3, 4, 5 ]
gap> IO_Close(o);
true

# Go back and check on disc, same md5sum
xyz/ $ ls -l
.rw-r--r-- caj caj 41 B Wed Sep  7 11:51:09 2022  data.guck
xyz/ $ md5sum data.guck
caaf33b1b5d823fabd2047cb6de816de  data.guck

kiryph commented 2 years ago

@ChrisJefferson Thanks for looking into this. First of all, I am happy to hear that it should behave as I expected.

Unfortunately, I can no longer reproduce my issue.

One thing I noticed is that I have sometimes mixed up IO_close() and IO_Close() (note the lowercase c versus the uppercase C).

For the record, my GAP object is a list of AffineCrystGroup objects from the Cryst package. The session looks like this:

 ┌───────┐   GAP 4.12.0 of 2022-08-18
 │  GAP  │   https://www.gap-system.org
 └───────┘   Architecture: x86_64-apple-darwin20-default64-kv8
 Configuration:  gmp 6.2.1, GASMAN, readline
 Loading the library and packages ...
 Packages:   AClib 1.3.2, Alnuth 3.2.1, AtlasRep 2.1.4, AutoDoc 2022.07.10, AutPGrp 1.11, Browse 1.8.14, CaratInterface 2.3.4,
             CRISP 1.4.5, crypting 0.10, Cryst 4.1.25, CrystCat 1.1.10, CTblLib 1.3.4, curlInterface 2.2.3, FactInt 1.6.3, FGA 1.4.0,
             Forms 1.2.8, GAPDoc 1.6.6, genss 1.6.7, IO 4.7.2, IRREDSOL 1.4.3, json 2.1.0, LAGUNA 3.9.5, Memoisation 1.0, orb 4.8.5,
             Polenta 1.3.10, Polycyclic 2.16, PrimGrp 3.4.2, RadiRoot 2.9, recog 1.3.2, ResClasses 4.7.3, SmallGrp 1.5, Sophus 1.27,
             SpinSym 1.5.2, TomLib 1.2.9, TransGrp 3.6.3, utils 0.76
 Try '??help' for help. See also '?copyright', '?cite' and '?authors'
gap> Read(Filename(dir, "LowIndexSpaceSubgroups.g"));
gap> G := SpaceGroupIT(2, 12);
gap> index := 8;
gap> L8 := LowIndexSubgroups(G, index);
[ SpaceGroupOnRightIT(2,12,'1'), <matrix group with 4 generators>, <matrix group with 4 generators>, <matrix group with 4 generators>,
  <matrix group with 3 generators>, <matrix group with 3 generators>, <matrix group with 4 generators>, <matrix group with 4 generators>,
  <matrix group with 4 generators>, <matrix group with 3 generators>, <matrix group with 4 generators>, <matrix group with 4 generators>,
  <matrix group with 4 generators>, Group([ [ [ 1, 0, 0 ], [ 0, 1, 0 ], [ 1, 0, 1 ] ], [ [ 1, 0, 0 ], [ 0, 1, 0 ], [ 0, 1, 1 ] ] ]),
  <matrix group with 3 generators>, <matrix group with 3 generators>, <matrix group with 3 generators>, <matrix group with 3 generators>,
  <matrix group with 4 generators>, <matrix group with 3 generators>, <matrix group with 3 generators>, <matrix group with 4 generators>,
  <matrix group with 4 generators>, <matrix group with 4 generators>, <matrix group with 4 generators>, <matrix group with 4 generators>
 ]
gap> time/1000.;
8.526

# Writing
gap> f := IO_File("SpaceSubgroupsIT_2_12_8.guck", "w");
gap> IO_Pickle(f, L8);
gap> IO_Close(f);

# Reading
gap> f := IO_File("SpaceSubgroupsIT_2_12_8.guck");
gap> L8r := IO_Unpickle(f);
[ SpaceGroupOnRightIT(2,12,'1'), <matrix group with 4 generators>, <matrix group with 4 generators>, <matrix group with 4 generators>,
  <matrix group with 3 generators>, <matrix group with 3 generators>, <matrix group with 4 generators>, <matrix group with 4 generators>,
  <matrix group with 4 generators>, <matrix group with 3 generators>, <matrix group with 4 generators>, <matrix group with 4 generators>,
  <matrix group with 4 generators>, Group([ [ [ 1, 0, 0 ], [ 0, 1, 0 ], [ 1, 0, 1 ] ], [ [ 1, 0, 0 ], [ 0, 1, 0 ], [ 0, 1, 1 ] ] ]),
  <matrix group with 3 generators>, <matrix group with 3 generators>, <matrix group with 3 generators>, <matrix group with 3 generators>,
  <matrix group with 4 generators>, <matrix group with 3 generators>, <matrix group with 3 generators>, <matrix group with 4 generators>,
  <matrix group with 4 generators>, <matrix group with 4 generators>, <matrix group with 4 generators>, <matrix group with 4 generators>
 ]
gap> IO_Close(f);

kiryph commented 2 years ago

@ChrisJefferson The Python pickle module is designed so that pickled data can be restored across Python versions and platforms, i.e. it is suitable for long-term storage.

Is the same true for IO_Pickle/IO_Unpickle?

ChrisJefferson commented 2 years ago

That is the intention, and up until now we haven't had to break that requirement. However, we don't test across different GAP versions, so it could break.

I could believe that an incorrect IO_close call could break things, since not closing the file before reopening it might mean that the data was not fully flushed to disc.
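
The flushing effect can be illustrated with a rough Python analogy (not GAP code). It assumes that the buffered file objects in the IO package behave broadly like Python's buffered writers, which the wbufsize=65536 shown earlier suggests but this thread does not confirm. The sketch writes a small pickle through a large write buffer and looks at the file before and after the writer is closed.

```python
import os
import pickle
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pickle")

    # Open with a large write buffer; the size is chosen only to echo
    # the wbufsize=65536 seen in the GAP session (an assumption).
    writer = open(path, "wb", buffering=65536)
    pickle.dump([1, 2, 3, 4, 5], writer)

    # The writer has been neither closed nor flushed, so the small payload
    # is still in the in-memory buffer and the file on disk is empty.
    size_before_close = os.path.getsize(path)

    # Reading through a second handle at this point finds no data.
    try:
        with open(path, "rb") as reader:
            pickle.load(reader)
        early_read_failed = False
    except EOFError:
        early_read_failed = True

    writer.close()  # closing flushes the buffer to disk
    size_after_close = os.path.getsize(path)

    with open(path, "rb") as reader:
        restored = pickle.load(reader)

print(size_before_close, early_read_failed, size_after_close > 0, restored)
```

In this analogy the file looks empty to a second reader until the writer is properly closed, which would match a report of data seeming to be missing from the pickled file.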