Closed AoifeMarieDoherty closed 3 years ago
Hey Aoife, I have replied to your e-mail. Next time, please be more specific:
Oh I'm so sorry. Thank you for your reply.
The script I was wondering about is this script: https://github.com/MichalDanielDobrzanski/DeepLearningPython35/blob/master/expand_mnist.py
And my question specifically was:
I can see the output is a pickle.gz.
I would like the output to be in exactly the same form as the files you originally download from MNIST, i.e. IDX (ubyte) data.
Specifically, I would like the output to look like this:
THE IDX FILE FORMAT
The IDX file format is a simple format for vectors and multidimensional matrices of various numerical types. The basic format is:
magic number
size in dimension 0
size in dimension 1
size in dimension 2
.....
size in dimension N
data
The magic number is an integer (MSB first). The first 2 bytes are always 0.
The third byte codes the type of the data:
0x08: unsigned byte
0x09: signed byte
0x0B: short (2 bytes)
0x0C: int (4 bytes)
0x0D: float (4 bytes)
0x0E: double (8 bytes)
The 4-th byte codes the number of dimensions of the vector/matrix: 1 for vectors, 2 for matrices....
The sizes in each dimension are 4-byte integers (MSB first, high endian, like in most non-Intel processors).
The data is stored like in a C array, i.e. the index in the last dimension changes the fastest.
I would also like the images and labels to be in separate files, i.e. the output should be in exactly the format above, with the data divided into these files:
train-images-idx3-ubyte.gz: training set images
train-labels-idx1-ubyte.gz: training set labels
t10k-images-idx3-ubyte.gz: test set images
t10k-labels-idx1-ubyte.gz: test set labels
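For reference, the layout quoted above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not code from the linked repository: the helper name `idx_ubyte_bytes` is hypothetical, and it only handles the unsigned-byte type (0x08) that the MNIST files use.

```python
import struct

def idx_ubyte_bytes(dims, payload):
    """Encode unsigned-byte data as IDX bytes (hypothetical helper).

    dims    -- list of dimension sizes, e.g. [60000, 28, 28] or [60000]
    payload -- bytes in C order (last dimension changes fastest)
    """
    expected = 1
    for d in dims:
        expected *= d
    if len(payload) != expected:
        raise ValueError("payload length does not match the dimensions")
    # Magic number: two zero bytes, type code 0x08 (unsigned byte),
    # then the number of dimensions.
    header = struct.pack(">BBBB", 0, 0, 0x08, len(dims))
    # Each dimension size is a 4-byte integer, MSB first (big-endian).
    header += struct.pack(">%dI" % len(dims), *dims)
    return header + bytes(payload)

# A label vector with three entries (5, 0, 4):
encoded = idx_ubyte_bytes([3], bytes([5, 0, 4]))
print(encoded.hex())  # 0000080100000003050004
```

The first four bytes (00 00 08 01) are the magic number for a 1-dimensional unsigned-byte vector; an image file would start with 00 00 08 03 followed by three dimension sizes.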
What I was wondering is whether this is possible with one of your already-written scripts, or whether you could tell me how to do it in Python. I'm not really familiar with MNIST/ubyte, so I'd really appreciate the help.
Thank you for your time
Can this be converted back to the original format? Ideally exactly as in the original files (train labels, train data, test labels, test data), all in ubyte form?
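One possible way to do this in Python is sketched below, using only the standard library. It rests on several assumptions: each image is a sequence of 784 floats in the range [0, 1] (rescaled here by 255), each label is an integer from 0 to 9, and images are 28x28. The function names `write_idx_gz` and `save_split` are hypothetical and are not part of the linked repository. Check how your pickle actually stores the data before relying on it; if the pixels were scaled differently, the factor needs adjusting.

```python
import gzip
import os
import struct

def write_idx_gz(path, dims, payload):
    """Write unsigned-byte data to a gzip-compressed IDX file."""
    with gzip.open(path, "wb") as f:
        # Magic number: 0, 0, type code 0x08 (ubyte), number of dimensions.
        f.write(struct.pack(">BBBB", 0, 0, 0x08, len(dims)))
        # Dimension sizes as 4-byte big-endian integers.
        f.write(struct.pack(">%dI" % len(dims), *dims))
        f.write(payload)

def save_split(prefix, images, labels, out_dir="."):
    """Write <prefix>-images-idx3-ubyte.gz and <prefix>-labels-idx1-ubyte.gz.

    Assumes images are flattened 28x28 floats in [0, 1] (an assumption
    about the pickle contents) and labels are integers 0..9.
    """
    # Rescale floats to 0..255 and clamp, one byte per pixel.
    pixels = bytes(
        min(255, max(0, int(round(float(p) * 255))))
        for img in images
        for p in img
    )
    write_idx_gz(
        os.path.join(out_dir, prefix + "-images-idx3-ubyte.gz"),
        [len(images), 28, 28],
        pixels,
    )
    write_idx_gz(
        os.path.join(out_dir, prefix + "-labels-idx1-ubyte.gz"),
        [len(labels)],
        bytes(int(label) for label in labels),
    )
```

You would call it once per split, for example `save_split("train", train_images, train_labels)` and `save_split("t10k", test_images, test_labels)`, where the arguments are whatever image and label sequences you load from the pickle. For a few hundred thousand images the pure-Python pixel loop is slow, so a NumPy-based conversion would be a reasonable substitute. Note also that the resulting .gz files will not be byte-identical to the original downloads, since gzip stores its own metadata and the float rescaling may not reproduce every original pixel value; only the decompressed layout follows the IDX format.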