MichalDanielDobrzanski / DeepLearningPython

neuralnetworksanddeeplearning.com integrated scripts for Python 3.5.2 and Theano with CUDA support
MIT License

Is there a way to write the output to the original ubyte format? #34

AoifeMarieDoherty closed this issue 3 years ago.

AoifeMarieDoherty commented 3 years ago

Can the output be converted back to the original format? Ideally it would be exactly as in the original files (train labels, train data, test labels, test data), all in ubyte form.

MichalDanielDobrzanski commented 3 years ago

Hey Aoife, I have replied to your e-mail. Next time, please be more specific:

AoifeMarieDoherty commented 3 years ago

Oh, I'm so sorry. Thank you for your reply.

The script I was wondering about is this script: https://github.com/MichalDanielDobrzanski/DeepLearningPython35/blob/master/expand_mnist.py

My question, specifically, was whether the output could look like this:

THE IDX FILE FORMAT

The IDX file format is a simple format for vectors and multidimensional matrices of various numerical types. The basic format is:

magic number
size in dimension 0
size in dimension 1
size in dimension 2
.....
size in dimension N
data

The magic number is an integer (MSB first). The first 2 bytes are always 0.

The third byte codes the type of the data:
0x08: unsigned byte
0x09: signed byte
0x0B: short (2 bytes)
0x0C: int (4 bytes)
0x0D: float (4 bytes)
0x0E: double (8 bytes)

The 4th byte codes the number of dimensions of the vector/matrix: 1 for vectors, 2 for matrices, and so on.
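The magic-number rules quoted above are two zero bytes, a type code, and then the number of dimensions. A rough sketch of building and parsing those four bytes with only the Python 3 standard library follows. `make_magic`, `parse_magic` and `IDX_TYPE_CODES` are hypothetical names chosen for illustration and are not part of the DeepLearningPython repository.

```python
import struct
from typing import Tuple

# Type codes from the IDX description (third byte of the magic number).
IDX_TYPE_CODES = {
    0x08: "unsigned byte",
    0x09: "signed byte",
    0x0B: "short (2 bytes)",
    0x0C: "int (4 bytes)",
    0x0D: "float (4 bytes)",
    0x0E: "double (8 bytes)",
}


def make_magic(type_code: int, ndim: int) -> bytes:
    """Build the 4-byte magic number: 0, 0, type code, number of dimensions."""
    if type_code not in IDX_TYPE_CODES:
        raise ValueError("unknown IDX type code: %#04x" % type_code)
    if not 0 < ndim < 256:
        raise ValueError("number of dimensions must fit in one byte")
    return bytes([0, 0, type_code, ndim])


def parse_magic(magic: bytes) -> Tuple[int, int]:
    """Split a 4-byte magic number back into (type_code, ndim)."""
    zero0, zero1, type_code, ndim = magic[:4]
    if zero0 != 0 or zero1 != 0:
        raise ValueError("the first two bytes of an IDX magic number must be 0")
    return type_code, ndim


# MNIST images: unsigned bytes, 3 dimensions (count, rows, columns).
# MNIST labels: unsigned bytes, 1 dimension (count).
print(struct.unpack(">I", make_magic(0x08, 3))[0])  # 2051
print(struct.unpack(">I", make_magic(0x08, 1))[0])  # 2049
```

Read as a single MSB-first integer, these magic numbers are 2051 and 2049. Those are the values found at the start of the original MNIST image and label files.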

The sizes in each dimension are 4-byte integers (MSB first, i.e. big-endian, like in most non-Intel processors).
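The size fields can be written with `struct`. There, the `>` prefix forces big-endian (MSB first) byte order regardless of the host CPU. This is a minimal sketch: `pack_sizes` and `unpack_sizes` are illustrative helper names, and 60000 × 28 × 28 is simply the shape of the original MNIST training images.

```python
import struct


def pack_sizes(shape):
    """Encode each dimension size as a 4-byte unsigned integer, MSB first."""
    # ">" forces big-endian byte order; "I" is a 4-byte unsigned int.
    return struct.pack(">" + "I" * len(shape), *shape)


def unpack_sizes(buf, ndim, offset=4):
    """Read ndim big-endian sizes that follow the 4-byte magic number."""
    return struct.unpack_from(">" + "I" * ndim, buf, offset)


sizes = pack_sizes((60000, 28, 28))
print(sizes.hex())                      # 0000ea600000001c0000001c
# Little-endian order (native on Intel CPUs) would put the LSB first instead:
print(struct.pack("<I", 60000).hex())   # 60ea0000
header = bytes([0, 0, 0x08, 3]) + sizes
print(unpack_sizes(header, 3))          # (60000, 28, 28)
```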

The data is stored like in a C array, i.e. the index in the last dimension changes the fastest.
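For the data section, NumPy's C (row-major) order matches "the index in the last dimension changes the fastest". The sketch below assumes NumPy is installed and maps the IDX type codes to NumPy dtype names. `IDX_DTYPES`, `idx_data` and `read_idx_data` are hypothetical names. It also assumes, as common IDX readers do, that multi-byte values in the data are big-endian like the size fields. For MNIST's unsigned bytes, byte order makes no difference.

```python
import numpy as np

# IDX element types as NumPy dtype names, keyed by the type-code byte.
IDX_DTYPES = {0x08: "uint8", 0x09: "int8", 0x0B: "int16",
              0x0C: "int32", 0x0D: "float32", 0x0E: "float64"}


def idx_data(array, type_code):
    """Serialise the data section: big-endian elements in C (row-major) order."""
    dtype = np.dtype(IDX_DTYPES[type_code]).newbyteorder(">")
    # ascontiguousarray casts to the target dtype and lays the values out in C order.
    return np.ascontiguousarray(array, dtype=dtype).tobytes(order="C")


def read_idx_data(buf, type_code, shape):
    """Inverse of idx_data: reinterpret the bytes, then reshape (last index fastest)."""
    dtype = np.dtype(IDX_DTYPES[type_code]).newbyteorder(">")
    flat = np.frombuffer(buf, dtype=dtype)
    # astype copies into native byte order so the result is an ordinary array.
    return flat.reshape(shape).astype(IDX_DTYPES[type_code])


grid = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint8)
print(list(idx_data(grid, 0x08)))                     # [1, 2, 3, 4, 5, 6]
print(idx_data(np.array([1], np.int16), 0x0B).hex())  # 0001
```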

I would also like the training data and labels to be in separate files, i.e. for the output to follow exactly the above format, with the data divided into these files:

train-images-idx3-ubyte.gz: training set images
train-labels-idx1-ubyte.gz: training set labels
t10k-images-idx3-ubyte.gz: test set images
t10k-labels-idx1-ubyte.gz: test set labels
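Writing these four files comes down to prefixing each uint8 array with its IDX header and compressing it with `gzip`. A minimal sketch, assuming the images are already a uint8 NumPy array of shape (n, 28, 28) and the labels a uint8 array of shape (n,). `write_ubyte_gz` and `write_mnist_files` are hypothetical helpers; only the output file names come from the list above.

```python
import gzip
import os
import struct

import numpy as np


def write_ubyte_gz(path, array):
    """Write a uint8 array as a gzip-compressed IDX file (type code 0x08)."""
    if array.dtype != np.uint8:
        raise TypeError("expected uint8 data, got %s" % array.dtype)
    array = np.ascontiguousarray(array)  # C order: last index changes fastest
    header = bytes([0, 0, 0x08, array.ndim])
    header += struct.pack(">" + "I" * array.ndim, *array.shape)  # big-endian sizes
    with gzip.open(path, "wb") as f:
        f.write(header)
        f.write(array.tobytes(order="C"))


def write_mnist_files(out_dir, train_images, train_labels, test_images, test_labels):
    """Write the four files using the original MNIST file names."""
    os.makedirs(out_dir, exist_ok=True)
    outputs = [
        ("train-images-idx3-ubyte.gz", train_images, 3),
        ("train-labels-idx1-ubyte.gz", train_labels, 1),
        ("t10k-images-idx3-ubyte.gz", test_images, 3),
        ("t10k-labels-idx1-ubyte.gz", test_labels, 1),
    ]
    for name, array, ndim in outputs:
        if array.ndim != ndim:
            raise ValueError("%s expects %d dimensions, got %d" % (name, ndim, array.ndim))
        write_ubyte_gz(os.path.join(out_dir, name), array)
```

The decompressed contents follow the IDX layout. The .gz files themselves will not be byte-identical to the originals, because gzip records a timestamp in its header.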

What I was wondering is whether this is possible with one of your existing scripts, or, if not, how I would do it in Python specifically. I'm not really familiar with MNIST or the ubyte format, so I'd really appreciate the help.
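For expand_mnist.py specifically, the sketch below loads its pickled output and converts it to uint8 arrays in the MNIST layout: images of shape (n, 28, 28) and labels of shape (n,). It rests on assumptions not verified against the script. The output path `../data/mnist_expanded.pkl.gz` is assumed. The pickle is assumed to hold a (training, validation, test) triple of (images, labels) pairs. Images are assumed to be 784-value float rows in [0, 1], as in the `mnist.pkl.gz` it expands. `load_expanded`, `to_ubyte` and `mnist_ubyte_arrays` are hypothetical names.

```python
import gzip
import pickle

import numpy as np

# Assumed location of the file written by expand_mnist.py; adjust as needed.
EXPANDED_PATH = "../data/mnist_expanded.pkl.gz"


def load_expanded(path=EXPANDED_PATH):
    """Return (training, validation, test), each assumed to be (images, labels)."""
    with gzip.open(path, "rb") as f:
        # encoding="latin1" also lets Python 3 read pickles written by Python 2.
        training, validation, test = pickle.load(f, encoding="latin1")
    return training, validation, test


def to_ubyte(images, labels, scale=255.0):
    """Map float pixel rows in [0, 1] to (n, 28, 28) uint8 images.

    The scale factor is an assumption: use 256.0 if the floats were pixel / 256.
    """
    pixels = np.asarray(images, dtype=np.float32).reshape(-1, 28, 28)
    pixels = np.clip(np.rint(pixels * scale), 0, 255).astype(np.uint8)
    return pixels, np.asarray(labels).astype(np.uint8)


def mnist_ubyte_arrays(path=EXPANDED_PATH, include_validation=False, scale=255.0):
    """Build uint8 (train_images, train_labels, test_images, test_labels)."""
    training, validation, test = load_expanded(path)
    train_x, train_y = to_ubyte(training[0], training[1], scale)
    if include_validation:
        # Optionally fold the validation split back into the training files.
        val_x, val_y = to_ubyte(validation[0], validation[1], scale)
        train_x = np.concatenate([train_x, val_x])
        train_y = np.concatenate([train_y, val_y])
    test_x, test_y = to_ubyte(test[0], test[1], scale)
    return train_x, train_y, test_x, test_y
```

Check the pixel scale before relying on it. If `images.max()` is 0.99609375 (255/256), the floats were probably produced by dividing by 256, and `scale=256.0` should recover the original bytes exactly. `mnist.pkl.gz` splits the original 60,000 training images into 50,000 training and 10,000 validation images, which is why `include_validation` exists. Because expand_mnist.py adds shifted copies of each training image, the resulting training files will be larger than the original ones and need not preserve their order.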

Thank you for your time.