agethen / ConvLSTM-for-Caffe


how to generate the "data/train.txt" #4

Closed fenling closed 7 years ago

fenling commented 7 years ago

hi agethen: I have some difficulty generating the "data/train.txt" used in "encode-decode.prototxt". What does "data/train.txt" contain, and how do I generate it? I already have many MNIST data files in .gif format (64*64). I think train.txt maybe contains train1.h5 train2.h5 train3.h5. Is that right or not? Thank you so much!

agethen commented 7 years ago

Dear fenling,

train.txt (and test.txt) are simply lists of file paths to HDF5 files, one path per line. So an example could be:

/tmp/file_0000.h5
/tmp/file_0001.h5
...

Inside the HDF5 files, you save a dataset with a certain name. That name must be the same as the 'top' blob in the data layer! The data of the dataset should be of shape T x N x C x H x W (T timesteps, N batch items, C convolutional input channels, H height, W width). Note how T and N are ordered: if you have T=3, N=2, then the data is ordered (x_t,n): x0,0; x0,1; x1,0; x1,1; x2,0; x2,1
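One way to visualize that ordering is a quick numpy sketch (T=3, N=2 as above; the string labels just stand in for the C x H x W frames):

```python
import numpy as np

# T=3 timesteps, N=2 batch items; each label stands in for one C x H x W frame
T, N = 3, 2
labels = np.array([["x%d,%d" % (t, n) for n in range(N)] for t in range(T)])

# Row-major (C-order) flattening walks the N axis fastest, then T,
# which is exactly the x0,0; x0,1; x1,0; ... ordering described above
order = list(labels.flatten())
print(order)
# ['x0,0', 'x0,1', 'x1,0', 'x1,1', 'x2,0', 'x2,1']
```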

One way to build such a file is, for example, to use the 'h5py' package in Python. Hope that helps!
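A minimal sketch of writing one such file with h5py (the sizes T=10, N=2, C=1, H=W=64, the file name, and the random data are just placeholder assumptions; the dataset name "data" must match your prototxt):

```python
import numpy as np
import h5py

# Hypothetical sizes: T=10 timesteps, N=2 batch items, C=1 channel, 64x64 frames
T, N, C, H, W = 10, 2, 1, 64, 64
frames = np.random.rand(T, N, C, H, W).astype(np.float32)

with h5py.File("file_0000.h5", "w") as f:
    # The dataset name ("data" here) must match the 'top' blob of the data layer
    f.create_dataset("data", data=frames)

# Read the file back to verify the stored shape
with h5py.File("file_0000.h5", "r") as f:
    stored_shape = f["data"].shape
print(stored_shape)  # (10, 2, 1, 64, 64)
```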

agethen commented 7 years ago

So if you still want to use your gifs, you will have to extract them frame by frame.

However, I can also recommend the MovingMNIST generator that the authors of ConvLSTM supply. Check out their SPARNN package at http://home.cse.ust.hk/~xshiab/ The generator is located in their sparnn/helpers/movingmnist.py file.

The advantage is that their generator creates numpy files, which are straightforward to convert to the h5 format in Python :)

fenling commented 7 years ago

Thank you for answering, but I still have three questions.

1. For the dataset shape T x N x C x H x W (C is 44, H 16, W 16): if T=3, N=2, then the data is ordered (x_t,n,c): x0,0,C0; x0,0,C1; ...; x0,0,C15; x0,1,C0; x0,1,C1; ...; x0,1,C15; x1,0,C0; x1,0,C1; ...; x1,0,C15; x1,1,C0; x1,1,C1; ...; x1,1,C15; x2,0,C0; x2,0,C1; ...; x2,0,C15; x2,1,C0; x2,1,C1; ...; x2,1,C15. Do I understand correctly?
2. Is each of C0, C1, ..., C15 of size H x W?
3. Does N (batch items) have a relationship with "batch_size" in encode-decode.prototxt? hdf5_data_param { source: "data/test.txt" batch_size: 10 }

fenling commented 7 years ago

I have run "movingmnist.py" and generated "moving-mnist-test.npz", "moving-mnist-train.npz", and "moving-mnist-valid.npz". Can you help me convert "moving-mnist-test.npz" into the format /tmp/file_0000.h5 /tmp/file_0001.h5?

agethen commented 7 years ago

You can probably google that ;) I am really a Python beginner, but something like the following should work (assuming T=20):

import numpy
import h5py

# Load the .npz archive
archive = numpy.load( "moving-mnist-train.npz" )

# Select the data field
data = archive['input_raw_data']

# As you can see, the shape is N*T x C x H x W, so we need to change that
print( data.shape )

# Reshape to N x T x C x H x W (integer division, since shape[0] is a multiple of T=20)
tmp = numpy.reshape( data, (data.shape[0] // 20, 20, 1, 64, 64) )

# Swap T and N to get T x N x C x H x W
res = numpy.swapaxes( tmp, 0, 1 )

for idx in range( res.shape[1] ):
  print( "File", idx )

  # Pick the n-th item along the N axis (keeping a singleton N dimension)
  datum = res[:, idx:idx+1]

  # Open a file handle
  h5file = h5py.File( "file_" + str( idx ).zfill(5) + ".h5", 'w' )

  # Create a dataset named "data" (must match the 'top' blob of the data layer)
  h5data = h5file.create_dataset( "data", shape = datum.shape, dtype = numpy.float32 )

  # Copy data
  h5data[:] = datum

  # Close file
  h5file.close()

print( "Done!" )

Also, be careful with the batch_size parameter in Caffe: Caffe will only look at the T axis, not at N! You need to organize your data accordingly beforehand. So if you have T=10 and N=16, you write batch_size: 10 and organize the data into 10 x 16 x C x H x W chunks!
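For reference, a minimal sketch of what such a data layer definition could look like in the prototxt (the layer/blob name "data" and the file list path are assumptions; only source and batch_size are taken from the discussion above):

```
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  hdf5_data_param {
    source: "data/train.txt"
    batch_size: 10
  }
}
```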

fenling commented 7 years ago

hi agethen: thanks for giving me so much help.

ksnzh commented 7 years ago

Dear @agethen,

If my train set is not a time-related sequence, can I set T=1 and make it as input?

My dataset is just a collection of many pictures.

agethen commented 7 years ago

Dear @ksnzh, I believe it should work (although I didn't test it).

The only potential issue I can think of would be the internal slice layers with a single output, and after a quick test, this does not seem to be a problem for Caffe!

Performance-wise, you might be better off by just building a classical net with convolutional, sigmoid + tanh layers though :)