I am trying to understand what is going on in the cpm_data_transformer. I've found that each image is stored in LMDB as a 6 x width x height structure: 3 image channels, a metadata channel, a miss mask, and an all mask.
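To make sure I'm reading the layout right, here is a minimal sketch of how I understand the record would be split apart. The channel order and the `split_record` helper are my assumptions, not the actual cpm_data_transformer code:

```python
import numpy as np

def split_record(buf: np.ndarray):
    """Split a 6 x H x W LMDB record into its parts.

    Assumed layout (hypothetical, based on my reading):
      channels 0-2: image (BGR)
      channel  3:   metadata packed into pixel bytes (joints, scale, ...)
      channel  4:   miss mask (people without annotations)
      channel  5:   all mask (all labeled people)
    """
    assert buf.shape[0] == 6
    image = buf[0:3]
    metadata = buf[3]
    miss_mask = buf[4]
    all_mask = buf[5]
    return image, metadata, miss_mask, all_mask

# Dummy record just to exercise the split.
h, w = 4, 5
record = np.arange(6 * h * w, dtype=np.uint8).reshape(6, h, w)
image, metadata, miss_mask, all_mask = split_record(record)
print(image.shape, metadata.shape, miss_mask.shape, all_mask.shape)
```

Is this roughly the layout, or am I misreading the channel order?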
Why is metadata encoded in such a big buffer? Is it for performance reasons?
Did you try generating an augmented dataset with just images and labels (vec and heat maps) before training?
My wild guess is that they were writing a custom caffe layer and were limited in the types of data the layer could receive, so they encoded everything into additional image channels.