Closed joshloyal closed 7 years ago
Done! The data is now stored in 5D tensors, where the first dimension indexes the image column, i.e. the data is stored as: [image_column, image_batch, height, width, channel]. The featurized data is stored the same way as before, but with extra features for each extra image column. The columns are appended in the order they're passed into the network, and each feature is named with its image column in the title.
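To make the layout concrete, here is a minimal NumPy sketch of stacking one 4D batch per image column into a 5D tensor. The sizes and variable names are made up for illustration and are not taken from the project's code.

```python
import numpy as np

# Hypothetical sizes: 2 image columns, 3 rows per column, 4x4 RGB images.
n_columns, n_images, height, width, channels = 2, 3, 4, 4, 3

# One 4D batch per image column: [image_batch, height, width, channel].
rng = np.random.default_rng(0)
batches = [rng.random((n_images, height, width, channels)) for _ in range(n_columns)]

# Stack along a new leading axis so the first dimension indexes the image column.
data = np.stack(batches, axis=0)
print(data.shape)  # (2, 3, 4, 4, 3)

# data[i] recovers the 4D batch for the i-th image column.
first_column_batch = data[0]
```

Indexing the first axis gives back the ordinary 4D batch for a single column, so each column can still be fed to the network on its own.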
There will also be a column added that keeps track of whether an image was missing. Because of biases, a missing image will not always result in a zero vector, so I thought it was worth creating a column to explicitly track missing images. Also worth noting: although missing images will not always result in a zero vector of features, the result WILL always be the same for every missing image (it's just performing a prediction on a zero vector).
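A toy illustration of that point, assuming a single dense layer with a bias as a stand-in for the real network (the weights, bias values, and `featurize` helper are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the network: one dense layer with a bias term.
weights = rng.normal(size=(12, 2))  # 12 flattened pixel values -> 2 features
bias = np.array([-1.0, -0.5])

def featurize(batch):
    """Flatten each image and apply the toy dense layer."""
    flat = batch.reshape(len(batch), -1)
    return flat @ weights + bias

# Three rows of 2x2 RGB images; a path of None marks a missing image.
paths = ["random_image_11", None, None]
batch = rng.random((len(paths), 2, 2, 3))
missing = np.array([int(p is None) for p in paths])
batch[missing == 1] = 0.0  # missing images are fed in as zero tensors

features = featurize(batch)
print(missing)   # [0 1 1]
print(features)  # rows 1 and 2 both equal the bias, which is not zero
```

In this sketch the features for a missing image collapse to the bias, so they are identical across missing rows but not zero, which is why an explicit missing flag is the more reliable signal.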
E.g. With 2 image columns, 2 photos (plus one missing from each column), and 2 features, it'll be stored like this:
| imagecolumn1 | imagecolumn2 | imagecolumn1_missing | imagecolumn1_feature_0 | imagecolumn1_feature_1 | imagecolumn2_missing | imagecolumn2_feature_0 | imagecolumn2_feature_1 |
|---|---|---|---|---|---|---|---|
| random_image_11 | random_image_21 | 0 | 0.3 | 0.4 | 0 | -0.1 | 1.2 |
| random_image_12 | | 0 | 0.1 | 1.6 | 1 | -1 | -0.5 |
| | random_image_22 | 1 | -1 | -0.5 | 0 | 0.4 | 0.8 |
Thoughts on if I should switch how it's organized, and sort by features rather than image column? This seemed pretty natural but I'm open to the reverse.
Nice, this is perfect. I think the order there is the most natural, so I would keep it. Good stuff!
This was a suggestion from @SYooma.
This may be beyond the scope of this project, but having a way to support multiple image columns automatically would be useful. It could be as simple as running each column sequentially through the featurizer and then merging the results together. The images would all have to lie in the same directory to keep the API sane.
At least having an example script of how to handle this case would be useful.
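As a rough sketch of what such a script might look like, the snippet below runs each image column through a featurizer in turn and merges the results side by side. `featurize_column` is a hypothetical placeholder, not the project's API, and the output column names follow the `<column>_missing` / `<column>_feature_<i>` pattern shown in the table in this thread.

```python
import numpy as np

def featurize_column(paths, n_features=2):
    """Hypothetical stand-in for running the featurizer on one image column.

    Returns a per-row missing flag and an (n_rows, n_features) feature array.
    """
    missing = np.array([int(p is None) for p in paths])
    # Placeholder values only: a real featurizer would run a network per image.
    features = np.array([
        [0.0] * n_features if p is None
        else [float(len(p) + i) for i in range(n_features)]
        for p in paths
    ])
    return missing, features

def featurize_columns(table, image_columns, n_features=2):
    """Featurize each image column sequentially and merge the results."""
    names, blocks = [], []
    for col in image_columns:
        missing, features = featurize_column(table[col], n_features)
        names.append(f"{col}_missing")
        names.extend(f"{col}_feature_{i}" for i in range(n_features))
        blocks.append(np.column_stack([missing, features]))
    # Columns are appended in the order the image columns were passed in.
    return names, np.hstack(blocks)

table = {
    "imagecolumn1": ["random_image_11", "random_image_12", None],
    "imagecolumn2": ["random_image_21", None, "random_image_22"],
}
names, merged = featurize_columns(table, ["imagecolumn1", "imagecolumn2"])
print(names)
print(merged.shape)  # (3, 6)
```

Keeping the per-column call separate from the merge step means the single-column featurizer does not need to change to support several columns.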