datarobot / pic2vec

Lightweight Image Featurization Made Easy

Support for multiple image columns #11

Closed by joshloyal 7 years ago

joshloyal commented 7 years ago

This was a suggestion from @SYooma.

This may be beyond the scope of this project, but having a way to support multiple image columns automatically would be useful. It could be as simple as running each column sequentially through the featurizer and then merging the results. The images would all have to live in the same directory to keep the API sane.

At least having an example script of how to handle this case would be useful.
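
A minimal sketch of that sequential approach, assuming a hypothetical `featurize_column` helper that stands in for the real featurizer (it is not pic2vec's API, and the toy features and column names here are illustrative only):

```python
import numpy as np

def featurize_column(image_paths):
    """Hypothetical stand-in for a real featurizer.

    Returns one row of toy features per image path: the path length and
    the number of underscores in it.
    """
    return np.array([[len(p), p.count("_")] for p in image_paths], dtype=float)

def featurize_columns(table, image_columns):
    """Featurize each image column in turn and merge the results side by side."""
    blocks, names = [], []
    for col in image_columns:
        feats = featurize_column(table[col])
        blocks.append(feats)
        # Prefix every feature name with the image column it came from.
        names += [f"{col}_feature_{i}" for i in range(feats.shape[1])]
    return names, np.hstack(blocks)

# Made-up file names; both columns are assumed to live in the same directory.
table = {
    "imagecolumn1": ["cat_1.png", "cat_2.png"],
    "imagecolumn2": ["dog_1.png", "dog_22.png"],
}
names, features = featurize_columns(table, ["imagecolumn1", "imagecolumn2"])
```

Each column is processed independently, so the merged matrix has one row per record and one block of features per image column, in the order the columns were requested.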

joristaglio commented 7 years ago

Done! The data is now stored in 5D tensors, where the first dimension indexes the image column, i.e. the data is stored as [image_column, image_batch, height, width, channel]. The featurized data is stored the same way as before, but with extra features for each extra column: the columns are appended in the order they're passed into the network, and each feature name includes the image column it came from.
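
To illustrate the layout, here is a rough NumPy sketch. The shapes, the random placeholder pixels, and the `toy_featurize` function are all assumptions for the sake of the example, not the project's actual code:

```python
import numpy as np

n_columns, batch, height, width, channels = 2, 3, 4, 4, 3

# One 4D batch of images per image column (random pixels as placeholder data).
rng = np.random.default_rng(0)
column_batches = [
    rng.random((batch, height, width, channels)) for _ in range(n_columns)
]

# Stack along a new leading axis:
# [image_column, image_batch, height, width, channel].
data = np.stack(column_batches, axis=0)

def toy_featurize(images):
    """Hypothetical featurizer: mean pixel value per channel, shape (batch, channels)."""
    return images.mean(axis=(1, 2))

# Featurize each column in the order given and append the results column-wise.
features = np.concatenate(
    [toy_featurize(data[i]) for i in range(n_columns)], axis=1
)
```

Indexing `data[i]` recovers the ordinary 4D batch for image column `i`, so the per-column featurization step itself does not need to know about the extra dimension.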

There will also be a column added for each image column that tracks whether an image was missing. Because of biases, a missing image will not always produce a zero feature vector, so I thought it was worth creating a column to track missing images explicitly. Also worth noting: although the features for a missing image will not always be zero, they WILL always be the same for every missing image (it's just a prediction on a zero vector).
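
A small sketch of why that happens, using a single toy dense layer in place of the real network (the weights, biases, and image sizes are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(12, 2))   # toy weights: 12 flattened pixels -> 2 features
b = np.array([-1.0, -0.5])     # toy biases

def toy_featurize(batch):
    """Hypothetical one-layer 'network': flatten each image, then apply W and b."""
    flat = batch.reshape(len(batch), -1)
    return flat @ W + b

# Three 2x2 RGB images; the last two are missing and replaced by zeros.
images = rng.random((3, 2, 2, 3))
is_missing = np.array([False, True, True])
images[is_missing] = 0.0

features = toy_featurize(images)
# For an all-zero input the weights contribute nothing, so only the bias remains:
# every missing image gets the same, generally non-zero, feature vector.
```

Since the features alone cannot distinguish "missing" from "an image that happens to map near the bias", keeping `is_missing` as an explicit column preserves that information.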

E.g., with 2 image columns, 2 photos (plus one missing from each column), and 2 features, it'll be stored like this:

| imagecolumn1 | imagecolumn2 | imagecolumn1_missing | imagecolumn1_feature_0 | imagecolumn1_feature_1 | imagecolumn2_missing | imagecolumn2_feature_0 | imagecolumn2_feature_1 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| random_image_11 | random_image_21 | 0 | 0.3 | 0.4 | 0 | -0.1 | 1.2 |
| random_image_12 | | 0 | 0.1 | 1.6 | 1 | -1 | -0.5 |
| | random_image_22 | 1 | -1 | -0.5 | 0 | 0.4 | 0.8 |

Thoughts on whether I should switch how it's organized and sort by feature rather than by image column? This seemed pretty natural, but I'm open to the reverse.

joshloyal commented 7 years ago

Nice, this is perfect. I think that order is the most natural, so I would keep it. Good stuff!