JoshVarty / KaggleUtils

A collection of utilities I use for EDA, feature engineering etc.
MIT License

Cluster similar images #6

Open JoshVarty opened 5 years ago

JoshVarty commented 5 years ago

For image-related tasks it might be useful to cluster similar images together. We could then count how many images fall into each cluster in the train and test sets and check whether the two distributions match.
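A rough sketch of the comparison step, assuming each image has already been assigned a cluster id by some clustering method. The helper names (`cluster_distribution`, `compare_distributions`) and the label lists are made up for illustration and are not part of this repo. It reports each cluster's share of train and test, plus the total variation distance as one possible summary number.

```python
from collections import Counter


def cluster_distribution(labels):
    """Return the fraction of images assigned to each cluster id."""
    total = len(labels)
    if total == 0:
        return {}
    return {cluster: n / total for cluster, n in Counter(labels).items()}


def compare_distributions(train_labels, test_labels):
    """Return per-cluster (train_share, test_share) and the total variation distance."""
    train = cluster_distribution(train_labels)
    test = cluster_distribution(test_labels)
    clusters = sorted(set(train) | set(test))
    shares = {c: (train.get(c, 0.0), test.get(c, 0.0)) for c in clusters}
    # Total variation distance: 0 means identical shares, 1 means no overlap.
    tvd = 0.5 * sum(abs(a - b) for a, b in shares.values())
    return shares, tvd


# Hypothetical cluster assignments, for illustration only.
train_labels = [0, 0, 0, 1, 1, 2, 2, 2]
test_labels = [0, 1, 1, 1, 2, 2, 2, 2]

shares, tvd = compare_distributions(train_labels, test_labels)
for cluster, (train_share, test_share) in shares.items():
    print(f"cluster {cluster}: train={train_share:.3f} test={test_share:.3f}")
print(f"total variation distance: {tvd:.3f}")
```

A distance near 0 would suggest the train and test sets contain similar mixes of image types. A large value points at clusters worth inspecting by eye.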

JoshVarty commented 5 years ago

Thoughts:

JoshVarty commented 5 years ago

It isn't working, possibly because the features our network learns focus on "Does this image have cancer or not?" rather than on general visual similarity.

Possible workarounds

JoshVarty commented 5 years ago

Tried clustering directly on the input images, with more success. Things to try:

  1. Try without PCA
  2. Try with a different number of PCA components
  3. Can we seed the K-means clusters?
    • I think we can use the init parameter. See here
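A minimal sketch of the three ideas above, assuming "init" refers to the `init` argument of scikit-learn's `KMeans`, which accepts an array of starting centers. The function `cluster_images`, its parameters, and the synthetic images are hypothetical and only for illustration. Setting `n_components=None` skips PCA, and `seed_indices` picks example images whose (optionally PCA-reduced) pixels become the initial centers.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def cluster_images(images, n_clusters, n_components=None, seed_indices=None,
                   random_state=0):
    """Cluster raw images with K-means, optionally after PCA.

    seed_indices, if given, must list exactly n_clusters image indices;
    those images are used as the starting cluster centers.
    """
    # Flatten each image into one row of pixel values.
    X = np.asarray(images, dtype=np.float64).reshape(len(images), -1)
    if n_components is not None:
        X = PCA(n_components=n_components, random_state=random_state).fit_transform(X)
    if seed_indices is None:
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    else:
        # Explicit centers, taken in the same feature space as X.
        # n_init=1 because the starting point is fixed.
        km = KMeans(n_clusters=n_clusters, init=X[list(seed_indices)], n_init=1)
    return km.fit_predict(X)


# Synthetic stand-in data: 20 dark and 20 bright 8x8 "images".
rng = np.random.default_rng(0)
dark = rng.normal(0.2, 0.05, size=(20, 8, 8))
bright = rng.normal(0.8, 0.05, size=(20, 8, 8))
images = np.concatenate([dark, bright])

# Seed cluster 0 with a dark image and cluster 1 with a bright one.
labels_pca = cluster_images(images, n_clusters=2, n_components=5, seed_indices=[0, 20])
labels_raw = cluster_images(images, n_clusters=2, n_components=None, seed_indices=[0, 20])
print(labels_pca)
print(labels_raw)
```

Seeding this way also fixes which cluster id each group gets, which makes runs with different PCA settings easier to compare.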