facebookresearch / ic_gan

Official repository for the paper "Instance-Conditioned GAN" by Arantxa Casanova, Marlene Careil, Jakob Verbeek, Michał Drożdżal, Adriana Romero-Soriano.

Create pre-computed features of custom dataset #24

Closed mhbassel closed 3 years ago

mhbassel commented 3 years ago

Hello everyone!

I am very interested in your project, and I was wondering how you created the pre-computed features you provided (the pre-computed 1000 instance features from ImageNet). I want to create something similar for my own dataset, which contains only images.

Any help will be appreciated! Thanks

ArantxaCasanova commented 3 years ago

Hello! You can check out the instructions here, under the subsection "How to subsample an instance feature dataset with k-means".

Let me know if that helps!

mhbassel commented 3 years ago

Hi @ArantxaCasanova,

Thanks so much for your reply. I followed the instructions and used the script you mentioned, but when I run inference with inference/generate_images.py on the model I trained, I get the following error with the npy files I created:

Traceback (most recent call last):
  File "inference/generate_images.py", line 346, in <module>
    main(config)
  File "inference/generate_images.py", line 158, in main
    z, all_feats, all_labels, all_img_paths = get_conditionings(
  File "inference/generate_images.py", line 90, in get_conditionings
    all_img_paths.append(data["image_path"][idx])
KeyError: 'image_path'

After comparing them with the pre-computed feature files from the repo, I found that my npy files contain a dictionary with only one key, center_examples, while the ones you provided have different keys.

I am not sure what I am doing wrong; I am new to this stuff :\
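For anyone comparing the two file formats, here is a quick sketch of how to inspect the keys of a dictionary stored in an npy file (the file path and contents are placeholders, not the actual repo files):

```python
import numpy as np

# np.save stores a Python dict as a 0-d object array; loading it back
# requires allow_pickle=True, and .item() recovers the original dict.
np.save("/tmp/example_feats.npy", {"center_examples": np.zeros((1000,))})
data = np.load("/tmp/example_feats.npy", allow_pickle=True).item()
print(sorted(data.keys()))  # -> ['center_examples']
```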

ArantxaCasanova commented 3 years ago

Yes, the script data_utils/store_kmeans_indexes.py saves the 1000 image indexes selected with k-means, not the features themselves.

The files that I provide for use with inference/generate_images.py contain a dictionary with up to 3 keys: "instance_features", and optionally "image_path" (if you want to visualize the image to which the instance features correspond) and "labels" (only if you have labeled data). If you wanted to create similar dictionaries for your data, you would need to add a few lines of code to data_utils/store_kmeans_indexes.py, for example here, that would look like:

import numpy as np

data = dict()
# Keep only the features of the samples selected by k-means.
data["instance_features"] = features[closest_sample]
np.save(path_to_save, data)

If you want to map instances back to their images and display them, you will need an ordered list of image paths for all images in the dataset, stored in a variable called image_paths, in the same order as they appear in features. Once you have that, you can add this key to the dictionary as data["image_path"] = image_paths[closest_sample].
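Putting the pieces together, the saving step might look like the sketch below. The arrays features, image_paths, labels, and closest_sample are toy stand-ins for what the script would have built (in the real script they come from the feature-extraction and k-means steps), and the output path is a placeholder:

```python
import numpy as np

# Toy stand-ins for arrays the script would have built earlier.
features = np.random.rand(10, 2048).astype("float32")
image_paths = np.array(["img_%d.jpg" % i for i in range(10)])
labels = np.arange(10)
closest_sample = np.array([0, 3, 7])  # indexes selected by k-means

data = dict()
data["instance_features"] = features[closest_sample]
data["image_path"] = image_paths[closest_sample]  # optional, for visualization
data["labels"] = labels[closest_sample]           # optional, labeled data only
np.save("/tmp/selected_features.npy", data)
```

All three arrays are indexed with the same closest_sample, which keeps features, paths, and labels aligned row by row.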

I also pushed a small change to fix the exact issue you encountered.

mhbassel commented 3 years ago

Worked like a charm! I followed your instructions for adding the instance_features and the labels. For the labels, I loaded them in a similar way to the features, but did not reorder them; when I tested it, I got this error:

Traceback (most recent call last):
  File "inference/generate_images.py", line 357, in <module>
    main(config)
  File "inference/generate_images.py", line 163, in main
    z, all_feats, all_labels, all_img_paths = get_conditionings(
  File "inference/generate_images.py", line 97, in get_conditionings
    torch.FloatTensor(data["instance_features"][idx : idx + 1]).repeat(
RuntimeError: Number of dimensions of repeat dims can not be smaller than number of dimensions of tensor

My data["instance_features"] had shape (1000, 1, 2048); removing the middle dimension solved it. I am not sure this was the correct way to solve it, but it worked, so I will close the issue.
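The shape fix described above can be sketched with np.squeeze, which drops the singleton middle axis so the downstream torch.FloatTensor(...).repeat(...) call sees a 2-D (N, 2048) array (the array here is a dummy placeholder):

```python
import numpy as np

# Dummy features with the problematic extra singleton dimension.
feats = np.zeros((1000, 1, 2048), dtype="float32")

# Drop the middle axis; axis=1 makes it fail loudly if that axis is not 1.
feats = np.squeeze(feats, axis=1)
print(feats.shape)  # -> (1000, 2048)
```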

I really appreciate your help and thanks a lot (: