Treatment of filenames in ADE is inconsistent

Problem: For ADE20k, our usual way of denoting an image through an image_id doesn't work. First, the images sit in a nested directory structure that cannot be predicted from the image id. E.g., ADE_train_00000994.jpg is found in training/a/abbey/. Second, the same image id may occur in both training/ and testing/.

(Actually, I'm no longer sure whether the image id, which ultimately comes from index_ade20k.mat, is the number in the filename. But I am fairly certain that the image_id we used was not unique without the split.)

So even knowing the split (which could be encoded into the image_id by adding a constant, so that every id above that constant is from split B) is not enough to locate the image; for that you need to know the category as well.

This problem surfaces in two places. During extraction, going through a dataframe like ade_objdf, as it is at the moment, is not enough, because it lacks the image category and the split. (Actually, it does have the split, since that is encoded into the image id.) So to get the image, one would need to load a different structure that maps image id + split to the fully qualified path.

It is also a problem for our usual encoding of the image in the feature file, where we have only three numerical fields. This constraint makes it necessary to encode the split info (the minimum needed to disambiguate the image_id) into the image id.
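
A minimal sketch of the offset encoding mentioned above. The constant `SPLIT_OFFSET` and both function names are hypothetical and not taken from the code base; the only requirement is that the offset is larger than any raw image id.

```python
from typing import Tuple

# Hypothetical constant: must exceed the largest raw image id in any split.
SPLIT_OFFSET = 1_000_000
# Position in this tuple determines the multiple of the offset that is added.
SPLITS = ("training", "testing")


def encode_image_id(raw_id: int, split: str) -> int:
    """Fold the split into a single numerical image id."""
    if not 0 <= raw_id < SPLIT_OFFSET:
        raise ValueError(f"raw id {raw_id} does not fit below the offset")
    return SPLITS.index(split) * SPLIT_OFFSET + raw_id


def decode_image_id(image_id: int) -> Tuple[int, str]:
    """Recover (raw id, split) from an encoded image id."""
    split_index, raw_id = divmod(image_id, SPLIT_OFFSET)
    return raw_id, SPLITS[split_index]
```

With these assumptions, raw id 994 stays 994 for training/ and becomes 1000994 for testing/. The category is not recoverable from the encoded id, so this alone still does not give the path.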

Possible solution:

Add the missing fields (category and split) to the ADE dataframes, so that during extraction only the one dataframe needs to be consulted, in the same way as for all other corpora. (Rather than loading another dataframe with the mapping between id+split and full filename.)
In the feature file, encode the split into the image id. Then, when one wants to go from a feature row to the corresponding image, e.g. to visualise it, one will need to have this mapping available. But that seems ok, since it is a special case, and the mapping dataframe can then be loaded explicitly.
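
Going from a feature row back to the image could look roughly like this. A plain dict stands in for the mapping dataframe, and its single entry reuses the example path from the problem description. The function name is hypothetical, and the key assumes that the image id is the number in the filename and that training/ is encoded with offset 0; both are illustrative assumptions.

```python
from pathlib import PurePosixPath

# Stand-in for the mapping dataframe:
# encoded image id -> (split, category directory, filename).
# The key 994 assumes the id equals the number in the filename and that
# "training" is encoded with offset 0 (illustrative assumptions).
ADE_PATHS = {
    994: ("training", "a/abbey", "ADE_train_00000994.jpg"),
}


def feature_row_to_path(image_id, ade_root=""):
    """Resolve the encoded image id of a feature row to the image path."""
    # Feature files hold numerical fields, so the id may arrive as a float.
    split, category_dir, filename = ADE_PATHS[int(image_id)]
    return PurePosixPath(ade_root, split, category_dir, filename)
```

Only code that needs the actual image, such as a visualisation helper, would have to load this mapping.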

(But the API of get_image_filename should be cleaned up in any case, and split and category should be made into keyword arguments that are passed along to get_ade_filename.)
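
A rough sketch of what such a cleaned-up API might look like. The signatures below are hypothetical and do not reflect the current code. The filename pattern and the `<split>/<first letter>/<category>/` layout are inferred from the single example ADE_train_00000994.jpg in training/a/abbey/, and the sketch assumes the image id is the number in the filename.

```python
import posixpath

# Filename tag per split. Only the training tag is attested by the example
# ADE_train_00000994.jpg; other splits would need their own entries.
ADE_SPLIT_TAGS = {"training": "train"}


def get_ade_filename(image_id, *, split, category):
    """Hypothetical signature: build the relative path of an ADE20k image.

    Assumes the image id is the number in the filename and that the
    layout is <split>/<first letter of category>/<category>/.
    """
    filename = f"ADE_{ADE_SPLIT_TAGS[split]}_{image_id:08d}.jpg"
    return posixpath.join(split, category[0], category, filename)


def get_image_filename(corpus, image_id, *, split=None, category=None):
    """Hypothetical dispatcher; only the ADE branch is sketched here."""
    if corpus == "ade":
        if split is None or category is None:
            raise ValueError("ADE images need both split and category")
        return get_ade_filename(image_id, split=split, category=category)
    raise NotImplementedError(f"corpus {corpus!r} is not covered by this sketch")
```

Making split and category keyword-only keeps call sites for the other corpora unchanged while forcing ADE callers to be explicit.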
