Closed: 5uperpalo closed this issue 2 years ago

Hi Team, I liked the ideas in your paper, but from reading the paper and the provided code it sounds like the provided FDS and LDS code can be applied to any dataset/model? Is it really true?

Another issue: by dividing the kernel window by its max value, e.g. here: https://github.com/YyzHarry/imbalanced-regression/blob/055a7b3804bbaf903ed25a55c11ab8acc6e142e1/agedb-dir/fds.py#L44, you are changing the mean and variance values and not only smoothing them along the axis (either the feature statistics in FDS or the label density in LDS). Do I understand it correctly?

Note: I like the ideas in the paper, but due to the lack of documentation/explanation I am currently spending a lot of time generalizing the code and trying to figure out why you made some of the operations (e.g., the clippings).
Hi - thanks for your interest.
It looks like you are using only integers (as you are predicting age) to make a dictionary of histogram bins in both FDS and LDS.
This is not true. The code you referred to is for age prediction, which is a specific case where the minimum resolution we care about is 1, thus the bin size is set to 1. However, if you read the paper carefully, there is no constraint on the bin size --- in fact, for some of the datasets we experimented on, e.g., STS-B-DIR or NYUD2-DIR, the labels are float numbers (e.g., in [0, 5] for STS-B-DIR), and the bin size for these datasets is 0.1 in the experiments. You might want to refer to the code for sts-b-dir and nyud2-dir.
That being said, FDS and LDS could be applied to any dataset / deep model, as long as you define the minimum bin size you care about.
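To make this concrete, here is a minimal sketch (not the exact code in this repo; the range and bin size are the STS-B-DIR values mentioned above, everything else is illustrative) of how a float label maps to a bin index once a bin size is chosen:

bin_size = 0.1
label_lo, label_hi = 0.0, 5.0
num_bins = int(round((label_hi - label_lo) / bin_size))   # 50 bins for labels in [0, 5]

def get_bin_idx(label):
    # clamp so that label == label_hi still falls into the last bin
    return min(int((label - label_lo) / bin_size), num_bins - 1)

print([get_bin_idx(y) for y in [0.03, 1.27, 4.99, 5.0]])  # [0, 12, 49, 49]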
I did not find an explanation for this clipping
The reason we clip the weights here is that after inverse re-weighting, some weights might be very large (e.g., in age estimation, consider 5,000 images for age 30 and only 1 image for age 100; after inverse re-weighting, the weight ratio between the two could be extremely high). This could cause optimization problems.
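A small numeric sketch of the point (the counts, the clip bound, and the variable names are made up for illustration, not taken from the repo):

import numpy as np

# per-bin sample counts, e.g. 5,000 images for one age vs. a single image for a rare age
num_per_bin = np.array([5000., 2000., 300., 10., 1.])

raw_weights = 1.0 / num_per_bin                        # plain inverse re-weighting
print(raw_weights.max() / raw_weights.min())           # 5000.0 -- the rare bin dominates

# clipping the effective counts (equivalently, the weights) keeps the ratio bounded,
# so a handful of rare samples cannot destabilize optimization
clipped = np.clip(num_per_bin, a_min=10., a_max=None)  # illustrative lower bound
weights = 1.0 / clipped
weights = weights / weights.sum() * len(weights)       # renormalize to mean 1
print(weights)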
there is also another clipping here
Similarly, the clipping here is for numerical stability of FDS. If some bins contain a very small number of samples, the variance estimation may not be stable. To avoid optimization problems, we simply use clipping here.
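For instance, a toy version of that kind of clipping (bounds and names are purely illustrative, not the repo's actual values):

import numpy as np

# toy per-bin running variances of one feature dimension; bins with only a handful of
# samples can produce near-zero or exploding estimates
running_var = np.array([0.8, 1.1, 1e-6, 0.9, 25.0])

# clamp before using the statistics downstream, so calibration never divides by
# (or multiplies with) an extreme value
stable_var = np.clip(running_var, 0.1, 10.0)
print(stable_var)   # extremes clamped into [0.1, 10.0]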
By dividing the kernel window by its max value, you are changing the mean and variance values and not only smoothing them along the axis (either the feature statistics in FDS or the label density in LDS).
I do not quite understand your question here. This is just one implementation choice. You could also use gaussian_filter1d in the implementation to simulate a kernel window.
Thank you for your answer regarding the clipping! Let me rephrase the other questions:
import numpy as np
import matplotlib.pyplot as plt
from scipy.ndimage import convolve1d, gaussian_filter1d

x = np.random.rand(10)
ks = 5
sigma = 2
half_ks = (ks - 1) // 2
# delta kernel; gaussian_filter1d turns it into a gaussian window of size ks
base_kernel = [0.] * half_ks + [1.] + [0.] * half_ks
kernel_window_withmax = gaussian_filter1d(base_kernel, sigma=sigma) / max(gaussian_filter1d(base_kernel, sigma=sigma))
kernel_window = gaussian_filter1d(base_kernel, sigma=sigma)
x_k_withmax = convolve1d(x, kernel_window_withmax)
x_k = convolve1d(x, kernel_window)
plt.plot(x_k_withmax, label="x_k_withmax")
plt.plot(x_k, label="x_k")
plt.plot(x, label="x")
plt.legend()
plt.show()
Thanks for your explanation, we now understand your questions better.
- The ideas are generally applicable, but the provided code is specific to each use case. In order to use it as a general approach or with other datasets, I have to put it together myself, am I correct? E.g., I was able to find that the LDS bin size for this dataset is specified here: https://github.com/YyzHarry/imbalanced-regression/blob/055a7b3804bbaf903ed25a55c11ab8acc6e142e1/sts-b-dir/tasks.py#L51
Yes, your understanding is correct. For every dataset, we implemented a get_bin_idx() function to return the bin index of a regression label. For this function, we assume the label range of each dataset is known and define the number of bins we want to use; the bin size is then naturally defined (yes, you can also infer the bin size from this function). Besides, we also list those settings (e.g., label range, bin size) in detail in Table 7 (Appendix B) of our paper.
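As a sketch of how those settings fit together (made-up numbers, not the repo's code): the per-dataset settings reduce to a label range and a number of bins, from which the bin size follows, and the empirical label distribution that LDS smooths is just a histogram over bin indices.

label_min, label_max, num_bins = 0.0, 5.0, 50      # illustrative dataset settings
bin_size = (label_max - label_min) / num_bins      # bin size follows: 0.1

labels = [0.03, 0.07, 2.55, 4.82, 4.99]            # toy training labels
bin_counts = [0] * num_bins
for y in labels:
    idx = min(int((y - label_min) / bin_size), num_bins - 1)
    bin_counts[idx] += 1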
- If you use gaussian_filter1d/max(gaussian_filter1d), as you do in FDS, then after convolution the feature values are not only smoothed but their mean value also increases. Is there any reason for this? E.g., see the code example above.
In LDS, we use gaussian_filter1d / max(gaussian_filter1d), but in FDS, we use gaussian_filter1d / sum(gaussian_filter1d) to ensure that the scale of the mean value is not changed, which is effectively equal to plain gaussian_filter1d. That said, to align with other kernels (e.g., the triangle kernel) that do require sum normalization, we also add the /sum to the gaussian kernel.
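A quick self-contained check of this point (our toy example, not code from the repo): with a constant input, a sum-normalized kernel leaves the mean untouched, while a max-normalized kernel scales it up by sum/max.

import numpy as np
from scipy.ndimage import convolve1d, gaussian_filter1d

x = np.full(20, 3.0)                      # constant signal with mean 3
base_kernel = [0., 0., 1., 0., 0.]        # delta; gaussian_filter1d turns it into a gaussian window
raw = gaussian_filter1d(base_kernel, sigma=2)

sum_norm = raw / raw.sum()                # weights sum to 1 -> scale preserved
max_norm = raw / raw.max()                # peak is 1, weights sum to raw.sum() / raw.max() > 1

print(convolve1d(x, sum_norm).mean())     # ~3.0
print(convolve1d(x, max_norm).mean())     # ~3.0 * raw.sum() / raw.max()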
Thank you both @kaiwenzha and @YyzHarry! Closing the issue as everything is clear to me now :)