YyzHarry / imbalanced-regression

[ICML 2021, Long Talk] Delving into Deep Imbalanced Regression
http://dir.csail.mit.edu
MIT License
806 stars · 128 forks

feature smoothing when doing backward propagation #6

Closed TerenceChen95 closed 2 years ago

TerenceChen95 commented 3 years ago

Hi, thanks for sharing this wonderful project! I have a small question to ask. I got the following error when applying the FDS module:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64]], which is output 0 of SelectBackward, is at version 7; expected version 3 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

When I add feature.detach(), the error goes away, but is it correct to do so? From my understanding, this module updates the features with the previous epoch's means and variances; does it affect the backpropagation part? Thanks in advance for any help!

kaiwenzha commented 3 years ago

Hi @TerenceChen95, thanks for your interest!

I guess you were trying to apply FDS to your own model/data, since our code will not raise this error. Would you mind sharing the specific code snippet related to the issue so that we can better help you? It seems that the problem is caused by an in-place operation when smoothing the features.
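For context, here is a minimal sketch (hypothetical, not from the repo) of both the error and why detach() only hides it: an in-place write to a tensor that autograd saved for the backward pass raises the RuntimeError, while detaching simply cuts the graph so no gradient reaches the features at all.

```python
import torch

x = torch.randn(4, requires_grad=True)
y = torch.exp(x)            # exp's backward reuses its output y
y[0] = 0.0                  # in-place write bumps y's version counter
err = None
try:
    y.sum().backward()      # autograd notices y changed since the forward pass
except RuntimeError as e:
    err = e                 # "...modified by an inplace operation..."
print(type(err).__name__)   # RuntimeError

# detach() silences the error, but also stops gradients entirely:
z = torch.exp(x).detach()   # z is cut from the graph
z[0] = 0.0                  # now legal, but...
print(z.requires_grad)      # False: no gradient flows back to x
```

So detaching makes the error disappear only because the smoothed features no longer participate in gradient computation at all, which is usually not what you want.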

carlotapares commented 2 years ago

Hi @kaiwenzha, I am also getting the same issue. I attach the code snippet that is causing it.

```python
def smooth(self, features, labels, epoch):
    if epoch < self.start_smooth:
        return features
    labels = labels.squeeze(1)
    buckets = np.array([self._get_bucket_idx(label) for label in labels])
    for bucket in np.unique(buckets):
        features[buckets == bucket] = calibrate_mean_var(
            features[buckets == bucket],
            self.running_mean_last_epoch[bucket],
            self.running_var_last_epoch[bucket],
            self.smoothed_mean_last_epoch[bucket],
            self.smoothed_var_last_epoch[bucket]
        )
    return features
```

The assignment into the existing tensor, `features[buckets == bucket] = ...`, is what raises the error. Thanks in advance for your help!

907491795 commented 11 months ago

Hi @carlotapares, I am also getting the same issue. Have you solved this problem?

ChengkaiYang commented 2 months ago

> Hi @kaiwenzha, I am also getting the same issue. I attach the code snippet that is causing it. […] The assignment to an existing tensor `features[buckets == bucket] = ...` raises the error.

You can solve the problem with the torch.masked_scatter function, which avoids the in-place operation. I think the latent in-place write is here: `feature[labels == label] = self.calibrate_mean_var(feature[labels == label], ...)`. With torch.masked_scatter, the fix looks like the following:

```python
features = torch.masked_scatter(
    features,
    (labels == label).unsqueeze(1).repeat(1, features.shape[1]),
    self.calibrate_mean_var(
        features[labels == label],
        self.running_mean_last_epoch[int(label - self.bucket_start)],
        self.running_var_last_epoch[int(label - self.bucket_start)],
        self.smoothed_mean_last_epoch[int(label - self.bucket_start)],
        self.smoothed_var_last_epoch[int(label - self.bucket_start)],
    ),
)
```
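A self-contained sketch of the masked_scatter approach (here `calibrate` is a stand-in normalization, not the repo's calibrate_mean_var): torch.masked_scatter returns a new tensor instead of writing into the saved one, so backward succeeds and gradients flow to the features.

```python
import torch

def calibrate(x):
    # stand-in for FDS's calibrate_mean_var: any differentiable transform
    return (x - x.mean(dim=0)) / (x.std(dim=0) + 1e-6)

feats = torch.randn(6, 3, requires_grad=True)
out = torch.sigmoid(feats)                  # backward reuses this output
mask = torch.tensor([True, False, True, False, True, False])

# out-of-place: masked_scatter builds a new tensor, leaving `out` untouched
smoothed = torch.masked_scatter(
    out,
    mask.unsqueeze(1).repeat(1, out.shape[1]),
    calibrate(out[mask]),
)
smoothed.sum().backward()                   # no in-place error
print(feats.grad.shape)                     # torch.Size([6, 3])
```

masked_scatter fills the True positions of the mask, in row-major order, with the elements of the source tensor, which matches the order produced by the boolean read `out[mask]`.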


ChengkaiYang commented 2 months ago

> Hi @TerenceChen95, thanks for your interest! I guess you were trying to apply FDS to your own model/data since our code will not raise this error. […] It seems that the problem was caused by the inplace operation when smoothing the features.

Yes, I think the latent in-place operation is the masked assignment `feature[labels == label] = self.calibrate_mean_var(feature[labels == label], ...)`. Replacing it with torch.masked_scatter, as described above, fixes the problem.