Closed quqixun closed 3 years ago
Questions:
- The code does not rescale the features after randomly dropping nodes during training, which is inconsistent with the paper.
- During inference, should DropNode return the original features unchanged?
- Should the scale factor be (1 - droprate) or 1 / (1 - droprate)?
Description of DropNode in paper:
Code of DropNode:
    if training:
        masks = torch.bernoulli(1. - drop_rates).unsqueeze(1)
        features = masks.cuda() * features  # did not scale features after dropping nodes randomly
    else:
        features = features * (1. - drop_rate)  # scaled features during inference
Hi, thanks for your interest! DropNode/dropout can be implemented in two ways:
1) Scaling the features by 1 / (1 - droprate) during training.
2) Scaling the features by (1 - droprate) during inference.
Our paper describes only the first method, but our code implements the second. Both methods are correct.
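To make the equivalence concrete, here is a minimal sketch of both variants side by side. The function names `drop_node_scale_at_train` and `drop_node_scale_at_inference` are hypothetical and do not come from the repository. The sketch assumes a single scalar `drop_rate` shared by all nodes and runs on whatever device `features` lives on. In each variant the expected training output equals the inference output, so training and inference stay consistent; the two variants differ only by a constant factor of (1 - droprate) in the activations.

```python
import torch


def drop_node_scale_at_train(features, drop_rate, training):
    """Variant 1 (as described in the paper): rescale by 1 / (1 - drop_rate) in training."""
    if not training:
        return features  # inference returns the original features unchanged
    keep_prob = 1.0 - drop_rate
    # One Bernoulli draw per node (row); shape (n, 1) broadcasts across feature columns.
    masks = torch.bernoulli(torch.full((features.shape[0], 1), keep_prob))
    return masks.to(features.device) * features / keep_prob


def drop_node_scale_at_inference(features, drop_rate, training):
    """Variant 2 (as in the posted code): no rescaling in training, scale at inference."""
    keep_prob = 1.0 - drop_rate
    if not training:
        return features * keep_prob
    masks = torch.bernoulli(torch.full((features.shape[0], 1), keep_prob))
    return masks.to(features.device) * features


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.ones(100_000, 4)
    for fn in (drop_node_scale_at_train, drop_node_scale_at_inference):
        train_mean = fn(x, 0.5, training=True).mean().item()
        infer_mean = fn(x, 0.5, training=False).mean().item()
        # The two means should be close for each variant.
        print(fn.__name__, train_mean, infer_mean)
```

This also answers the questions above under the same assumptions: with variant 1 the inference output is the original features and the factor is 1 / (1 - droprate); with variant 2 the training output is left unscaled and the factor (1 - droprate) is applied at inference.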