QueuQ / CGLB


add more units to the output layer in LWF #8

Closed Tianqi-py closed 1 year ago

Tianqi-py commented 1 year ago

Hi,

Thanks for open-sourcing the code for CGLB. I am implementing LWF in the task-incremental setting and ran into a problem when trying to add more output units as a new task arrives. For example, the first task has four classes and the second task has five, so after the first task the model should have four output units, and after the second task it should have nine. I tried:

class GCN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super(GCN, self).__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)
        self.apply(kaiming_normal_init)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = self.conv2(x, edge_index)
        m = torch.nn.Sigmoid()
        return m(x)

    def add_new_outputs(self, num_new_classes):
        # Add new output units for each new class
        in_channels = self.conv2.out_channels
        out_channels = self.conv2.out_channels + num_new_classes
        new_conv2 = GCNConv(in_channels, out_channels)

        # copy the parameters trained on the old tasks into the newly defined layer
        print(self.conv2.weight)
        print(new_conv2.weight)
        new_conv2.weight[:in_channels] = self.conv2.weight
        new_conv2.bias[:in_channels] = self.conv2.bias

        self.conv2 = new_conv2

and got the error message "Object GCNConv has no attribute weight". May I ask how you copied the weights from the previous model to the new model in LWF? Or do you assume the model knows in advance how many classes the dataset has in total?

Thanks in advance!

QueuQ commented 1 year ago

Hi,

Thanks for your interest!

In our implementation, we adopt a different strategy to increase the output units. Instead of modifying the model every time new classes come in, which is inconvenient, we allocate a sufficient number of output units in advance. During learning, if the first task contains 4 classes, we only activate the first 4 output units; if the second task then contains 5 classes, we activate the next 5 units. In our experiments we initialize the number of output units to the total number of classes in the dataset, but that is not necessary. In practice, even if we do not know the exact number of classes we are going to encounter, we usually have a rough expectation: if we expect around 40 classes, we may pre-allocate 50 or 100 units to ensure the model has enough capacity to expand.
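A minimal sketch of this pre-allocation idea (the class and the task-to-slice mapping below are illustrative, not the exact CGLB code) could look like this:

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, max_num_classes):
        super().__init__()
        # max_num_classes is an upper bound chosen in advance,
        # e.g. 50 or 100 if we expect roughly 40 classes overall
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, max_num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

# e.g. task 0 owns output units [0, 4), task 1 owns [4, 9)
task_slices = {0: slice(0, 4), 1: slice(4, 9)}

def task_logits(model, x, edge_index, task_id):
    # only the output units assigned to the current task are used
    # for the loss / prediction; the remaining units stay untouched
    out = model(x, edge_index)
    return out[:, task_slices[task_id]]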

The approach in your example is also a possible way to expand the network, but it requires modifying the model structure each time (e.g. the expansion and weight copying in your example) and may not be very efficient. Its advantage is that it does not consume space for output units before they are actually needed.
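If you do want to expand the layer as in your snippet, note that recent torch_geometric versions keep GCNConv's weight inside a lin submodule (which is why conv.weight raises the attribute error) and expose the bias as a separate parameter. A rough sketch along those lines, with shapes assumed from your code rather than taken from our implementation, might be:

import torch
from torch_geometric.nn import GCNConv

# drop-in replacement for the add_new_outputs method in the GCN class above
@torch.no_grad()
def add_new_outputs(self, num_new_classes):
    old_out = self.conv2.out_channels
    in_channels = self.conv2.in_channels
    new_conv2 = GCNConv(in_channels, old_out + num_new_classes)

    # GCNConv's weight lives in conv.lin.weight with shape
    # [out_channels, in_channels]; the bias has shape [out_channels]
    new_conv2.lin.weight[:old_out] = self.conv2.lin.weight
    if self.conv2.bias is not None:
        new_conv2.bias[:old_out] = self.conv2.bias

    self.conv2 = new_conv2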

In other words, this is a trade-off between time and space. Allocating new units only when the model really needs them saves space but costs more time; pre-allocating output units consumes more space in advance but saves time.


Tianqi-py commented 1 year ago

Hi,

Thanks for your quick and detailed explanations. They are very helpful :)