Hi,
First of all, thanks for this very interesting analysis! I was wondering whether you could share any observations regarding training time. Does the number of epochs the model has been trained for influence the number of locally active or single-attribute channels?