FMInference / DejaVu


Some questions about implementation details #13

Open SUDA-HLT-ywfang opened 10 months ago

SUDA-HLT-ywfang commented 10 months ago

Hi, DejaVu is really fascinating! Thanks a lot for releasing the corresponding code.

I have some questions about implementation details.

  1. In Section 3.1, how do you measure the sparsity in each layer? If a threshold is used, what is it set to? (One possible interpretation is sketched after this list.)
  2. When you train the sparse predictors, it seems you only care about the classifiers' recall rather than precision or F1 score. Why is that?
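For reference, one way the per-layer sparsity could be measured is the fraction of FFN neurons whose post-activation magnitude falls below some threshold `tau`; this is only my guess at what the paper might mean, not something taken from the DejaVu code:

```python
import torch

def layer_sparsity(hidden_acts: torch.Tensor, tau: float = 0.0) -> float:
    """Fraction of FFN neurons whose post-activation magnitude is <= tau."""
    return (hidden_acts.abs() <= tau).float().mean().item()

# For a ReLU FFN, tau = 0 would simply count exactly-zero neurons.
h = torch.relu(torch.randn(4, 1024))   # (tokens, ffn_dim) hidden activations
print(layer_sparsity(h, tau=0.0))
```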

Thank you! Hope to hear from you!

AmazeQiu commented 7 months ago

I have the same question.

XieWeikai commented 7 months ago

I also wonder how question 1 is handled.

I think it is reasonable that the authors only care about recall. The activated neurons contribute most of the activation magnitude, while the non-activated neurons matter far less, so we want to find all activated neurons to preserve model accuracy. Neurons that are not activated but are predicted as activated do not hurt the results, whereas activated neurons that are predicted as not activated have a significant impact. That is why recall is used.
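To make this concrete, here is a minimal sketch (my own illustration with made-up shapes, not the DejaVu code) showing that missing truly active neurons changes the FFN output, while keeping extra neurons that are actually inactive costs nothing, which is why recall is the metric that matters:

```python
import torch

torch.manual_seed(0)
d_model, d_ffn = 256, 1024
x = torch.randn(1, d_model)
W1 = torch.randn(d_ffn, d_model) / d_model ** 0.5
W2 = torch.randn(d_model, d_ffn) / d_ffn ** 0.5

h = torch.relu(x @ W1.T)            # (1, d_ffn) hidden activations
active = (h > 0).squeeze(0)         # ground-truth set of activated neurons

# Hypothetical predictor: misses some active neurons and adds some inactive ones.
pred = active.clone()
pred[active.nonzero().squeeze(1)[::10]] = False    # false negatives: drop every 10th active neuron
pred[(~active).nonzero().squeeze(1)[::10]] = True  # false positives: keep some inactive neurons

recall = (pred & active).sum() / active.sum()
precision = (pred & active).sum() / pred.sum()

full_out = h @ W2.T
sparse_out = (h * pred) @ W2.T      # only predicted-active neurons contribute
rel_err = (full_out - sparse_out).norm() / full_out.norm()
print(f"recall={recall:.2f}  precision={precision:.2f}  relative error={rel_err:.3f}")
```

Note that the false positives contribute zero anyway (their post-ReLU activation is 0), so all of the output error comes from the missed active neurons, i.e. from imperfect recall.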

MaTwickenham commented 3 months ago

Hi guys, I would like to ask whether the term 'activated neurons' in the FFN in the paper refers to a row or a column of parameters in a linear layer. For example, suppose a network has only one linear layer of shape (256, 512), the input x is (1, 256), and the output is (1, 512). To predict neuron activation, should the MLP predictor take x as input and output a (1, 256) or a (1, 512) tensor as the activation mask indicating which rows/columns of the weights are activated? I am not sure whether I understand this correctly.
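To make the question concrete, here is a minimal sketch of the setup described above (the shapes and both interpretations are mine, purely for illustration; it does not claim which one the paper uses):

```python
import torch

x = torch.randn(1, 256)
linear = torch.nn.Linear(256, 512)   # linear.weight has shape (512, 256)
out = linear(x)                      # (1, 512)

# Interpretation A: a "neuron" is an output unit, i.e. a row of linear.weight,
# so the predictor would emit a (1, 512) activation mask over the outputs.
mask_over_outputs = torch.ones(1, 512, dtype=torch.bool)

# Interpretation B: a "neuron" is an input feature, i.e. a column of linear.weight,
# so the predictor would emit a (1, 256) activation mask over the inputs.
mask_over_inputs = torch.ones(1, 256, dtype=torch.bool)
```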