Why does the average-feature update apply the same weight to old and new features?

blackfeather-wang / ISDA-for-Deep-Networks

An efficient implicit semantic augmentation method, complementary to existing non-semantic techniques.


Issue #8 · Closed by SupetZYK 3 years ago

SupetZYK commented 3 years ago

Hi, I read the EstimatorCV code and found that the average feature is calculated by averaging over all the running features, old and new alike. Wouldn't it be more reasonable to apply a larger weight to the new features?
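For readers following along, the equal-weight behaviour described in the question can be sketched as a count-weighted merge of each batch into a running mean. This is a minimal illustration on scalar values, not the repository's EstimatorCV code; the helper name `merge_mean` and the toy batches are assumptions made up for the example.

```python
def merge_mean(old_mean, old_count, batch):
    """Merge a batch into a running mean so that every sample,
    old or new, ends up with the same weight (hypothetical helper)."""
    n = len(batch)
    if n == 0:
        return old_mean, old_count
    batch_mean = sum(batch) / n
    # Weight of the new batch is its share of all samples seen so far.
    w = n / (old_count + n)
    new_mean = (1 - w) * old_mean + w * batch_mean
    return new_mean, old_count + n


mean, count = 0.0, 0
for batch in ([1.0, 2.0], [3.0], [4.0, 5.0, 6.0]):
    mean, count = merge_mean(mean, count, batch)

# The result matches the plain average of all six values.
plain_average = sum([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]) / 6
print(mean, plain_average, count)
```

Because the weight of each batch shrinks as the sample count grows, features computed early in training count exactly as much as recent ones, which is the behaviour the question is asking about.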

blackfeather-wang commented 3 years ago

Thank you for your attention.

In fact, the two approaches lead to essentially the same covariance estimates, since the network gradually becomes stable during training. We found that the current implementation already works well, so we simply did not try applying a larger weight to new features. In addition, such a mechanism may introduce additional hyper-parameters, complicating the algorithm. I'm not sure the performance gains would be worth it.
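The trade-off in this reply can be illustrated with a small synthetic comparison. This is a sketch under stated assumptions, not an experiment from the repository: the scalar "feature" drifts for the first 100 steps and is then constant, and the `momentum` parameter of the exponential moving average is a hypothetical hyper-parameter of the kind the reply mentions.

```python
def cumulative_mean(values):
    """Equal-weight running mean over all values seen so far."""
    mean = 0.0
    for n, x in enumerate(values, start=1):
        mean += (x - mean) / n
    return mean


def exponential_mean(values, momentum=0.1):
    """Exponential moving average: recent values get larger weight.
    `momentum` is an assumed extra hyper-parameter."""
    mean = values[0]
    for x in values[1:]:
        mean = (1 - momentum) * mean + momentum * x
    return mean


# Synthetic feature: drifts from 0 toward 1 over 100 steps, then stays at 1.
values = [min(step / 100, 1.0) for step in range(1000)]

cum = cumulative_mean(values)
exp = exponential_mean(values, momentum=0.1)
print(f"cumulative={cum:.4f}  exponential={exp:.4f}  gap={abs(cum - exp):.4f}")
```

In this toy setting the gap between the two estimates comes only from the early drifting phase, and it shrinks as the stable phase gets longer. That is consistent with the argument that both schemes agree once the network has stabilized, while the weighted variant requires choosing a value for `momentum`.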