francoisgermain / SpeechDenoisingWithDeepFeatureLosses

Speech Denoising with Deep Feature Losses

understanding the feature loss network #6

Closed pranaymanocha closed 5 years ago

pranaymanocha commented 5 years ago

Hi, I was trying to understand the feature loss function using the code and the paper. As I understand it, we take the output of the denoising network (call it g(x_noisy)) and the clean audio x_clean, pass both through the feature loss network, and then aggregate the loss between the two sets of activations across multiple layers of the loss network.

In the paper, the weights lambda_m correspond to the weighting applied to each feature layer of the loss network, of which there should be 14 according to the code (senet_train.py). However, when I inspected the dimension of loss_w (senet_train.py, line 104), it has dimension SE_LOSS_LAYERS, which is initialized to 6 in the code. Does this mean you take the loss across only 6 of the feature layers, or is something wrong with my understanding? If the former, is there a reason you use only the first 6 layers' losses and not the rest?
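To make the question concrete, here is a minimal NumPy sketch of how I read the weighted multi-layer feature loss (this is my paraphrase, not the repo's actual TensorFlow code; `feature_loss`, `feats_clean`, and `feats_denoised` are hypothetical names, and I am assuming an L1 distance per layer):

```python
import numpy as np

def feature_loss(feats_clean, feats_denoised, loss_weights):
    """Weighted sum of per-layer mean absolute differences.

    feats_clean / feats_denoised: lists of activation arrays, one per
    loss layer (SE_LOSS_LAYERS entries, as I read the code).
    loss_weights: one scalar lambda_m per layer.
    """
    assert len(feats_clean) == len(feats_denoised) == len(loss_weights)
    total = 0.0
    for f_c, f_d, w in zip(feats_clean, feats_denoised, loss_weights):
        # Compare the loss network's activations for the clean signal
        # against those for the denoised output, weighted by lambda_m.
        total += w * np.mean(np.abs(f_c - f_d))
    return total

# Toy example: two loss layers with known activations.
clean = [np.zeros((2, 3)), np.ones((2, 3))]
denoised = [np.ones((2, 3)), np.ones((2, 3))]
print(feature_loss(clean, denoised, [2.0, 1.0]))  # 2.0*1.0 + 1.0*0.0 = 2.0
```

My confusion is whether the lists above should have 14 entries (all feature layers) or only 6 (SE_LOSS_LAYERS).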

Thanks!