Closed youralien closed 9 years ago
Yes. Since all the denoising cost multipliers are zero, there won't be a gradient from the decoder, so this trains just the encoder as a normal feedforward network on the labeled samples. The code still builds the decoder graph in Theano, but Theano should optimize it away (if I recall correctly).
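To make the point concrete, here is a minimal sketch in plain Python (not the repo's actual Theano code; `total_cost` and the numbers are made up for illustration). The decoder's reconstruction terms enter the objective only through their multipliers, so with all multipliers at zero the decoder contributes nothing to the cost, and hence nothing to the gradient:

```python
# Hypothetical cost composition mirroring the ladder network objective:
# total cost = supervised cost + weighted sum of per-layer denoising costs.
def total_cost(supervised_cost, denoising_costs, multipliers):
    # Each per-layer denoising (reconstruction) cost is scaled by its
    # multiplier; a zero multiplier removes that term from the objective,
    # so no gradient flows back through the corresponding decoder layer.
    weighted = sum(m * c for m, c in zip(multipliers, denoising_costs))
    return supervised_cost + weighted

supervised = 0.7                  # e.g. cross-entropy on a labeled batch
denoising = [1.2, 0.4, 0.1]       # per-layer reconstruction costs (monitored)
zeros = [0.0, 0.0, 0.0]           # all denoising multipliers set to zero

# The denoising costs can still be computed and printed for monitoring,
# but the value actually optimized is just the supervised term:
print(total_cost(supervised, denoising, zeros))  # prints 0.7
```

This also matches the observation below: the denoising costs are computed (so they can be logged), but multiplied by zero they have no effect on the backprop step.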
I hope this answered the question!
Do you have intuition on what this means? During training, the denoising costs are still computed and printed, but I infer they are not added to the actual cost used in the backprop step.