Closed: razvanc92 closed this 3 years ago
Could you share a screenshot?
@yuqirose Sure. In the first image (line 96), the value is sigmoid(fn(...)).
In the next image (line 121), sigmoid is applied again.
Am I missing something?
@razvanc92 Line 121 is applied to Wx; line 96 is applied to f(Wx) + b.
@yuqirose Thank you for your reply. Following the code, we compute sigmoid(sigmoid(Wx) + b), which differs from the original GRU formulation, which uses sigmoid(Wx + b).
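To make the difference concrete, here is a minimal sketch in plain Python. It is not the DCGRUCell code: the function names `gate_standard` and `gate_double` are hypothetical, and scalars stand in for the matrix product Wx and the bias b. It only compares the two expressions discussed in this thread.

```python
import math

def sigmoid(z):
    """Logistic function, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def gate_standard(wx, b):
    # Standard GRU gate: a single sigmoid over the biased pre-activation.
    return sigmoid(wx + b)

def gate_double(wx, b):
    # Expression described in this thread: sigmoid applied to Wx,
    # then applied again after the bias is added.
    return sigmoid(sigmoid(wx) + b)

# Illustrative values only (assumptions, not taken from the repository).
wx, b = 2.0, 1.0
print("sigmoid(Wx + b)          =", gate_standard(wx, b))
print("sigmoid(sigmoid(Wx) + b) =", gate_double(wx, b))
```

Because the inner sigmoid already squashes Wx into (0, 1), the doubly activated gate can only take values between sigmoid(b) and sigmoid(1 + b), whereas the standard gate can range over all of (0, 1). Whether that restriction matters in practice depends on the model and is not something this sketch shows.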
I was looking at the implementation of DCGRUCell and spotted something that looks off. If we use 'fc' for the U and R gates, the sigmoid is applied twice (lines 121 and 96). Is this how it's intended to work, or is it a bug?