Tom-Lotze / FACT

Repo for the project for Fairness, Accountability, Confidentiality and Transparency for the Master AI (Jan 2020)

Discrepancy in nonlinearities between code and paper #3

Closed Tom-Lotze closed 4 years ago

Tom-Lotze commented 4 years ago

Hi,

We noticed that the use of non-linearities is not consistent between the code and the paper. The paper states that a sigmoid is used for every layer in the encoder/decoder, whereas the code uses a ReLU everywhere except for the last layer of the decoder. Should we follow the paper, follow the code, or experiment to see what works best?

Thanks!

phlippe commented 4 years ago

Hi, thanks for asking. I was also a bit confused when I first read in the paper that they use a sigmoid everywhere. ReLU is the natural choice for activations within the network and is expected to perform much better, since a sigmoid introduces a bias and worse gradient flow. So I would go with ReLU, but you are also welcome to test both and pick whichever works best. Either way, put a comment in the code (and, if applicable, in the report) stating which activation you use.
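For reference, here is a minimal sketch of the activation placement being discussed, assuming a simple PyTorch MLP encoder/decoder; the layer sizes and class names are illustrative, not the repo's actual architecture:

```python
import torch.nn as nn

# Sketch: ReLU inside the network, sigmoid only on the decoder's
# final layer (matching the code rather than the paper).
# Dimensions below are placeholders, not the repo's configuration.
class Encoder(nn.Module):
    def __init__(self, in_dim=784, hidden_dim=256, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),                    # ReLU instead of sigmoid
            nn.Linear(hidden_dim, latent_dim),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)


class Decoder(nn.Module):
    def __init__(self, latent_dim=32, hidden_dim=256, out_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
            nn.Sigmoid(),                 # sigmoid only on the output layer
        )

    def forward(self, z):
        return self.net(z)
```

Keeping the sigmoid on the decoder output bounds reconstructions to [0, 1], while ReLU in the hidden layers avoids the saturation and vanishing-gradient issues a sigmoid would introduce there.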