christophstoeckl / FS-neurons


The parameter training effect is not good #23

Closed lxhwyj closed 7 months ago

lxhwyj commented 11 months ago

Hello, I trained an FS-neuron on sigmoid() in find_coeffs.py and the loss dropped to 0.0002392404. However, when I tested it in fs_codes.py with the trained parameters, the results were quite different. @christophstoeckl

christophstoeckl commented 11 months ago

If I understand you correctly you are saying that you optimized an FS-neuron on the sigmoid function and got a very low loss (suggesting that the FS-neuron does approximate the sigmoid function very well).

But then you use the FS-neuron in a network and there is a drop in performance?

The first thing to look into would be the range of inputs that the sigmoid function receives. It is important that when you optimize the FS-neuron, you do so on a sufficiently large range of inputs; otherwise, once the neuron is placed in the network, it may run into an out-of-distribution problem, which lowers performance.
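A quick sanity check for this is to record the pre-activation values a layer actually produces and measure what fraction of them fall inside the range the FS-neuron was fitted on. This is only a sketch (the range and the simulated activations below are illustrative assumptions, not values from the repo):

```python
import numpy as np

# Assumed range the FS-neuron was optimized on (matches the [-5, 5] mentioned above).
OPT_RANGE = (-5.0, 5.0)

def in_optimized_range(pre_activations, opt_range=OPT_RANGE):
    """Return the fraction of pre-activation values inside opt_range."""
    lo, hi = opt_range
    x = np.asarray(pre_activations)
    return np.mean((x >= lo) & (x <= hi))

# Illustrative stand-in for pre-activations collected from one network layer.
rng = np.random.default_rng(0)
acts = rng.normal(loc=0.0, scale=2.0, size=10_000)
print(in_optimized_range(acts))  # should be close to 1.0 if the range covers the inputs
```

If this fraction is noticeably below 1.0 for any layer, the FS-neuron is being queried outside the domain it was fitted on, and its output there is essentially unconstrained.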

lxhwyj commented 11 months ago

Hello,

When I optimized the FS-neuron, the input range was set to [-5, 5] and the batch_size was 100000.

When I test, I directly call the fs_sigmoid() method in fs_coding.py and compare it with TensorFlow's sigmoid(). The range of the test data is set to [-1, 1] and the batch_size is 1000. When I used FS-neurons in the network, there was still a big gap between their performance and that of sigmoid().


christophstoeckl commented 11 months ago

So what is the loss when you compare the fs_sigmoid to the tf sigmoid? And how much performance do you lose when replacing the tf sigmoid with the fs_sigmoid in the network?

lxhwyj commented 11 months ago

When I trained the h, d, and T parameters, the final loss was 0.00023924. However, when I compare fs_sigmoid to the tf sigmoid, the loss is 0.20369905.

When I ran the test, the input range was [-1, 1].

tf sigmoid: [0.26894143 0.26933524 0.2705188... 0.73027056 0.7306648 0.73105854].

fs_sigmoid: [0.16192205 0.16192205 0.16192205... 1.4400398 1.4400398 1.4400398].


christophstoeckl commented 11 months ago

This seems a bit strange to me. When you optimize the parameters of the FS-neuron you get a loss of 0.00023924, which should represent the error the FS-neuron makes when approximating a sigmoid function. Comparing the two later on shouldn't fundamentally change how the loss is computed, so I don't really understand how a difference of three orders of magnitude can arise.

Perhaps it would be a good idea to investigate the similarities and differences between how you compute the loss during training and how you compute it in the comparison later on. I would then try to make both setups as similar as possible to find out what causes this difference. One starting point could be to use the same range of input values in both cases.
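The point about identical setups can be sketched as follows. This does not use the repo's actual fs_sigmoid; the piecewise-constant `quantized_sigmoid` below is a hypothetical stand-in (FS-neuron outputs are likewise a finite set of levels), used only to show that the measured loss depends on evaluating both functions on the same input range with the same loss formula:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mse(a, b):
    return np.mean((a - b) ** 2)

# Hypothetical stand-in for fs_sigmoid: a coarse piecewise-constant
# approximation of the sigmoid with a fixed number of output levels.
def quantized_sigmoid(x, levels=16):
    y = sigmoid(x)
    return np.round(y * (levels - 1)) / (levels - 1)

# Evaluate on the SAME range used during optimization ...
x_train = np.linspace(-5.0, 5.0, 1000)
loss_train_range = mse(quantized_sigmoid(x_train), sigmoid(x_train))

# ... and on a different range; changing the range (or the loss formula)
# between training and testing changes the number you measure.
x_test = np.linspace(-1.0, 1.0, 1000)
loss_test_range = mse(quantized_sigmoid(x_test), sigmoid(x_test))

print(loss_train_range, loss_test_range)
```

If the two numbers printed here differed by orders of magnitude, the first thing to check would be exactly what differs between the two evaluations, which is the comparison suggested above.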

lxhwyj commented 10 months ago

Thanks, you're right. There was a slight difference between the settings I used for training and those I used for testing.