dingo-actual / om

An LLM architecture utilizing a recurrent structure and multi-layer memory
https://github.com/dingo-actual/om
Apache License 2.0

Revert to traditional residual structure in `ARCformer` #10

Closed rtaylor-rx-m closed 1 week ago

rtaylor-rx-m commented 1 week ago

When the MLP uses the 1-2-2-1 structure, its additional depth means the double-length residual could lead to unstable gradients. It would be best to just use the traditional residual structure.
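To make the comparison concrete, here is a rough pure-Python sketch of the two wirings as I understand them. It is not taken from the `ARCformer` code: the function names are hypothetical, normalization is left out, and reading "double-length residual" as a single skip connection spanning both the attention and MLP sublayers is an assumption.

```python
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum, i.e. the residual (skip) connection."""
    return [x + y for x, y in zip(a, b)]


def traditional_block(x: Vector, attn: Sublayer, mlp: Sublayer) -> Vector:
    """One skip connection per sublayer (hypothetical name)."""
    x = add(x, attn(x))  # skip around attention only
    x = add(x, mlp(x))   # skip around the MLP only
    return x


def double_length_block(x: Vector, attn: Sublayer, mlp: Sublayer) -> Vector:
    """Assumed reading of "double-length": one skip spanning both sublayers."""
    return add(x, mlp(attn(x)))


# Toy stand-ins for the real sublayers, used only to show the data flow.
def toy_attn(v: Vector) -> Vector:
    return [2.0 * t for t in v]


def toy_mlp(v: Vector) -> Vector:
    return [t + 1.0 for t in v]


x = [1.0, 2.0]
print(traditional_block(x, toy_attn, toy_mlp))
print(double_length_block(x, toy_attn, toy_mlp))
```

In the traditional wiring each sublayer has its own identity path, so a deeper MLP only lengthens the branch beside one skip. In the double-length wiring the only identity path bypasses attention and the whole MLP stack together.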

dingo-actual commented 1 week ago

Fixed.