A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (https://arxiv.org/pdf/2307.08621.pdf)
Hi again (:
I've found a small problem in the current implementation of the initialization of the `RetNetDecoder` class. Specifically, to build a multi-layered model, this class uses `deepcopy` to copy the single `RetNetDecoderLayer` object it receives as input. This copy leads to the following problems:
- The parameters of the layers are not i.i.d.
- Consequently, the "lottery ticket hypothesis" does not apply (at least there is no established evidence for this phenomenon in the non-i.i.d. case).
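To make the failure mode concrete, here is a minimal sketch using a toy stand-in class (not the actual `RetNetDecoderLayer`) showing how `deepcopy` replicates the prototype's initial parameters instead of sampling fresh ones:

```python
import copy
import random

class DemoLayer:
    """Hypothetical stand-in for RetNetDecoderLayer: parameters drawn at init."""
    def __init__(self, dim):
        self.weight = [random.gauss(0.0, 1.0) for _ in range(dim)]

# deepcopy-based stacking (the current approach): every copy shares
# the prototype's initial draw, so all layers start out identical.
proto = DemoLayer(4)
copied = [copy.deepcopy(proto) for _ in range(3)]
assert copied[0].weight == copied[1].weight  # identical, not i.i.d.

# independently constructed layers: each gets its own fresh draw.
fresh = [DemoLayer(4) for _ in range(3)]
assert fresh[0].weight != fresh[1].weight  # independent samples
```

The same effect occurs with real `nn.Module` layers: `deepcopy` clones the tensors byte-for-byte, so every layer in the stack begins training from the same point in parameter space.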
It's not a very serious issue, but I think it's worth fixing. I would be happy to implement a solution. I wanted to discuss which design would be preferred here:
- One possible solution could be to change `RetNetDecoder.__init__` to take a list of layer objects (initialized externally).
- Alternatively, it is also possible to store the arguments of the layer as properties and initialize the new layers based on the properties of the given layer.
- Another possible solution could be to define a configuration object with which a `RetNetDecoderLayer` object is initialized, and pass an instance of it to `RetNetDecoder.__init__` instead of an actual layer object.
There may be other solutions as well. Which one do you think would be ideal here? Do you have other solution ideas?
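For illustration, the config-object option might look roughly like the following sketch. All names here are hypothetical (they are not the repo's actual API), and a toy layer stands in for `RetNetDecoderLayer`:

```python
import random

class DecoderLayerConfig:
    """Hypothetical config object holding the layer's constructor arguments."""
    def __init__(self, dim):
        self.dim = dim

class DemoLayer:
    """Stand-in for RetNetDecoderLayer: draws fresh parameters at init."""
    def __init__(self, config):
        self.weight = [random.gauss(0.0, 1.0) for _ in range(config.dim)]

class DemoDecoder:
    """Config-based stacking: each layer is constructed independently,
    so initial parameters are i.i.d. across layers."""
    def __init__(self, config, num_layers):
        self.layers = [DemoLayer(config) for _ in range(num_layers)]

decoder = DemoDecoder(DecoderLayerConfig(dim=4), num_layers=3)
# Each layer now has its own independent initial parameters.
assert decoder.layers[0].weight != decoder.layers[1].weight
```

In a real PyTorch implementation the list would be wrapped in `nn.ModuleList` so the parameters are registered; the key point is only that the constructor is called once per layer rather than copying a prototype.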
Thanks!