fkodom / yet-another-retnet

A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (https://arxiv.org/pdf/2307.08621.pdf)
MIT License
101 stars 15 forks

An initialization issue #27

Open leor-c opened 5 months ago

leor-c commented 5 months ago

Hi again (: I've found a small problem in the current implementation of the initialization of the RetNetDecoder class. Specifically, to build a multi-layered model, this class uses deepcopy to copy the single RetNetDecoderLayer object it receives as input. This copy leads to the following problems:

  1. The parameters of the layers are not i.i.d.: every layer starts from the exact same weights instead of an independent random draw.
  2. Consequently, the "lottery ticket hypothesis" does not apply (at least there is no established evidence for this phenomenon in the non-i.i.d. case).
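The first point is easy to demonstrate in isolation. Here is a minimal sketch (using `nn.Linear` as a stand-in for `RetNetDecoderLayer`, since the exact layer API isn't shown here): every deepcopy of a module carries the original's initial weights, so the stacked layers are identical rather than independently initialized.

```python
# Minimal sketch of the issue. nn.Linear stands in for RetNetDecoderLayer;
# the stacking pattern mirrors the deepcopy approach described above.
import copy

import torch.nn as nn

layer = nn.Linear(8, 8)  # a single "template" layer
layers = nn.ModuleList(copy.deepcopy(layer) for _ in range(4))

# Every copy has exactly the same initial weights as the template,
# so the per-layer parameters are not independent random draws.
all_equal = all(bool((l.weight == layer.weight).all()) for l in layers)
print(all_equal)  # True
```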

It's not a very serious issue, but I think it's worth fixing, and I would be happy to implement a solution. I wanted to discuss which design would be preferred here. One option is to change RetNetDecoder.__init__ to accept a list of layer objects, each initialized externally. Alternatively, the layer's constructor arguments could be stored as properties, and the new layers could be initialized from the properties of the given layer. A third option is to define a configuration object for RetNetDecoderLayer and pass an instance of it to RetNetDecoder.__init__ instead of an actual layer object.

There may be other solutions as well. Which one do you think would be ideal here? Do you have other solution ideas? Thanks!
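To make the third option concrete, here is a rough sketch of the config-object design (all names here are hypothetical, not the current yet-another-retnet API; `nn.Linear` again stands in for the real layer). Because the decoder constructs each layer itself, every layer gets a fresh, independent initialization:

```python
# Hypothetical sketch of the config-object design. RetNetDecoderLayerConfig
# and make_layer are illustrative names, not the actual library API.
from dataclasses import dataclass

import torch.nn as nn


@dataclass
class RetNetDecoderLayerConfig:
    d_model: int = 8
    # ...other layer hyperparameters would go here


def make_layer(cfg: RetNetDecoderLayerConfig) -> nn.Module:
    # Stand-in for RetNetDecoderLayer(cfg).
    return nn.Linear(cfg.d_model, cfg.d_model)


class RetNetDecoder(nn.Module):
    def __init__(self, cfg: RetNetDecoderLayerConfig, num_layers: int):
        super().__init__()
        # Each make_layer call draws new random weights, so the stacked
        # layers are i.i.d., unlike deepcopy-ing a single template layer.
        self.layers = nn.ModuleList(make_layer(cfg) for _ in range(num_layers))


decoder = RetNetDecoder(RetNetDecoderLayerConfig(), num_layers=4)
distinct = not bool((decoder.layers[0].weight == decoder.layers[1].weight).all())
print(distinct)  # True
```

A lighter-weight variant of the same idea is to keep the deepcopy but call each copy's `reset_parameters()` (where the layer defines one) so the weights are re-drawn per layer.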

fkodom commented 5 months ago

Hey @leor-c! Wanted to let you know I see this, and will try to look at it soon. Sorry, I'm a bit swamped with other things this week. 😅