ManifoldRG / NEKO

In-progress implementation of a GATO-style generalist multimodal model capable of image, text, RL, and robotics tasks
https://discord.gg/brsPnzNd8h
GNU General Public License v3.0

Issues found in code review - series #1 #17

Open henryj18 opened 1 year ago

henryj18 commented 1 year ago

This is one of a series of issues found during the code review of the branch "add_text_modality" (some issues are inherited from the master branch). We will track the issues found in code review and fix them in one batch for efficiency.

Issue 1: between https://github.com/ManifoldRG/gato-control/blob/master/gato/policy/gato_policy.py#L81 and https://github.com/ManifoldRG/gato-control/blob/master/gato/policy/gato_policy.py#L113, inside the definition of class GatoPolicy(). (We describe the location this way because the line numbers may shift as development continues and more code is added or revised.)

"config" is instantiated as an instance of PretrainedConfig or its subclass GPT2Config, neither of which defines the attributes flash and gate. We need to remove these two attributes, in particular the assignment statements with "config.flash" and "config.gate" on the left-hand side, such as "config.flash = flash" and "config.gate = False". These assignments take no effect and are ignored when the "config" is passed into GPT2Model. Leaving these lines in is misleading, because they suggest that flash attention and gating layers are used when they actually are not.
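For illustration, a minimal sketch of the intended cleanup (simplified, hypothetical setup; not the repo's actual code):

```python
from transformers import GPT2Config, GPT2Model

# Sketch of the cleanup. GPT2Config / PretrainedConfig do not define `flash`
# or `gate`, so assignments like the following are never acted on by GPT2Model
# and should be deleted rather than kept as dead, misleading code:
#
#     config.flash = flash   # remove: flash attention is not actually enabled
#     config.gate = False    # remove: gating layers are not actually enabled

config = GPT2Config()           # use only attributes GPT2Config actually supports
transformer = GPT2Model(config)
```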

Issue 2: We should add an option to initialize text-token embeddings with the GPT-2 pretrained embedding. We need to decide whether this option is turned on automatically when the pretrained_lm option is enabled, or whether a separate option should be added; a separate option seems preferable. If this option is turned on, we should add code after self.embed_token = nn.Embedding(self.vocab_size, embed_dim) inside the "init" of GatoPolicy() that takes the portion of the embedding table associated with text tokens (the first 50257 entries of the full vocab, i.e. the size of the GPT-2 text-token vocabulary) and initializes it with the pretrained embedding.
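A minimal sketch of what that initialization could look like, assuming a hypothetical helper and a hypothetical `init_text_embeddings_from_gpt2` option (not the repo's actual code):

```python
import torch
from torch import nn
from transformers import GPT2Model

def init_text_token_embeddings(embed_token: nn.Embedding) -> None:
    """Hypothetical helper: copy GPT-2's pretrained token embeddings into the
    rows of `embed_token` that correspond to the GPT-2 text tokens, leaving the
    remaining rows (control/other tokens) with their default initialization."""
    gpt2 = GPT2Model.from_pretrained("gpt2")
    pretrained = gpt2.wte.weight.data            # shape (50257, 768) for "gpt2"
    n_text_tokens, gpt2_dim = pretrained.shape
    assert embed_token.embedding_dim == gpt2_dim, \
        "embed_dim must match GPT-2's hidden size to copy weights directly"
    with torch.no_grad():
        embed_token.weight[:n_text_tokens].copy_(pretrained)

# Inside GatoPolicy.__init__, right after
#     self.embed_token = nn.Embedding(self.vocab_size, embed_dim)
# one could call (guarded by the new option):
#     if init_text_embeddings_from_gpt2:
#         init_text_token_embeddings(self.embed_token)
```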

henryj18 commented 1 year ago

Issue 3: https://github.com/ManifoldRG/gato-control/blob/add_text_modality/gato/policy/gato_policy.py#L374 This line of code: tokens_per_timestep = batch_embeddings.shape[1] # number of tokens per timestep returns embed_dim for the "text" modality, which is incorrect. The cause is that the logic of this line comes from the other modalities such as "continuous control" or "discrete control", where there is an n_timesteps and a tokens_per_timestep and the embedding's shape is therefore (n_timesteps, tokens_per_timestep, embed_dim), whereas the embedding's shape for the "text" modality is (seq_len, embed_dim). We can treat seq_len as n_timesteps for the sake of code consistency, but there is no tokens_per_timestep here, so we can do one of the following (a sketch of both options follows the list):

  1. Use unsqueeze (or something similar) to add a dummy dimension to the text-modality embedding so its shape becomes (seq_len, 1, embed_dim); then n_timesteps = seq_len and tokens_per_timestep = 1.
  2. Add logic at the above-mentioned line of code so that we assign tokens_per_timestep = 1 for the "text" modality instead of performing tokens_per_timestep = batch_embeddings.shape[1].
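For reference, a minimal sketch of the two options (hypothetical function and variable names, assuming the text-modality embedding arrives as (seq_len, embed_dim) and the control modalities as (n_timesteps, tokens_per_timestep, embed_dim); not the repo's actual code):

```python
import torch

def option_1(batch_embeddings: torch.Tensor):
    # Option 1: add a dummy token dimension so text has the same rank as the
    # other modalities.
    batch_embeddings = batch_embeddings.unsqueeze(1)   # (seq_len, 1, embed_dim)
    n_timesteps = batch_embeddings.shape[0]            # seq_len plays the role of n_timesteps
    tokens_per_timestep = batch_embeddings.shape[1]    # always 1 for text
    return batch_embeddings, n_timesteps, tokens_per_timestep

def option_2(modality: str, batch_embeddings: torch.Tensor):
    # Option 2: keep the text embedding 2-D and special-case the token count.
    n_timesteps = batch_embeddings.shape[0]
    if modality == "text":
        tokens_per_timestep = 1
    else:
        tokens_per_timestep = batch_embeddings.shape[1]
    return n_timesteps, tokens_per_timestep
```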