Possible use of padding instead of max tokens to avoid error when calculating loss.

ManifoldRG / NEKO

In-progress implementation of a GATO-style generalist multimodal model capable of image, text, RL, and robotics tasks.

https://discord.gg/brsPnzNd8h

GNU General Public License v3.0

43 stars, 10 forks

Possible use of padding instead of max tokens to avoid error when calculating loss. #83

Status: Open. eihli opened this issue 6 months ago.

eihli commented 6 months ago

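TODO: Add details. This is a rough draft of an idea Henry and I discussed over a screen share: instead of relying on a fixed max token count, pad the shorter of the predicted and target tensors to a common length so the loss can be computed without a shape-mismatch error.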

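import torch
import torch.nn.functional as F

def pad(predicted, target, value=0):
    # Right-pad the shorter tensor along its last dimension so both have
    # the same length; always returns (predicted, target) in that order.
    diff = target.size(-1) - predicted.size(-1)
    if diff > 0:
        predicted = F.pad(predicted, (0, diff), 'constant', value)
    elif diff < 0:
        target = F.pad(target, (0, -diff), 'constant', value)
    return predicted, target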
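To show where padding interacts with the loss, here is a minimal sketch, assuming `predicted` holds per-token logits of shape (seq_len, vocab_size) and `target` holds token ids of shape (seq_len,); the issue does not state these shapes. The helper name `padded_cross_entropy` and the choice to ignore every position that exists only because of padding are hypothetical, for illustration. Padded target positions are filled with -100, the default `ignore_index` of `F.cross_entropy`, so they add nothing to the loss.

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # default ignore_index of F.cross_entropy


def padded_cross_entropy(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: cross-entropy between logits of shape (seq_p, vocab)
    and target ids of shape (seq_t,) when seq_p and seq_t may differ."""
    seq_p, seq_t = logits.size(0), target.size(0)
    length = max(seq_p, seq_t)
    if seq_p < length:
        # F.pad lists pad sizes from the last dim backwards:
        # (vocab_left, vocab_right, seq_left, seq_right), so this pads rows only.
        logits = F.pad(logits, (0, 0, 0, length - seq_p), value=0.0)
        # Targets with no real prediction are ignored rather than scored
        # against the zero-filled padding rows.
        target = target.clone()
        target[seq_p:] = IGNORE_INDEX
    if seq_t < length:
        # Right-pad the targets; padded positions contribute nothing to the loss.
        target = F.pad(target, (0, length - seq_t), value=IGNORE_INDEX)
    # Mean reduction averages only over positions whose target != IGNORE_INDEX.
    return F.cross_entropy(logits, target, ignore_index=IGNORE_INDEX)


torch.manual_seed(0)
logits = torch.randn(5, 10)          # 5 predicted positions, vocab of 10
target = torch.randint(0, 10, (3,))  # only 3 target tokens
loss = padded_cross_entropy(logits, target)
```

With this choice the loss equals the cross-entropy over the overlapping positions only. Counting unmatched target positions as errors, rather than ignoring them, would be an alternative design if a length mismatch should be penalized.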