akshayatam / machine-translation-with-retnet

A machine translation model using Retentive Network

Unlocking the Translator's Code - Machine Translation using RetNet

This project was my final project for CS 583 - Deep Learning at Stevens Institute of Technology. It uses RetNet, a novel decoder architecture from Microsoft's torchscale library, which replaces the multi-head attention commonly used in transformer models with multi-scale retention. The retentive network offers transformer-like performance with better language modelling, lower memory consumption, higher throughput, and lower latency.
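The efficiency claims come from retention's dual form: the same operation can be computed in parallel over a whole sequence during training, or recurrently with a constant-size state during inference. Below is a minimal single-head sketch in NumPy (not the project's code) showing that the two forms produce identical outputs:

```python
import numpy as np

def retention_parallel(Q, K, V, gamma):
    """Parallel (training-time) form of single-head retention.

    Q, K: (T, d_k); V: (T, d_v); gamma is the per-head decay in (0, 1).
    """
    T = Q.shape[0]
    n = np.arange(T)[:, None]
    m = np.arange(T)[None, :]
    # Causal decay matrix: D[n, m] = gamma^(n-m) for n >= m, else 0.
    D = np.where(n >= m, float(gamma) ** (n - m), 0.0)
    return (Q @ K.T * D) @ V

def retention_recurrent(Q, K, V, gamma):
    """Recurrent (inference-time) form: one O(d_k * d_v) state per step."""
    d_k, d_v = K.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))
    out = np.zeros_like(V)
    for t in range(Q.shape[0]):
        S = gamma * S + np.outer(K[t], V[t])  # decay old state, add new key-value
        out[t] = Q[t] @ S
    return out
```

The recurrent form is what gives RetNet its low-memory, low-latency decoding: generation carries only the state `S` forward instead of re-attending over the full prefix.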

The model was trained on the IWSLT 2017 dataset of English-French sentence pairs (available on Hugging Face). The dataset splits are as follows:

| Split | Number of examples |
| --- | --- |
| Train | 232,825 |
| Validation | 890 |
| Test | 8,597 |
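For reference, the dataset can be fetched with the Hugging Face `datasets` library; this is a sketch (the `iwslt2017-en-fr` config id is an assumption, and loading requires a network download), with the split sizes above hard-coded for checking:

```python
# Expected split sizes, taken from the table above.
EXPECTED_SPLIT_SIZES = {"train": 232_825, "validation": 890, "test": 8_597}

def load_iwslt_en_fr():
    # Assumes the `datasets` package is installed; downloads on first call.
    from datasets import load_dataset
    return load_dataset("iwslt2017", "iwslt2017-en-fr")

if __name__ == "__main__":
    ds = load_iwslt_en_fr()
    for split, size in EXPECTED_SPLIT_SIZES.items():
        assert len(ds[split]) == size
    # Each example is a {'en': ..., 'fr': ...} translation pair.
    print(ds["train"][0]["translation"])
```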

Separate encoder and decoder models were created with the following configuration:

ENCODER

RETNET DECODER

Hyperparameters
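The exact configuration values did not survive in this copy of the README, but a RetNet decoder in torchscale is typically set up along these lines. Every dimension below is an illustrative placeholder, not a value from this project:

```python
import torch.nn as nn
from torchscale.architecture.config import RetNetConfig
from torchscale.architecture.retnet import RetNetDecoder

# Placeholder hyperparameters — NOT this project's actual configuration.
config = RetNetConfig(
    vocab_size=32_000,
    decoder_embed_dim=512,
    decoder_retention_heads=8,
    decoder_ffn_embed_dim=1024,
    decoder_layers=6,
)
embed_tokens = nn.Embedding(config.vocab_size, config.decoder_embed_dim)
decoder = RetNetDecoder(config, embed_tokens=embed_tokens)
```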

The model was evaluated using BLEU and achieved a score of 36.4 on the whole dataset.
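For readers unfamiliar with the metric, corpus-level BLEU combines modified n-gram precisions (usually up to 4-grams) with a brevity penalty. The project's exact scoring script is not shown here; this is a small from-scratch sketch of the standard formula:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(references, hypotheses, max_n=4):
    """Corpus BLEU with uniform n-gram weights and brevity penalty (0-100 scale)."""
    matches = [0] * max_n   # clipped n-gram matches, per order
    totals = [0] * max_n    # hypothesis n-gram counts, per order
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_ng, ref_ng = ngrams(hyp, n), ngrams(ref, n)
            matches[n - 1] += sum(min(c, ref_ng[g]) for g, c in hyp_ng.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)
```

In practice a standard implementation such as sacreBLEU is preferable, since tokenization details change the score.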

I have also published an article on Medium.

If you find my work interesting or have any suggestions, let me know. Do cite my work if you find it valuable!