aptlin closed this issue 4 years ago
@myleott @edunov
The v2 of the paper is out. The base transformer performs better than expected, but the Levenshtein transformer still beats the base model in speed and provides comparable results for summarization:
@kahne could you take a look at this?
@sdll Were you still planning to publish these to the repo?
The Levenshtein transformer paper reports an improvement of 0.75+ ROUGE-L points over the baseline transformer on abstractive text summarization on Gigaword.
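For reference, ROUGE-L scores summaries by the longest common subsequence (LCS) between a candidate and a reference. Below is a minimal sketch of the sentence-level ROUGE-L F1 computation; it is a simplified illustration (whitespace tokenization, balanced F1 rather than the weighted F-beta some implementations use), not the official scoring script.

```python
def lcs_len(a, b):
    # Dynamic-programming longest common subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]


def rouge_l_f1(candidate, reference):
    # Sentence-level ROUGE-L: precision and recall from the LCS length,
    # combined into an F1 score. Naive whitespace tokenization.
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_len(cand, ref)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(cand), lcs / len(ref)
    return 2 * p * r / (p + r)
```

So a 0.75-point gain means the reported metric rises by 0.75 on this 0-100 (or 0-1) LCS-based scale.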
The team of @fvadzim, @whiteRa2bit, @NickShatalov and I would love to reproduce this result as part of the intensive practicum organized by Yandex (here is the description in Russian). After the event ends on November 16, we plan to continue working on the PR, trying the model out on a Russian news dataset and contributing docs explaining the training procedure to fairseq.
Proposal
Here is the plan of what we would love to contribute:
Creating a new page on text summarization in examples
The first sentence of the README mentions summarization among other tasks, but there is no complete description of how to achieve it, even though both the Levenshtein transformer implementation and
pay_less_attention_paper
seem to have almost all of the necessary code to make it work.
Making a new task for training the Levenshtein transformer for abstractive text summarization
The end goal would be to train the model on both English and Russian datasets.
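One way to frame the proposal above is to treat summarization as a sequence-to-sequence "translation" problem, which fairseq's existing pipeline already supports: write the articles and summaries as parallel files, then binarize them with `fairseq-preprocess --source-lang article --target-lang summary`. Here is a minimal data-preparation sketch; the file names, directory, and the two toy examples are hypothetical, not part of the actual Gigaword setup.

```python
import pathlib
import tempfile

# Hypothetical article/summary pairs; in practice these would come from
# the (pre-tokenized) Gigaword corpus.
pairs = [
    ("japan 's nikkei average rose sharply on friday", "nikkei rises"),
    ("us stocks fell on tuesday amid trade fears", "us stocks fall"),
]

out = pathlib.Path(tempfile.mkdtemp())

# fairseq expects parallel line-aligned files, one sentence pair per line,
# named <split>.<source-lang> / <split>.<target-lang>.
with open(out / "train.article", "w") as src, open(out / "train.summary", "w") as tgt:
    for article, summary in pairs:
        src.write(article + "\n")
        tgt.write(summary + "\n")
```

The resulting files would then be binarized and used to train the Levenshtein transformer with the flags from the nonautoregressive translation example (e.g. `--task translation_lev --arch levenshtein_transformer --criterion nat_loss --noise random_delete`), with articles on the source side and summaries on the target side.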
Questions
It seems that the current implementation is under active development at the moment, given a number of open issues about SIGSEGV crashes in multi-GPU environments:
Are there any recommendations on which commit of the repo to use in order to avoid these issues? Is a fix or major update coming soon?