facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

A FAIRSEQ text summarization example: the abstractive approach with a Levenshtein transformer #1347

Closed: aptlin closed this issue 4 years ago

aptlin commented 4 years ago

The Levenshtein transformer paper reports an improvement of more than 0.75 ROUGE-L over the baseline transformer on abstractive text summarization on Gigaword.

[Image: benchmarks]

The team of @fvadzim, @whiteRa2bit, @NickShatalov and I would love to reproduce this result as part of the intensive practicum organized by Yandex (here is the description in Russian). After the event ends on November 16, we plan to keep working on the PR, trying the model out on a Russian news dataset and contributing docs that explain the training procedure to FAIRSEQ.

Proposal

Here is the plan of what we would love to contribute:

  1. Creating a new page on text summarization in examples

    The first sentence of the README mentions summarization among other tasks, but there is no complete description of how to set it up, even though the Levenshtein transformer implementation and the pay_less_attention_paper example seem to contain almost all of the code needed to make it work.

  2. Making a new task for training the Levenshtein transformer for abstractive text summarization

    The end goal is to train the model on both English and Russian datasets (a rough sketch of the training setup appears right after this list).
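
To make the proposal concrete, here is a rough sketch of the commands we have in mind, modelled on the existing nonautoregressive_translation example. The data-bin/gigaword path, the src/tgt naming and all hyperparameters are placeholders on our side, not a tested recipe:

```bash
# Sketch only: binarize a Gigaword-style summarization dataset
# (train.src/train.tgt = article/headline pairs after BPE);
# all paths and names below are placeholders.
fairseq-preprocess \
    --source-lang src --target-lang tgt \
    --trainpref gigaword/train --validpref gigaword/valid --testpref gigaword/test \
    --joined-dictionary \
    --destdir data-bin/gigaword \
    --workers 20

# Train a Levenshtein transformer on the binarized data, reusing the
# flags from the nonautoregressive_translation example; hyperparameters
# are illustrative, not tuned for summarization.
fairseq-train data-bin/gigaword \
    --task translation_lev \
    --arch levenshtein_transformer \
    --criterion nat_loss \
    --noise random_delete \
    --share-all-embeddings \
    --apply-bert-init \
    --optimizer adam --adam-betas '(0.9,0.98)' \
    --lr 0.0005 --lr-scheduler inverse_sqrt \
    --warmup-updates 10000 --warmup-init-lr '1e-07' \
    --label-smoothing 0.1 --dropout 0.3 --weight-decay 0.01 \
    --fixed-validation-seed 7 \
    --max-tokens 8000 --max-update 300000 \
    --save-dir checkpoints/lev_gigaword
```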

Questions

  1. Could you please tell us whether you can already see any roadblocks in the code itself that might prevent this plan from succeeding?
  2. The paper uses a base Transformer as a teacher to obtain a ROUGE-L of 33.81. The current NAT NMT implementation also takes the teacher rather than the oracle approach, which should help us set up the training. Another training scheme, suggested by @justheuristic in private communication, is similar to the NMT refinement method introduced by @lena-voita, @rsennrich and @anvdev in this paper: produce an extractive summary first, and then refine it with the Levenshtein model. Have you tested this idea? It would be nice to include this variation in the comparison as well (a rough sketch of the teacher setup follows these questions).
  3. It seems that the current implementation is under active development, given a number of issues reporting SIGSEGV crashes in multi-GPU environments:

    Is there any guidance on which commit of the repo to use in order to avoid these issues? Is a fix or major update coming soon?
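
For question 2, this is roughly how the teacher-based setup could look with the current CLI, again following the nonautoregressive_translation example. The checkpoint paths and the step of re-binarizing the teacher hypotheses as new training targets are our assumptions, not an official recipe:

```bash
# Sequence-level distillation sketch: decode the training set with an
# already-trained autoregressive base transformer teacher (assumed to be
# saved at checkpoints/base/checkpoint_best.pt; paths are placeholders).
fairseq-generate data-bin/gigaword \
    --gen-subset train \
    --path checkpoints/base/checkpoint_best.pt \
    --beam 5 --remove-bpe \
    --max-tokens 8000 > train.distill.out
# The teacher hypotheses extracted from train.distill.out would then be
# re-binarized as the new training targets for the Levenshtein model.

# Iterative refinement at inference time with the trained Levenshtein model:
fairseq-generate data-bin/gigaword \
    --gen-subset test \
    --task translation_lev \
    --path checkpoints/lev_gigaword/checkpoint_best.pt \
    --iter-decode-max-iter 9 \
    --iter-decode-eos-penalty 0 \
    --beam 1 --remove-bpe \
    --print-step \
    --batch-size 400
```

The last command uses the iterative refinement decoder, which is where the claimed speed advantage over beam search with the base transformer should show up.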

kalyangvs commented 4 years ago

@myleott @edunov

aptlin commented 4 years ago

v2 of the paper is out. The base transformer performs better than expected, but the Levenshtein transformer still beats it in speed and provides comparable results for summarization:

[Image: 2019-11-11 11:23:37]

huihuifan commented 4 years ago

@kahne could you take a look at this?

Cloudmersive commented 4 years ago

@sdll Were you still planning to publish these to the repo?

aptlin commented 4 years ago

No, sorry, I do not have the bandwidth right now to polish our results, but you can take a look here for the training scripts and here for our fairseq fork with comet.ml support.