facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

[Simultaneous Machine Translation - MMA]: Building 'alignment_train_cuda_binding' extension #4085

Open YangXusheng-yxs opened 2 years ago

YangXusheng-yxs commented 2 years ago

### Bug

Building the 'alignment_train_cuda_binding' extension fails with a CUB build problem -- Simultaneous Machine Translation example

### To Reproduce & Code sample

There is an example of how to use the MMA model (Simultaneous Machine Translation) on this page: https://github.com/pytorch/fairseq/blob/main/examples/simultaneous_translation/docs/ende-mma.md

  1. First, I followed the main page to install fairseq.

    git clone https://github.com/pytorch/fairseq
    cd fairseq
    pip install --editable ./

    I successfully installed fairseq.

  2. Then, I tried to run the example code:

    fairseq-train \
    data-bin/wmt17_en_de \
    --simul-type waitk \
    --waitk-lagging 3 \
    --mass-preservation \
    --criterion label_smoothed_cross_entropy \
    --max-update 50000 \
    --arch transformer_monotonic_iwslt_de_en \
    --save-dir checkpoints/monotonic_wmt_en_de \
    --optimizer adam \
    --adam-betas '(0.9, 0.98)' \
    --lr-scheduler 'inverse_sqrt' \
    --warmup-init-lr 1e-7  \
    --warmup-updates 4000 \
    --lr 5e-4 \
    --stop-min-lr 1e-9 \
    --clip-norm 0.0 \
    --weight-decay 0.0001 \
    --dropout 0.3 \
    --label-smoothing 0.1 \
    --max-tokens 3584

    But training fails with this error (a quick import check is sketched right after this list):

    ModuleNotFoundError: No module named 'alignment_train_cuda_binding'

    (I noticed that this module was added about 30 days ago.)

  3. To build the relevant extension package and make 'from alignment_train_cuda_binding import alignment_train_cuda' pass,

  4. I set the CUDA_HOME path in ~/.bashrc and ran the following in the terminal

     python setup.py build_ext --inplace

    but building the 'alignment_train_cuda_binding' extension failed with:

     fatal error: cub/cub.cuh: No such file or directory
     #include <cub/cub.cuh>
     compilation terminated.
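A minimal import check for this binding (an illustrative sketch; the assumption is that it is run from the fairseq checkout where the build command above was attempted):

# Illustrative sketch (assumption: run from the fairseq checkout, after
# attempting `python setup.py build_ext --inplace`): check whether the
# compiled CUDA binding is importable before launching fairseq-train.
try:
    import alignment_train_cuda_binding  # noqa: F401
    print("alignment_train_cuda_binding imports fine")
except ImportError as err:
    print("binding not built yet:", err)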

I googled this issue; some people said the CUB package is missing.

  1. So I cloned the latest version of CUB and put it at '/usr/local/cuda-10.2/targets/x86_64-linux/include/cub':

    git clone https://github.com/NVIDIA/cub
    /usr/local/cuda-10.2/targets/x86_64-linux/include/cub
  2. Again, I ran the following in the terminal

     python setup.py build_ext --inplace 

    But a new problem appeared:

    
    building 'alignment_train_cuda_binding' extension
    /usr/local/cuda-10.2/bin/nvcc -I xxx
    .............
    /usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/cub/block/../iterator/cache_modified_input_iterator.cuh(116): error: a class or namespace qualified name is required
    /usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/cub/block/../iterator/cache_modified_input_iterator.cuh(116): error: qualified name is not allowed
    /usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/cub/block/../iterator/cache_modified_input_iterator.cuh(116): error: expected a ";"
    /usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/cub/agent/agent_merge_sort.cuh(80): error: a class or namespace qualified name is required
    /usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/cub/agent/agent_merge_sort.cuh(80): error: qualified name is not allowed

..............

error: command '/usr/local/cuda-10.2/bin/nvcc' failed with exit status 1
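A hedged diagnostic sketch: the latest CUB releases generally target newer CUDA toolkits than 10.2, which would explain these parse errors. The snippet below prints the version macros of the cloned CUB copy; cub/version.cuh being present is an assumption that holds for recent CUB releases, and the path mirrors the clone target above.

# Hedged diagnostic sketch: print the cloned CUB copy's version macros.
# Assumptions: recent CUB releases define CUB_VERSION in cub/version.cuh,
# and the clone lives at the path used earlier in this report.
from pathlib import Path

cub_dir = Path("/usr/local/cuda-10.2/targets/x86_64-linux/include/cub")
version_header = cub_dir / "version.cuh"
if version_header.exists():
    for line in version_header.read_text().splitlines():
        if line.startswith("#define CUB_"):
            print(line)
else:
    print("no version.cuh found; probably an older CUB layout")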



### Expected behavior

**Please help me solve this issue. Can you tell me how to fix it? Thanks a lot!**

My guesses:
1. Does CUDA 10.2 simply not support this module?
2. Should I try an older version of the CUB library, and if so, which one?
3. Or some other method? Maybe I can install an older fairseq (0.10.0), which doesn't need the 'alignment_train_cuda_binding' module.

### Environment

 - fairseq Version : main branch; 1.0.0a0+2380a6e (a confusing version number)
 - PyTorch Version : 1.10+cu
 - OS : Ubuntu 18.04
 - How you installed fairseq : pip install --editable ./ 
 - Python version : 3.6.8 virtualenv
 - CUDA/cuDNN version : CUDA 10.2 / cuDNN not installed for now
 - GPU models and configuration : Quadro RTX 5000
 - Any other relevant information :

### Additional context

(Sorry, for privacy reasons I cannot upload my code or a screenshot of the error.)

MenggeLiu commented 2 years ago

I encountered the same problem when trying to train monotonic models for the simultaneous_translation examples. Using CUDA 10.2 and gcc 7.5.0, I cloned cub into $CUDA_HOME/targets/x86_64-linux/include/cub and got a lot of cub errors.

    /home/liumengge/cuda-10.2/bin/../targets/x86_64-linux/include/cub/iterator/cache_modified_output_iterator.cuh(141): warning: parsing restarts here after previous syntax error
    /home/liumengge/cuda-10.2/bin/../targets/x86_64-linux/include/cub/iterator/discard_output_iterator.cuh(74): error: a class or namespace qualified name is required
    /home/liumengge/cuda-10.2/bin/../targets/x86_64-linux/include/cub/iterator/discard_output_iterator.cuh(74): error: qualified name is not allowed
    /home/liumengge/cuda-10.2/bin/../targets/x86_64-linux/include/cub/iterator/discard_output_iterator.cuh(74): error: expected a ";"
    /home/liumengge/cuda-10.2/bin/../targets/x86_64-linux/include/cub/iterator/discard_output_iterator.cuh(79): warning: parsing restarts here after previous syntax error
    /home/liumengge/cuda-10.2/bin/../targets/x86_64-linux/include/cub/iterator/tex_obj_input_iterator.cuh(120): error: a class or namespace qualified name is required
    /home/liumengge/cuda-10.2/bin/../targets/x86_64-linux/include/cub/iterator/tex_obj_input_iterator.cuh(120): error: qualified name is not allowed
    /home/liumengge/cuda-10.2/bin/../targets/x86_64-linux/include/cub/iterator/tex_obj_input_iterator.cuh(120): error: expected a ";"
    /home/liumengge/cuda-10.2/bin/../targets/x86_64-linux/include/cub/iterator/tex_obj_input_iterator.cuh(125): warning: parsing restarts here after previous syntax error
    /home/liumengge/cuda-10.2/bin/../targets/x86_64-linux/include/cub/iterator/tex_ref_input_iterator.cuh(261): error: a class or namespace qualified name is required
    /home/liumengge/cuda-10.2/bin/../targets/x86_64-linux/include/cub/iterator/tex_ref_input_iterator.cuh(261): error: qualified name is not allowed
    /home/liumengge/cuda-10.2/bin/../targets/x86_64-linux/include/cub/iterator/tex_ref_input_iterator.cuh(261): error: expected a ";"
    /home/liumengge/cuda-10.2/bin/../targets/x86_64-linux/include/cub/iterator/tex_ref_input_iterator.cuh(266): warning: parsing restarts here after previous syntax error
    /home/liumengge/cuda-10.2/bin/../targets/x86_64-linux/include/cub/iterator/transform_input_iterator.cuh(126): error: a class or namespace qualified name is required
    /home/liumengge/cuda-10.2/bin/../targets/x86_64-linux/include/cub/iterator/transform_input_iterator.cuh(126): error: qualified name is not allowed
    /home/liumengge/cuda-10.2/bin/../targets/x86_64-linux/include/cub/iterator/transform_input_iterator.cuh(126): error: expected a ";"
    /home/liumengge/cuda-10.2/bin/../targets/x86_64-linux/include/cub/iterator/transform_input_iterator.cuh(131): warning: parsing restarts here after previous syntax error
    46 errors detected in the compilation of "/tmp/tmpxft_0002a1b3_00000000-6_alignment_train_kernel.cpp1.ii".

Using CUDA 11.3 solves the cub-not-found error; the cub module is installed together with CUDA.
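A minimal sketch of that check (assumptions: CUDA_HOME points at the toolkit install, defaulting to /usr/local/cuda, and CUDA 11.x bundles CUB under $CUDA_HOME/include):

# Hedged sketch: CUDA 11.x ships CUB with the toolkit, so checking the nvcc
# version and the bundled header shows whether an external CUB clone is
# needed at all. The CUDA_HOME default is an assumption.
import os
import subprocess

cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
nvcc = os.path.join(cuda_home, "bin", "nvcc")
print(subprocess.check_output([nvcc, "--version"]).decode())
bundled = os.path.join(cuda_home, "include", "cub", "cub.cuh")
print("bundled cub/cub.cuh present:", os.path.exists(bundled))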

EricLina commented 2 years ago

I hit the same issue when I ran the example in https://github.com/pytorch/fairseq/blob/main/examples/simultaneous_translation/docs/ende-mma.md, with CUDA 11.2.

EricLina commented 2 years ago

I found that git reset --hard dd3bd3c0497ae9a7ae7364404a6b0a4c501780b3 could solve this problem

me301 commented 2 years ago

I found that git reset --hard dd3bd3c0497ae9a7ae7364404a6b0a4c501780b3 could solve this problem

Instead of git reset, git checkout is better.

git checkout dd3bd3c0497ae9a7ae7364404a6b0a4c501780b3

and git checkout main to go back to the main branch.

YangXusheng-yxs commented 2 years ago

Some useful tips I tried:

  1. Install an older fairseq (0.10.0) with CUDA 10.2, which doesn't need the 'alignment_train_cuda_binding' module.
  2. Install CUDA 11.1 or 11.3, which include the cub module.
  3. I didn't try changing the branch, though.

George0828Zhang commented 2 years ago

I thought I'd share my workaround for this issue:

Step 1: In utils/monotonic_attention.py, add the following lines:

import torch.utils.cpp_extension
from pathlib import Path

module_path = Path(__file__).parent.parent.parent / "operators"
build_path = module_path / "build"
build_path.mkdir(exist_ok=True)

alignment_train_cpu_binding = torch.utils.cpp_extension.load(
    "alignment_train_cpu_binding",
    sources=[
        module_path / "alignment_train_cpu.cpp",
    ],
    build_directory=build_path.as_posix()
)

alignment_train_cuda_binding = torch.utils.cpp_extension.load(
    "alignment_train_cuda_binding",
    sources=[
        module_path / "alignment_train_cuda.cpp",
        module_path / "alignment_train_kernel.cu"
    ],
    build_directory=build_path.as_posix()
)

Step 2: In the same file, replace lines 45 to 52 (from here) with:

  alpha = p_choose.new_zeros([bsz, tgt_len, src_len])
  if p_choose.is_cuda:
      p_choose = p_choose.contiguous()
      alignment_train_cuda_binding.alignment_train_cuda(p_choose, alpha, eps)
      # from alignment_train_cuda_binding import alignment_train_cuda as alignment_train
  else:
      alignment_train_cpu_binding.alignment_train_cpu(p_choose, alpha, eps)
      # from alignment_train_cpu_binding import alignment_train_cpu as alignment_train
  # alpha = p_choose.new_zeros([bsz, tgt_len, src_len])
  # alignment_train(p_choose, alpha, eps)

What this does is use ninja to compile the extensions on the first import.
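A minimal usage sketch of the JIT-loaded bindings (runnable after the Step 1 snippet above has executed; the call signatures mirror the Step 2 replacement code, and the shapes and eps value are example values):

# Usage sketch for the bindings loaded in Step 1. The call signatures mirror
# the Step 2 replacement code; bsz/tgt_len/src_len/eps are example values.
import torch

bsz, tgt_len, src_len, eps = 2, 5, 7, 1e-6
p_choose = torch.rand(bsz, tgt_len, src_len)
alpha = p_choose.new_zeros([bsz, tgt_len, src_len])
if p_choose.is_cuda:
    alignment_train_cuda_binding.alignment_train_cuda(
        p_choose.contiguous(), alpha, eps
    )
else:
    alignment_train_cpu_binding.alignment_train_cpu(p_choose, alpha, eps)
print(alpha.shape)  # expected alignment, same shape as p_choose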

Hope this helps.

EricLina commented 2 years ago

Are you using fairseq to reproduce MMA now? I've run into some problems; could you please give me some help, or discuss?

SimulEval is the package for evaluating the model, and it reports AP, AL, and BLEU. Following the ende-mma guide, I trained an MMA model, but I don't know how to evaluate it. In the other guide, enja-waitk, I don't know how to prepare the dataset.

I guess you are following the same guide too, so do you know how to evaluate it?

George0828Zhang commented 2 years ago

Hi, yes, I'm trying to reproduce MMA on SimulST, and so far wait-k and MMA-IL look good, but MMA-H has poor performance.

First I'd suggest looking at the PR I made, #4247, which includes fixes for some breaking bugs in the current fairseq code.

I assume you are working on SimulMT, so you'd need the text-to-text agent here. Next, if your target language is not Japanese (Ja) but, for example, German (De) with sentencepiece subwords, then you need to update the def units_to_segment(self, units, states): method here to merge the subwords. Here is my routine for merging:

    def units_to_segment(self, unit_queue, states):
        """
        queue: stores bpe tokens.
        server: accept words.

        Therefore, we need merge subwords into word. we find the first
        subword that starts with BOW_PREFIX, then merge with subwords
        prior to this subword, remove them from queue, send to server.
        """

        # Merge sub word to full word.
        tgt_dict = self.dict["tgt"]

        # if segment starts with eos, send EOS
        if tgt_dict.eos() == unit_queue[0]:
            return DEFAULT_EOS

        # if force finish, there will be None's
        segment = []
        if None in unit_queue.value:
            unit_queue.value.remove(None)

        src_len = len(states.units.source)
        if (
            (len(unit_queue) > 0 and tgt_dict.eos() == unit_queue[-1])
            or len(states.units.target) > self.max_len
        ):
            hyp = tgt_dict.string(
                unit_queue,
                "sentencepiece",
            )
            if self.pre_tokenizer is not None:
                hyp = self.pre_tokenizer.decode(hyp)
            return [hyp] + [DEFAULT_EOS]

        for index in unit_queue:
            token = tgt_dict.string([index])
            if token.startswith(BOW_PREFIX):
                if len(segment) == 0:
                    segment += [token.replace(BOW_PREFIX, "")]
                else:
                    for j in range(len(segment)):
                        unit_queue.pop()

                    string_to_return = ["".join(segment)]

                    if tgt_dict.eos() == unit_queue[0]:
                        string_to_return += [DEFAULT_EOS]

                    return string_to_return
            else:
                segment += [token.replace(BOW_PREFIX, "")]

        return None
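Here is a toy, self-contained illustration of the merging rule above (assumption: BOW_PREFIX is sentencepiece's "\u2581" beginning-of-word marker, as in the fairseq agents; the sample subwords are made up):

# Toy illustration of the subword-merging rule in units_to_segment above:
# subwords accumulate until the next BOW-prefixed token arrives, at which
# point the completed word is emitted.
BOW_PREFIX = "\u2581"  # sentencepiece beginning-of-word marker (assumption)

queue = ["\u2581Das", "\u2581Kom", "ite", "e", "\u2581empf"]
segment = []
for token in queue:
    if token.startswith(BOW_PREFIX) and segment:
        print("emit word:", "".join(segment))  # -> "Das", then "Komitee"
        segment = []
    segment.append(token.replace(BOW_PREFIX, ""))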

You may also need to set self.pre_tokenizer to task.pre_tokenizer. (This matters if you also use the Moses tokenizer.)

After setting up the agent and installing SimulEval as in the EnJa guide, you can evaluate with:

simuleval \
    --source ${SRC_FILE} \
    --target ${TGT_FILE} \
    --data-bin ${DATA_BIN} \
    --sacrebleu-tokenizer 13a \
    --eval-latency-unit word \
    --src-splitter-path ${SRC_SPM_PATH} \
    --agent simul_t2t_enja.py \
    --model-path ${SAVE_DIR}/${CHECKPOINT_FILENAME} \
    --output ${OUTPUT} \
    --scores

EricLina commented 2 years ago

Hello Zhang @George0828Zhang, I've tried evaluating with the detokenized dataset and finally got a result, but it is very bad. To reproduce: first I prepared the data with Moses and sentencepiece on the WMT14 En-De dataset; then, following the enja example, I trained my model with wait-k for about 50 epochs.

For evaluation I used raw (detokenized) data and the enja agent.

This is part of my instance.log from evaluating the model:

"index": 2734 "prediction": "Das Kom ite e empf ahl jedoch die FA A - Z ul ass ungs befug nis , die Pil oten bei der An land ung von Ger \u00e4ten in geringer Sicht zu bestellen . " "prediction_length": 36, "reference": "Das Komitee empfahl der FAA allerdings , es Piloten zu erlauben , bei Instrumentenlandung unter schlechten Sichtverh\u00e4ltnissen das Abschalten der Ger\u00e4te durch die Passagiere anzuordnen .", "source": "However , the committee recommended the FAA allow pilots to order passengers to shut off devices during instrument landings in low visibility .", "source_length": 23, "reference_length": 26, "metric": {"sentence_bleu": 1.5342333164810604, "latency": {"AL": 2.0740742683410645, "AP": 0.6759259104728699, "DAL": 5.527776718139648}}}

The overall scores are also very poor. The AL is negative, which is strange, but I don't know why that happened:

    {
        "Quality": {
            "BLEU": 7.895211793562936
        },
        "Latency": {
            "AL": -0.15109748336804218,
            "AP": 0.49805060968859205,
            "DAL": 3.3051336411581365
        }
    }

So my questions are:

  1. What are the standard data preparation steps?
  2. In the instance.log, do the words in prediction and reference look right?

Thank you!

EricLina commented 2 years ago

I found the reason! It's because I used the units_to_segment from the enja agent; I should use yours, the routine posted above, instead.

EricLina commented 2 years ago

Hi, Yes I'm trying to reproduce MMA on SimulST, and so far waitk and MMA-IL looks good but MMA-H has poor performance.

Hello Zhang, MMA-H seems to go wrong while generating: it won't generate eos, and the results are poor. Have you found a solution?

EricLina commented 2 years ago

By the way, how many epochs did you train MMA-hard and MMA-wait-k for?

George0828Zhang commented 2 years ago

Hi, Yes I'm trying to reproduce MMA on SimulST, and so far waitk and MMA-IL looks good but MMA-H has poor performance.

Hello , Zhang ,MMA-H seems wrong while geneating , it won't generate eos () and the result is poor .Have you found out the solution?

I reverted the expected alignment to an earlier version and it is working fine now.

By the way , How many epochs did you train for MMA-hard and MMA-waitk ?

I'm training S2T, so it might be different, but training for 20k steps is already good. Note that a larger batch size is very important; the batch size given in the examples is not enough IMO. You should definitely use --update-freq with 4 or 8, or directly increase the batch size. Also, if the gradient is unstable, use a lower lr, e.g. 1e-4.
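A quick illustration of the effective batch size under gradient accumulation (the GPU count is an assumed example; max-tokens comes from the training command earlier in this thread):

# Effective tokens per optimizer update with fairseq's --update-freq
# (gradient accumulation): max_tokens * n_gpus * update_freq.
# n_gpus = 1 is an assumed example.
max_tokens = 3584
n_gpus = 1
for update_freq in (1, 4, 8):
    print(update_freq, "->", max_tokens * n_gpus * update_freq, "tokens/update")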

George0828Zhang commented 2 years ago

Reverting monotonic_attention.py on my PR should work fine. I'll add this change to my PR later.

George

On Fri, Mar 18, 2022 at 4:10 PM Eric Liang wrote:

I reverted the expected alignment to the earlier version https://github.com/pytorch/fairseq/blob/98d638c70cdbe751153c10fc571c34beac228347/examples/simultaneous_translation/utils/monotonic_attention.py and it is working fine now. I think I should confirm it: should I just revert monotonic_attention.py on your PR https://github.com/George0828Zhang/fairseq/tree/fix_mma, or git checkout the whole fairseq to 98d638c https://github.com/pytorch/fairseq/commit/98d638c70cdbe751153c10fc571c34beac228347?
