AI4Bharat / indicTrans

indicTranslate v1 - Machine Translation for 11 Indic languages. For latest v2, check: https://github.com/AI4Bharat/IndicTrans2
https://ai4bharat.iitm.ac.in/indic-trans
MIT License

Unable to use model for inference #20

Closed · TarunTater closed this issue 3 years ago

TarunTater commented 3 years ago

Hi, I am trying to follow indictrans_fairseq_inference.ipynb for inference using the pretrained models for English to Hindi, but the generated output file is empty. On running the command bash joint_translate.sh en_sentences.txt hi_outputs.txt 'en' 'hi' '../en-indic', the following logs show up:

Wed Aug 18 12:03:02 EDT 2021
Applying normalization and script conversion
100%|#######################################################################################################################################################| 4/4 [00:00<00:00, 35.06it/s]
Number of sentences in input: 4
Applying BPE
Decoding
Extracting translations, script conversion and detokenization
Translation completed

However, when I look at hi_outputs.txt.log:

2021-08-18 12:06:40 | INFO | fairseq.tasks.translation | [SRC] dictionary: 32104 types
2021-08-18 12:06:40 | INFO | fairseq.tasks.translation | [TGT] dictionary: 35848 types
2021-08-18 12:06:40 | INFO | fairseq_cli.interactive | loading model(s) from ../en-indic/model/checkpoint_best.pt
2021-08-18 12:06:54 | INFO | fairseq_cli.interactive | Sentence buffer size: 2500
2021-08-18 12:06:54 | INFO | fairseq_cli.interactive | NOTE: hypothesis and token scores are output in base 2
2021-08-18 12:06:54 | INFO | fairseq_cli.interactive | Type the input sentence and press return:
S-0 __src__en__ __tgt__hi__ hello
W-0 0.342   seconds
inside decode_fn
Traceback (most recent call last):
  File "/u/ttater24/miniconda3/envs/indictrans/bin/fairseq-interactive", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-interactive')())
  File "/dccstor/cssblr/rmurthyv/MWE/indicTrans/training/fairseq/fairseq_cli/interactive.py", line 317, in cli_main
    distributed_utils.call_main(convert_namespace_to_omegaconf(args), main)
  File "/dccstor/cssblr/rmurthyv/MWE/indicTrans/training/fairseq/fairseq/distributed/utils.py", line 369, in call_main
    main(cfg, **kwargs)
  File "/dccstor/cssblr/rmurthyv/MWE/indicTrans/training/fairseq/fairseq_cli/interactive.py", line 283, in main
    print("H-{}\t{}\t{}".format(id_, score, hypo_str)) 
UnicodeEncodeError: 'ascii' codec can't encode characters in position 24-30: ordinal not in range(128)

I get this error, and if I comment out the print statements on lines 283 and 285 of fairseq/fairseq_cli/interactive.py, it no longer shows any error, but the output file still comes out empty.

Logs if I comment out the print statements:

2021-08-18 12:10:33 | INFO | fairseq.tasks.translation | [SRC] dictionary: 32104 types
2021-08-18 12:10:33 | INFO | fairseq.tasks.translation | [TGT] dictionary: 35848 types
2021-08-18 12:10:33 | INFO | fairseq_cli.interactive | loading model(s) from ../en-indic/model/checkpoint_best.pt
2021-08-18 12:10:49 | INFO | fairseq_cli.interactive | Sentence buffer size: 2500
2021-08-18 12:10:49 | INFO | fairseq_cli.interactive | NOTE: hypothesis and token scores are output in base 2
2021-08-18 12:10:49 | INFO | fairseq_cli.interactive | Type the input sentence and press return:
S-0 __src__en__ __tgt__hi__ hello
W-0 0.341   seconds
inside decode_fn
P-0 -2.3383 -0.0889 -0.4440
S-1 __src__en__ __tgt__hi__ This bicycle is too small for you ! !
W-1 0.341   seconds
inside decode_fn
P-1 -0.4985 -0.4906 -0.0768 -0.7661 -0.2664 -0.4335 -0.2313 -0.1962 -0.2364 -0.7059
S-2 __src__en__ __tgt__hi__ I will directly meet you at the airport .
W-2 0.341   seconds
inside decode_fn
P-2 -0.7890 -1.7070 -0.8908 -0.2296 -0.8053 -1.8046 -0.3992 -0.1925 -0.3338 -0.1554
S-3 __src__en__ __tgt__hi__ If COVID-19 is spreading in your community , stay safe by taking some simple precautions , such as physical distancing , wearing a mask , keeping rooms well ventilated , avoiding crowds , cleaning your hands , and coughing into a bent elbow or tissue
W-3 0.341   seconds
inside decode_fn
P-3 -1.0445 -0.2641 -0.2060 -0.1771 -0.4343 -0.1578 -0.1135 -1.0917 -0.1419 -0.1706 -0.6793 -0.1694 -0.9022 -1.4178 -1.5907 -0.0508 -0.1238 -0.6169 -0.8115 -0.8431 -0.0594 -1.6192 -0.1850 -2.0033 -0.6672 -0.1500 -0.1400 -0.2116 -0.1118 -0.1452 -1.0625 -0.0479 -0.1315 -0.3330 -1.7769 -0.2528 -0.2888 -0.3009 -0.0194 -0.4282 -0.1504 -0.1208 -0.2184 -0.2383 -0.0699 -0.1597 -0.3872 -0.2935 -0.4527 -0.5734 -0.1968 -0.3366 -0.5575 -0.0954 -0.3563 -0.9439 -0.8385 -1.2941 -0.7543 -1.0392 -0.1206 -2.6353 -0.5275 -0.2655 -0.1371 -0.4036 -0.5091
2021-08-18 12:10:51 | INFO | fairseq_cli.interactive | Total time: 18.184 seconds; translation time: 1.364

In this case, the consolidated_testoutput in postprocess_translate.py is ['', '', '', '']. I am unable to understand why the output is an empty file and how to use the model for inference.

gowtham1997 commented 3 years ago

I'm not able to reproduce this issue. I just reran the inference colab notebook and tested on the latest version of fairseq to check if that's breaking something, but everything seems to be working fine.

Can you help us by providing either the inputs that you think are causing the error in en-hi translation or a colab notebook reproducing this error?

AdityaSoni19031997 commented 3 years ago

I am not sure why, but the code needs 16 GB of GPU memory to translate 18k sentences and it crashes due to GPU OOM. It looks like it's super inefficient for some reason. NB: I didn't call model.eval() explicitly because I assumed that it's already being done.

It almost hits GPU capacity even for 4k sentences. Plus, a lot of the translations come out as just "\n".

Ref -> https://www.kaggle.com/adityaecdrid/translate-them-to-tamil-language-external-data (if you remove the sample of 2**10 that I am passing while reading the csv, you should see the same).

Secondly, it would be nice if the results were written as they are computed, not in bulk as a one-shot activity.

If your code crashed due to OOM, the output file will be empty (look into the log file).

gowtham1997 commented 3 years ago

@AdityaSoni19031997 It looks like you are initializing the model multiple times (as you are using both the command-line interface and the Python interface to load the models).

# load the translation model from that directory
from indicTrans.inference.engine import Model # because of this import, we have to do cd...
en2indic_model = Model(expdir='/kaggle/working/en-indic')
en2indic_model

^ This is where you are using the Python interface. With the Python interface, you can load the model onto the GPU and do batch translation or paragraph translation (see the attached screenshot or the colab notebook here).

Secondly, it would be nice if the results were written as they are computed, not in bulk as a one-shot activity.

The batch_translate or paragraph_translate methods can help with this, as you translate one batch/paragraph at a time and store the results (a rough sketch follows).
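
For illustration, a minimal sketch of that chunked approach (the batch_translate name and its (sentences, src_lang, tgt_lang) argument order are assumptions based on the colab notebook, and the file names are placeholders; check indicTrans/inference/engine.py for the exact API):

from indicTrans.inference.engine import Model

en2indic_model = Model(expdir='/kaggle/working/en-indic')

# read the source sentences (placeholder file name)
sentences = open('en_sentences.txt', encoding='utf-8').read().splitlines()

chunk_size = 64  # smaller chunks keep peak GPU memory lower
with open('ta_outputs.txt', 'w', encoding='utf-8') as out:
    for i in range(0, len(sentences), chunk_size):
        chunk = sentences[i:i + chunk_size]
        # translate one chunk at a time ...
        translations = en2indic_model.batch_translate(chunk, 'en', 'ta')
        # ... and write the results as each chunk finishes, instead of one bulk write at the end
        out.write('\n'.join(translations) + '\n')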

! ./joint_translate.sh en_paragraphs.txt ta_paragraphs.txt "en" "ta" '../en-indic'

^ Here, you are using the command-line interface, which loads the model again onto the GPU, translates a text file in bulk, and then offloads the model.

NB: I didn't call model.eval() explicitly because I assumed that it's already being done.

Yes, this is automatically handled in fairseq-interactive's prepare_model_for_inference method (this function calls make_generation_fast_, which sets the model to eval mode). In both our interfaces, we internally use fairseq-interactive (our command-line interface directly calls fairseq-interactive, and for the Python interface we provide a wrapper around fairseq-interactive).
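
For reference, the rough equivalent if you were loading the checkpoint manually with fairseq (a sketch only, assuming fairseq's checkpoint_utils API; this is not the exact indicTrans code path):

from fairseq import checkpoint_utils

# load the model ensemble from the checkpoint path used in the logs above
models, saved_cfg, task = checkpoint_utils.load_model_ensemble_and_task(
    ['../en-indic/model/checkpoint_best.pt']
)
for model in models:
    model.make_generation_fast_()  # optimizes modules for generation
    model.eval()                   # eval mode (dropout off); fairseq-interactive does this for you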

AdityaSoni19031997 commented 3 years ago

Even when we don't init the model twice, the GPU consumption is quite high. Feel free to fork and take a look! Thanks for the snips. Maybe I will lower the batch size and enable fp16 and see how it goes.

Will debug a bit more and get back! Thanks for the pointers.

FYR @gowtham1997, below are the stats when using the joint_translate shell script as-is:

[Screenshot 2021-08-20 at 11:46:48 AM]

gowtham1997 commented 3 years ago

Even when we don't init the model twice, the GPU consumption is quite high. Feel free to fork and take a look! Thanks for the snips. Maybe I will lower the batch size and enable fp16 and see how it goes.

The model is a 434M-parameter model (4 times the size of the base transformer model), so I think the high GPU consumption is expected if you are running it on a 16 GB GPU in non-fp16 mode with batch sizes >= 64.

Please tune both batch_size and buffer_size in this line before running joint_translate to see if that helps.

Do let us know if you find something else causing high GPU consumption that we missed optimizing.

TarunTater commented 3 years ago

@gowtham1997 - it was an issue with our terminal language settings. After a lot of debugging, we figured out that this simple command helped us: export PYTHONIOENCODING=UTF-8, since it was just a printing issue. Thanks for your help. I am not sure about the other discussion going on, but I think that's not relevant to our issue. If it's okay with you, I will close this issue?
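
For anyone else hitting this, a quick way to check for the problem and apply the same fix programmatically (a sketch only; exporting the variable in the shell, as above, works just as well):

import os
import subprocess
import sys

# if this prints 'ascii' (or 'ANSI_X3.4-1968') instead of a UTF-8 encoding,
# print() of Devanagari output will fail with the UnicodeEncodeError shown earlier
print(sys.stdout.encoding)

# force UTF-8 for the child processes instead of exporting PYTHONIOENCODING in the shell
env = dict(os.environ, PYTHONIOENCODING='UTF-8')
subprocess.run(
    ['bash', 'joint_translate.sh', 'en_sentences.txt', 'hi_outputs.txt', 'en', 'hi', '../en-indic'],
    env=env,
    check=True,
)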

gowtham1997 commented 3 years ago

Sure. Thanks for the update. You can close this issue.

@AdityaSoni19031997 please open a separate issue if you find something w.r.t. GPU utilization that we are missing (I still think the high GPU utilization is due to the model size and high batch size).