AIPHES / ACL20-Reference-Free-MT-Evaluation

Reference-free MT Evaluation Metrics
Apache License 2.0
20 stars, 5 forks

Some issue in code execution #6

Open mjavedgohar opened 1 year ago

mjavedgohar commented 1 year ago

Hi @andyweizhao,

I am trying to execute the demo.py and main.py scripts in a Google Colab notebook, but I get the following error. Can you please help me resolve this? Also, how can I test my own translation dataset?



TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>()
      5 lp='cs-en'
      6
----> 7 xmoverscores = scorer.compute_xmoverscore(alignment, projection, bias, source, translations)
      8 #, ngram=ngram, layer=layer, dropout_rate=dropout_rate, bs=batch_size)
      9

/content/score_utils_2.py in get_bert_embedding(all_sens, model, tokenizer, idf_dict, batch_size, device)
     52         batch_embedding = bert_encode(model, padded_sens[i:i+batch_size],
     53                                       attention_mask=mask[i:i+batch_size])
---> 54         batch_embedding = torch.stack(batch_embedding)
     55         embeddings.append(batch_embedding)
     56         del batch_embedding

TypeError: stack(): argument 'tensors' (position 1) must be tuple of Tensors, not str

Thanks
andyweizhao commented 1 year ago

Hi, the reason for that issue is that the Google Colab environment uses a newer version of the Transformers library that is incompatible with this code.

You could adapt the code to the newer version by making the following changes at https://github.com/AIPHES/ACL20-Reference-Free-MT-Evaluation/blob/master/score_utils.py#L20

From:

L20:     output, _, x_encoded_layers, _ = model(input_ids = x, token_type_ids = None, attention_mask = attention_mask)
L21:  return x_encoded_layers

To:

L20:     output = model(input_ids = x, token_type_ids = None, attention_mask = attention_mask)
L21:     return output.hidden_states
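For context, the old unpacking fails because newer versions of the Transformers library return a model-output object with named fields rather than a plain tuple. A toy illustration of the difference, using a stand-in class rather than the real library:

```python
# Toy stand-in for the newer Transformers return type: models now return
# an output object with named attributes instead of a plain tuple.
class FakeModelOutput:
    def __init__(self, last_hidden_state, hidden_states):
        self.last_hidden_state = last_hidden_state
        self.hidden_states = hidden_states  # tuple of per-layer embeddings

def fake_model(x):
    # Pretend each of three "layers" produced an embedding for input x.
    layers = tuple([v * (i + 1) for v in x] for i in range(3))
    return FakeModelOutput(last_hidden_state=layers[-1], hidden_states=layers)

output = fake_model([1.0, 2.0])

# Old-style tuple unpacking, as in
#   output, _, x_encoded_layers, _ = model(...)
# no longer matches this return type; the fix is attribute access:
x_encoded_layers = output.hidden_states
print(len(x_encoded_layers))  # 3 layers in this toy model
```

Note that with the real library you may also need to enable hidden states when loading the model (e.g. via `output_hidden_states=True`), otherwise `hidden_states` can be `None`.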

Regarding your own datasets: arrange them into the format like this, then change the data_path in demo.py and run it.
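As an illustration of what the scorer ultimately consumes (a hypothetical file layout; check the example data in the repo for the exact format), the source sentences and their translations can be kept as two parallel lists, e.g. read from a tab-separated file:

```python
import csv
import io

# Hypothetical input: one tab-separated line per segment,
# "<source sentence>\t<system translation>".
data = "Ahoj světe\tHello world\nDobrý den\tGood day\n"

source, translations = [], []
for src, hyp in csv.reader(io.StringIO(data), delimiter="\t"):
    source.append(src)
    translations.append(hyp)

# These two parallel lists match the call seen in the traceback:
# scorer.compute_xmoverscore(alignment, projection, bias, source, translations)
print(source)        # ['Ahoj světe', 'Dobrý den']
print(translations)  # ['Hello world', 'Good day']
```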

Hopefully this helps.

mjavedgohar commented 1 year ago

@andyweizhao Thanks for your help, it worked for me. Can you please guide me through the steps to train for a language that is not on the current list, e.g. Hindi or Arabic? Can you also explain the .BAM and .GBDD files for layers 8 and 12?

Thanks