RandyZhouRan / MELM

Code for "MELM: Data Augmentation with Masked Entity Language Modeling for Low-Resource NER"

AssertionError during generation #7

Open ajesujoba opened 1 year ago

ajesujoba commented 1 year ago

I am trying to do generation with an already-trained MELM with your code. Do you by chance know what is responsible for the following error during generation?

Traceback (most recent call last):
  File "generate.py", line 202, in <module>
    assert inlines[in_idx] != '\n'

I was successful at training MELM on the same data but I don't seem to understand why generation fails on the same data.

RandyZhouRan commented 1 year ago

Can you print out the sample that triggers the error? This could be due to a long sentence that exceeds the maximum sequence length (default 128). Increasing max_seq_length here might solve the issue: https://github.com/RandyZhouRan/MELM/blob/main/data_gen.py#L111

def convert_examples_to_features(self, examples, tokenizer, max_seq_length=128):
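
To find which sentences might be affected, a rough check along these lines could help. It is only a sketch: it assumes a CoNLL-style file with one "token label" pair per line and a blank line between sentences. The names read_conll_sentences and find_long_sentences are hypothetical helpers, not part of the MELM code, and tokenize stands in for whatever subword tokenizer the model uses. The default of two special tokens is also an assumption, and the count ignores any extra marker tokens the pipeline may add.

```python
from typing import Callable, Iterable, List, Tuple


def read_conll_sentences(lines: Iterable[str]) -> List[List[str]]:
    """Group the first column of a CoNLL-style file into sentences."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        if line.strip():
            # Non-blank line: keep the token, drop the label column.
            current.append(line.split()[0])
        elif current:
            # Blank line closes the sentence collected so far.
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def find_long_sentences(
    sentences: List[List[str]],
    tokenize: Callable[[str], List[str]],
    max_seq_length: int = 128,
    num_special_tokens: int = 2,  # assumption: one start and one end token
) -> List[Tuple[int, int]]:
    """Return (sentence index, subword count) for sentences over the budget."""
    budget = max_seq_length - num_special_tokens
    too_long = []
    for idx, words in enumerate(sentences):
        n_subwords = sum(len(tokenize(word)) for word in words)
        if n_subwords > budget:
            too_long.append((idx, n_subwords))
    return too_long
```

With a Hugging Face tokenizer, passing tokenize=tokenizer.tokenize should work. Words with many diacritics can split into far more subwords than their visible length suggests, so the subword count is the number to compare against max_seq_length.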

ajesujoba commented 1 year ago

Thank you @RandyZhouRan! Yes, I can print it out. It is actually a Yoruba sentence from the MasakhaNER dataset. I have increased max_seq_length to 256 and it still fails.

ajesujoba commented 1 year ago

Here's the sample @RandyZhouRan

Pẹ̀lú O
bí O
gbogbo O
nǹkan O
ṣe O
ń O
lọ O
, O
yóò O
fẹ́rẹ̀ẹ́ O
pẹ́ O
díẹ̀ O
kí O
orílẹ̀ O
- O
èdè O
Nàìjíríà B-LOC
ó O
tó O
le O
è O
ní O
Ààrẹ O
tí O
kò O
ní O
nǹkan O
ṣe O
pẹ̀lú O
, O
tàbí O
tí O
kò O
ní O
àtìlẹ́yìn O
‘ẹgbẹ́’ O
àwọn O
ọ̀gágun O
ajagun O
fẹ̀yìntì O
. O

sonalkum commented 1 year ago

@ajesujoba Can you please share a snippet of what your training data looks like? The issue might be due to multiple line breaks between two consecutive sentences.
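
A quick way to check for this is a small scan like the one below. It is a sketch only, and find_separator_problems is a hypothetical helper, not part of the MELM code. It reports runs of consecutive blank lines, and also whitespace-only lines, which look empty in an editor but are not equal to '\n'.

```python
from typing import Iterable, List, Tuple


def find_separator_problems(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, description) for suspicious separators."""
    problems: List[Tuple[int, str]] = []
    prev_blank = False
    for line_no, line in enumerate(lines, start=1):
        content = line.rstrip("\r\n")
        is_blank = content.strip() == ""
        if is_blank and content != "":
            # Spaces or tabs only: visually empty, but not a bare newline.
            problems.append((line_no, "whitespace-only line"))
        if is_blank and prev_blank:
            # Second (or later) blank line in a row.
            problems.append((line_no, "consecutive blank line"))
        prev_blank = is_blank
    return problems
```

For example, opening train.txt with open("train.txt", encoding="utf-8") and passing the file object to find_separator_problems should return an empty list if every sentence is separated by exactly one empty line.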

ajesujoba commented 1 year ago

@realsonalkumar, attached is what the training data looks like: train.txt

ajesujoba commented 1 year ago

@realsonalkumar @RandyZhouRan it is not due to multiple line breaks. I ensured that there are no multiple line breaks.

xiaohutong commented 10 months ago

Have you solved this problem? I ran into exactly the same issue.

xiaohutong commented 10 months ago

I am trying to run generation with an already-trained MELM using your code. Do you happen to know what causes the following error during generation?

Traceback (most recent call last):
  File "generate.py", line 202, in <module>
    assert inlines[in_idx] != '\n'

I successfully trained MELM on the same data, but I don't understand why generation fails on the same data.

Have you solved this problem? I ran into exactly the same issue.