UdifyTextPredictor fails when output_conllu=true

Hyperparticle / udify

A single model that parses Universal Dependencies across 75 languages. Given a sentence, jointly predicts part-of-speech tags, morphology tags, lemmas, and dependency trees.

https://arxiv.org/abs/1904.02099

MIT License


UdifyTextPredictor fails when output_conllu=true #22

Open ranjita-naik opened 3 years ago

ranjita-naik commented 3 years ago

I'm feeding the raw input "Il est assez sûr de lui pour danser et chanter en public ." to predict.py with the --raw_text flag set. Since I want the output in CoNLL-U format, I've also set output_conllu=True in UdifyTextPredictor.

dump_line in UdifyPredictor then raises an error:

  File "udify/udify/predictors/text_predictor.py", line 63, in dump_line
    return self.predictor.dump_line(outputs)
  File "udify/udify/predictors/predictor.py", line 82, in dump_line
    multiword_ids = [[id] + [int(x) for x in id.split("-")] for id in outputs["multiword_ids"]]
  File "udify/udify/predictors/predictor.py", line 82, in <listcomp>
    multiword_ids = [[id] + [int(x) for x in id.split("-")] for id in outputs["multiword_ids"]]
  File "udify/udify/predictors/predictor.py", line 82, in <listcomp>
    multiword_ids = [[id] + [int(x) for x in id.split("-")] for id in outputs["multiword_ids"]]
ValueError: invalid literal for int() with base 10: 'N'

Could you please take a look?

Thanks, Ranjita

Hyperparticle commented 3 years ago

Sorry for the late reply. I think there might be a bug in how the multiword IDs are handled. In this case, you don't have any multiword IDs because you input raw text. Can you try commenting out the block starting with if outputs["multiword_ids"]:?

huberemanuel commented 3 years ago

I'm hitting the same problem, and the error persists even with the suggested fix.

gifdog97 commented 2 years ago

I also ran into this issue. The problem is that outputs["multiword_ids"] is the string "None", not the None object. As a result, the condition if outputs["multiword_ids"]: is always True, even when the predicted tree contains no multiword tokens. The block that follows is then executed and raises the error, because it tries to apply int() to 'N', the first character of "None".

https://github.com/Hyperparticle/udify/blob/18d63ac1b2da5a1afea58f317ade79bc84910450/udify/predictors/predictor.py#L81-L84

I think commenting out these four lines should make the error go away.
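
For anyone who wants to confirm the diagnosis without running the model, here is a minimal sketch that reproduces the error from a plain dict and shows one possible guard. The helper parse_multiword_ids is hypothetical (it is not part of udify), and treating the string "None" as "no multiword tokens" is an assumption about what the field is meant to express.

```python
def parse_multiword_ids(raw):
    """Hypothetical helper (not part of udify).

    Treats both None and the string "None" as "no multiword tokens";
    otherwise expands each "a-b" range ID into [id, a, b].
    """
    if raw is None or raw == "None":
        return []
    return [[mw_id] + [int(x) for x in mw_id.split("-")] for mw_id in raw]


# Reproduce the failure: the field holds the string "None".
outputs = {"multiword_ids": "None"}

error_message = None
if outputs["multiword_ids"]:  # a non-empty string is truthy
    try:
        # Same expression as in the traceback: iterating over a string
        # yields its characters, so int("N") is attempted first.
        [[i] + [int(x) for x in i.split("-")] for i in outputs["multiword_ids"]]
    except ValueError as err:
        error_message = str(err)

print(error_message)
print(parse_multiword_ids(outputs["multiword_ids"]))  # guard returns []
print(parse_multiword_ids(["3-4"]))                   # a real range ID
```

The guard only hides the symptom in dump_line. Why the field arrives as a string in the first place would still need to be tracked down.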

gifdog97 commented 2 years ago

But I actually found another problem: outputs["ids"] is also the string "None" for some reason, which produces malformed CoNLL-U output:

N   Un  uno DET _   Definite=Ind|Gender=Masc|Number=Sing|PronType=Art   2   det _   _
o   oppioide    oppioide    NOUN    _   Gender=Masc|Number=Sing 6   nsubj   _   _
n   è   essere  AUX _   Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin   6   cop _   _
e   un  uno DET _   Definite=Ind|Gender=Masc|Number=Sing|PronType=Art   6   det _   _

A temporary fix is to use the list [1, 2, ..., n] instead, where n is the sentence length, but I think the underlying issue is that outputs['ids'] maps to an unexpected value. This might also be related to the other issue I posted (I'm not sure). Could you check it?
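
A rough sketch of that temporary workaround, in case it helps others. The helper resolve_ids and the "words" key are hypothetical names used for illustration, and pairing the IDs with the words via zip is only an assumption about how the ID column ends up as N, o, n, e.

```python
def resolve_ids(ids, words):
    """Hypothetical helper (not part of udify).

    Falls back to sequential 1-based token IDs when the ids field is
    missing or is the string "None".
    """
    if ids is None or ids == "None":
        return list(range(1, len(words) + 1))
    return list(ids)


# "words" is an assumed key name; the tokens are from the output above.
outputs = {"ids": "None", "words": ["Un", "oppioide", "è", "un"]}

# Pairing the string "None" with the words yields one character per token,
# which matches the N / o / n / e seen in the first column.
broken_rows = list(zip(outputs["ids"], outputs["words"]))

# With the fallback, each token gets its position in the sentence.
fixed_ids = resolve_ids(outputs["ids"], outputs["words"])
fixed_rows = list(zip(fixed_ids, outputs["words"]))

print(broken_rows)
print(fixed_rows)
```

Sequential IDs are only valid for sentences without multiword tokens or empty nodes, which holds for raw-text input here but not for arbitrary CoNLL-U files.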