UUDigitalHumanitieslab / gretel

GrETEL4 (fork from CCL-KULeuven)
http://gretel.hum.uu.nl

GrETEL parser version differs from SoNaR parser version #290

Open JanOdijk opened 11 months ago

JanOdijk commented 11 months ago

The queries based on the MWE canonical forms 'iemand zal de gordiaanse knoop doorhakken' and 'iemand zal de Gordiaanse knoop doorhakken' fail because the parser assigns the lemma 'gordiaanse' to both 'gordiaanse' and 'Gordiaanse', instead of 'gordiaans'.

This is an error in Alpino, but it should not be a problem as long as the treebank being searched was parsed with the same parser version. In the SoNaR treebank, however, the lemma for 'gordiaanse' is 'gordiaans', which suggests that SoNaR was parsed with a different version of the Alpino parser.

Is it known with which version SoNaR has been parsed?

More generally, we should ideally record the parser version that was used to create each treebank, and call that same version of Alpino when parsing the example or the MWE canonical form, though I understand that this requires a lot of changes.

tijmenbaarda commented 11 months ago

GrETEL 5 has a command that shows the Alpino versions of all treebanks together with the currently installed Alpino version for comparison. I am not able to log in to the server at the moment, but I remember that SoNaR was parsed with a newer version of Alpino than the other corpora. The installed version of Alpino is 1.3, and that is also the version used for the other corpora. I believe SoNaR was parsed with version 1.6.

It would be good to match Alpino versions. The main difficulty there is that GrETEL allows searching multiple treebanks at the same time and that currently the treebanks are only selected after parsing the example sentence or MWE. An easier solution would be to allow the user to select an Alpino version in the first step.
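To make the idea concrete, here is a minimal sketch of the check that selecting an Alpino version in the first step would enable. It is not GrETEL code: the function names, the dictionary layout, and the `other-corpus` placeholder are hypothetical, and the version numbers are only the ones recalled above (1.3 installed, SoNaR believed to be 1.6), so they are not verified.

```python
from collections import defaultdict


def group_by_parser_version(treebank_versions):
    """Group treebank names by the Alpino version recorded for them.

    `treebank_versions` is a hypothetical mapping {treebank name: version string}.
    """
    groups = defaultdict(list)
    for name, version in treebank_versions.items():
        groups[version].append(name)
    return dict(groups)


def mismatched_treebanks(treebank_versions, selected_version):
    """Return the treebanks whose recorded version differs from the selected one."""
    return sorted(
        name
        for name, version in treebank_versions.items()
        if version != selected_version
    )


# Illustrative values only, based on the recollection in this comment.
treebank_versions = {"SoNaR": "1.6", "other-corpus": "1.3"}
selected_version = "1.3"  # the version the user would pick in the first step

for name in mismatched_treebanks(treebank_versions, selected_version):
    print(f"Warning: {name} was parsed with a different Alpino version")
```

With a check like this, treebanks that do not match the selected version could be disabled or flagged in the treebank selection step, so that searching multiple treebanks at once remains possible within one version group.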