bnosac / udpipe

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
https://bnosac.github.io/udpipe/en
Mozilla Public License 2.0

Update the udpipe model #121

Open · mayeulk opened this issue 1 year ago

mayeulk commented 1 year ago

The most recent english-ewt-ud udpipe model available from the R package is version 2.5 (english-ewt-ud-2.5-191206.udpipe). Is this because versions 2.6 and later are incompatible with the R package (models newer than 2.5 are for UDPipe 2)? Can the English 2.6 or later models be used?

I found errors in 2.5 that are corrected in later versions. For instance, the lemma for the token "whistle" is "whisle" (missing the "t"). Checking with http://lindat.mff.cuni.cz/services/udpipe/, v. 2.6 correctly returns the lemma "whistle".

jwijffels commented 1 year ago

You can train your own models on more recent Universal Dependencies data with this R package. These are 'udpipe 1' models, and you can train them on any version of the Universal Dependencies data. Documentation on how to do that is available at

mayeulk commented 1 year ago

Thank you for this! (I was hoping to be able to use a more recent pre-trained model out of the box.)

locusclassicus commented 2 months ago

Hi, many thanks for your package. I have the same question about the Latin models. Training a model yourself requires fairly advanced NLP skills, so it would be really useful to have newer (2.13 or 2.14) pretrained models, if possible.