Avoid confusion when passing a data.frame with duplicate doc_ids to udpipe in parallel

bnosac / udpipe

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit

https://bnosac.github.io/udpipe/en

Mozilla Public License 2.0


Status: Open. jwijffels opened this issue 3 years ago.

jwijffels commented 3 years ago

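In that case, paragraph_id is numbered within each chunk of doc_ids handed to a core, not across the whole data.frame, so rows sharing a doc_id that end up on different cores get overlapping paragraph_ids.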
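The mechanism can be simulated outside R. The following is a hypothetical Python sketch, not udpipe's code. It assumes that rows sharing a doc_id are numbered as consecutive paragraphs, and that parallel annotation splits the rows into contiguous chunks, one per core. The helper names (`split_into_chunks`, `number_paragraphs`, `split_by_doc`) are illustrative only. Splitting by row lets one doc_id span two chunks, so its counter restarts. Splitting by whole doc_id keeps each document on a single core.

```python
def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous chunks whose sizes differ by at most 1."""
    k, m = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + k + (1 if i < m else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def number_paragraphs(doc_ids):
    """Hypothetical per-core step: the n-th row of a doc_id gets paragraph_id n."""
    counts, out = {}, []
    for d in doc_ids:
        counts[d] = counts.get(d, 0) + 1
        out.append((d, counts[d]))
    return out


def number_per_chunk(chunks):
    """Number each chunk independently, as separate cores would, then concatenate."""
    out = []
    for chunk in chunks:
        out.extend(number_paragraphs(chunk))
    return out


def split_by_doc(doc_ids, n_chunks):
    """Assign whole doc_ids to chunks so no document is split across cores."""
    unique_ids = list(dict.fromkeys(doc_ids))  # unique ids in first-seen order
    chunks = []
    for id_group in split_into_chunks(unique_ids, n_chunks):
        members = set(id_group)
        chunks.append([d for d in doc_ids if d in members])
    return chunks


ids = ["d1", "d1", "d1", "d2"]
naive = number_per_chunk(split_into_chunks(ids, 2))  # third "d1" row restarts at 1
grouped = number_per_chunk(split_by_doc(ids, 2))     # "d1" rows numbered 1, 2, 3
print(naive)
print(grouped)
```

Grouping by doc_id trades perfectly balanced chunks for consistent numbering. A document with many rows can make one core's chunk larger than the others.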

jwijffels commented 3 years ago

Or, if no doc_ids are provided, parallel annotation produces duplicate doc_ids.
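This Python sketch is hypothetical and shows how duplicates could arise. It is not udpipe's implementation. It assumes default ids are generated as `doc1`, `doc2`, ... separately inside each per-core chunk. Under that assumption, every chunk reuses `doc1`. Generating the ids once over the full input, before splitting, keeps them unique. The function names are illustrative.

```python
def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous chunks whose sizes differ by at most 1."""
    k, m = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + k + (1 if i < m else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def annotate_chunk(texts, doc_ids=None):
    """Hypothetical per-core step: default ids are numbered within this chunk only."""
    if doc_ids is None:
        doc_ids = [f"doc{i + 1}" for i in range(len(texts))]
    return list(zip(doc_ids, texts))


def annotate_parallel(texts, n_cores, assign_ids_first):
    """Simulate parallel annotation, optionally assigning global ids before splitting."""
    ids = [f"doc{i + 1}" for i in range(len(texts))] if assign_ids_first else None
    text_chunks = split_into_chunks(texts, n_cores)
    id_chunks = split_into_chunks(ids, n_cores) if ids is not None else [None] * n_cores
    out = []
    for chunk_texts, chunk_ids in zip(text_chunks, id_chunks):
        out.extend(annotate_chunk(chunk_texts, chunk_ids))
    return out


texts = ["a", "b", "c", "d"]
print(annotate_parallel(texts, 2, assign_ids_first=False))  # doc1, doc2, doc1, doc2
print(annotate_parallel(texts, 2, assign_ids_first=True))   # doc1, doc2, doc3, doc4
```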