udpipe tokenisation is chunking sentences incorrectly

I am having difficulty getting the udpipe English model to split text into the correct sentences. I have attached the raw text file (a90.txt) on which I am running udpipe_annotate.

As you can see in the next file, a90_term.txt, the CoNLL-U output contains many doc_ids for the same document. I do not understand why doc_id changes between lines of text.

He has worked in the pharmaceutical business for over 20 years, and been

resident in Frugalia for over 12.

The two lines above are tagged as two sentences, although they are part of the same sentence. The first line is tagged as doc 4, paragraph 1, sentence 2. The next line is tagged as doc 5, paragraph 1, sentence 1.

The following commands were used to generate the files:

tagger <- udpipe_load_model(file = "english-ud-2.0-170801.udpipe")
udpipe_annotate(object = tagger, x = a90_facts) %>% as.data.table ....

where a90_facts is the object containing the raw character vector. The same vector is dumped to the file a90.txt (attached).

Attachments: a90_term.txt, a90.txt
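One possible explanation, offered as a sketch rather than a confirmed diagnosis: udpipe_annotate treats each element of the character vector x as a separate document, and by default assigns doc_id values of the form doc1, doc2, and so on. If a90_facts holds one element per line of a90.txt (an assumption, since the code that builds a90_facts is not shown), a sentence that wraps across two lines would be split across two documents. The example below uses a hypothetical three-element stand-in for a90_facts, joins the lines into a single string, and annotates that string as one document. The names a90_text and model_file, and the doc_id value "a90", are illustrative only.

```r
# Sketch only: assumes a90_facts has one element per line of a90.txt,
# for example as produced by readLines("a90.txt").

# Hypothetical stand-in for three consecutive elements of a90_facts.
a90_facts <- c(
  "He has worked in the pharmaceutical business for over 20 years, and been",
  "",
  "resident in Frugalia for over 12."
)

# Drop empty lines and join the rest into a single string, so that the
# whole text is passed to udpipe as one document instead of one per line.
a90_text <- paste(a90_facts[nzchar(a90_facts)], collapse = " ")

# Annotate only if the udpipe package and the model file are available.
model_file <- "english-ud-2.0-170801.udpipe"
if (requireNamespace("udpipe", quietly = TRUE) && file.exists(model_file)) {
  tagger <- udpipe::udpipe_load_model(file = model_file)
  annotated <- udpipe::udpipe_annotate(object = tagger, x = a90_text,
                                       doc_id = "a90")
  terms <- as.data.frame(annotated)
  # Inspect which document, paragraph and sentence ids were assigned.
  print(unique(terms[, c("doc_id", "paragraph_id", "sentence_id")]))
}
```

Joining with a space discards the original line breaks. If paragraph boundaries matter, joining with "\n" instead may be a better fit, since the tokenizer can then still see where lines ended.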

CoNLL-UD-2017 / UFAL-UDPipe-1.2
