bnosac / udpipe

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
https://bnosac.github.io/udpipe/en
Mozilla Public License 2.0

Fatal error: "external pointer is not valid" #5

Closed espenjutte closed 7 years ago

espenjutte commented 7 years ago

When running the example code for udpipe I get the following error:

Error in udp_tokenise_tag_parse(object$model, x, doc_id, tokenizer, tagger, : external pointer is not valid

Steps to reproduce:

library(udpipe)
dl <- udpipe_download_model(language = "dutch")
dl
udmodel_dutch <- udpipe_load_model(file = "dutch-ud-2.0-170801.udpipe")
x <- udpipe_annotate(udmodel_dutch, x = "Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.")
x <- as.data.frame(x)
x

I'm using Microsoft R Open - R version 3.4.0 (2017-04-21).

jwijffels commented 7 years ago

Did you download the model? It looks like the file "dutch-ud-2.0-170801.udpipe" is not on your computer. Is it in your current working directory? What does list.files(getwd()) show you?

FYI. That code works perfectly on my machine and all CRAN machines:

> library(udpipe)
Warning message:
package ‘udpipe’ was built under R version 3.4.2 
> dl <- udpipe_download_model(language = "dutch")
trying URL 'https://github.com/jwijffels/udpipe.models.ud.2.0/raw/master/inst/udpipe-ud-2.0-170801/dutch-ud-2.0-170801.udpipe'
Content type 'application/octet-stream' length 19992491 bytes (19.1 MB)
downloaded 19.1 MB

> dl
  language
1    dutch
                                                                                                 file_model
1 \\\\stud-home.icts.kuleuven.be/k0014536/Desktop/R_Statistical_Machine_Learning/dutch-ud-2.0-170801.udpipe
> udmodel_dutch <- udpipe_load_model(file = "dutch-ud-2.0-170801.udpipe")
> x <- udpipe_annotate(udmodel_dutch,
+                      x = "Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.")
> x <- as.data.frame(x)
> x
   doc_id paragraph_id sentence_id                                                                   sentence
1    doc1            1           1 Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.
2    doc1            1           1 Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.
3    doc1            1           1 Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.
4    doc1            1           1 Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.
5    doc1            1           1 Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.
6    doc1            1           1 Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.
7    doc1            1           1 Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.
8    doc1            1           1 Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.
9    doc1            1           1 Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.
10   doc1            1           1 Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.
11   doc1            1           1 Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.
12   doc1            1           1 Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.
13   doc1            1           1 Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.
14   doc1            1           1 Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.
15   doc1            1           1 Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.
16   doc1            1           1 Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.
17   doc1            1           1 Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.
18   doc1            1           1 Ik ging op reis en ik nam mee: mijn laptop, mijn zonnebril en goed humeur.
   token_id     token     lemma  upos                     xpos
1         1        Ik        ik  PRON        Pron|per|1|ev|nom
2         2      ging        ga  VERB V|intrans|ovt|1of2of3|ev
3         3        op        op   ADP                Prep|voor
4         4      reis      reis  NOUN          N|soort|ev|neut
5         5        en        en CCONJ               Conj|neven
6         6        ik        ik  PRON        Pron|per|1|ev|nom
7         7       nam      neem  VERB   V|trans|ovt|1of2of3|ev
8         8       mee       mee   ADV                Adv|deelv
9         9         :         : PUNCT            Punc|dubbpunt
10       10      mijn      mijn  PRON  Pron|bez|1|ev|neut|attr
11       11    laptop    laptop  NOUN          N|soort|ev|neut
12       12         ,         , PUNCT               Punc|komma
13       13      mijn      mijn  PRON  Pron|bez|1|ev|neut|attr
14       14 zonnebril zonnebril  NOUN          N|soort|ev|neut
15       15        en       een CCONJ               Conj|neven
16       16      goed      goed   ADJ    Adj|attr|stell|onverv
17       17    humeur    humeur  NOUN          N|soort|ev|neut
18       18         .         . PUNCT                Punc|punt
                                                                 feats head_token_id      dep_rel deps
1                           Case=Nom|Number=Sing|Person=1|PronType=Prs             2        nsubj <NA>
2  Aspect=Imp|Mood=Ind|Number=Sing|Subcat=Intr|Tense=Past|VerbForm=Fin             0         root <NA>
3                                                         AdpType=Prep             4         case <NA>
4                                                          Number=Sing             2          obj <NA>
5                                                                 <NA>             7           cc <NA>
6                           Case=Nom|Number=Sing|Person=1|PronType=Prs             7        nsubj <NA>
7  Aspect=Imp|Mood=Ind|Number=Sing|Subcat=Tran|Tense=Past|VerbForm=Fin             2         conj <NA>
8                                                         PartType=Vbp             7 compound:prt <NA>
9                                                       PunctType=Colo             2        punct <NA>
10                          Number=Sing|Person=1|Poss=Yes|PronType=Prs            11         nmod <NA>
11                                                         Number=Sing             2        nsubj <NA>
12                                                      PunctType=Comm            11        punct <NA>
13                          Number=Sing|Person=1|Poss=Yes|PronType=Prs            14         nmod <NA>
14                                                         Number=Sing            11        appos <NA>
15                                                                <NA>            17           cc <NA>
16                                                          Degree=Pos            17         amod <NA>
17                                                         Number=Sing            14         conj <NA>
18                                                      PunctType=Peri             2        punct <NA>
              misc
1             <NA>
2             <NA>
3             <NA>
4             <NA>
5             <NA>
6             <NA>
7             <NA>
8    SpaceAfter=No
9             <NA>
10            <NA>
11   SpaceAfter=No
12            <NA>
13            <NA>
14            <NA>
15            <NA>
16            <NA>
17   SpaceAfter=No
18 SpacesAfter=\\n

espenjutte commented 7 years ago

As far as I can see the model downloads correctly and is present in the directory.

list.files(getwd()) lists "dutch-ud-2.0-170801.udpipe" as one of the files in the directory.

Listing the file from the OS, however, shows that the .udpipe file is only 4.0 kB in size. That seems rather small for an entire model.

Downloading the model manually seems to do the trick (the code runs with the expected output). So something in my setup is causing the udpipe_download_model command to download the wrong content.
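
A quick size check can flag this kind of failed download before the model is loaded. The following is only a minimal sketch: the helper name looks_like_model and the 1 MB threshold are assumptions made for illustration (the Dutch model is reported in this thread as 19.1 MB), not part of udpipe.

```r
# Hypothetical helper (not part of udpipe): TRUE when `path` exists and is
# at least `min_bytes` large. The 1 MB default is an assumed threshold.
looks_like_model <- function(path, min_bytes = 1e6) {
  size <- file.size(path)  # NA when the file does not exist
  !is.na(size) && size >= min_bytes
}

# Example: a 4 kB file, like the one described above, fails the check.
small <- tempfile(fileext = ".udpipe")
writeBin(raw(4096), small)
looks_like_model(small)
```

Running such a check right after udpipe_download_model would turn a confusing "external pointer is not valid" error into an early, readable failure.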

jwijffels commented 7 years ago

This looks like you did not download the model. The model is several megabytes in size. Maybe you are behind a proxy/firewall?

Can you show me all the output that the following command produces on your computer, including any warnings or errors that you get?

dl <- udpipe_download_model(language = "dutch")

espenjutte commented 7 years ago

Running the command just gives me a normal download progress:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   190  100   190    0     0    328      0 --:--:-- --:--:-- --:--:--   328

No warnings or errors. Other downloads are working correctly (for example package downloads).

If I look at the file that is downloaded, I get:

<html><body>You are being <a href="https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.0/master/inst/udpipe-ud-2.0-170801/dutch-ud-2.0-170801.udpipe">redirected</a>.</body></html>

So I'm guessing redirects are not being followed correctly by curl for some reason.
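
A downloaded file that is really an HTML redirect page can be detected by peeking at its first bytes. This is a rough sketch under stated assumptions: the helper name is_html_stub and the 64-byte window are made up for illustration, and the check only looks for a lowercase "<html" marker like the one in the file shown above.

```r
# Hypothetical helper (not part of udpipe): TRUE when the first `n` bytes
# of `path` contain a lowercase "<html" marker.
is_html_stub <- function(path, n = 64L) {
  head_bytes <- readBin(path, what = "raw", n = n)
  # rawToChar() cannot handle embedded NUL bytes, which binary files may contain.
  head_text <- rawToChar(head_bytes[head_bytes != as.raw(0)])
  grepl("<html", head_text, fixed = TRUE, useBytes = TRUE)
}

# Example: a file holding a redirect page versus one holding binary data.
stub <- tempfile(fileext = ".udpipe")
writeLines("<html><body>You are being redirected.</body></html>", stub)
binary <- tempfile(fileext = ".udpipe")
writeBin(as.raw(c(0, 1, 2, 255, 0, 42)), binary)
is_html_stub(stub)
is_html_stub(binary)
```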

jwijffels commented 7 years ago

Bizarre. Can you show what each of the following commands does on your computer? This is basically what udpipe_download_model does if you want the Dutch language model.

utils::download.file("https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.0/master/inst/udpipe-ud-2.0-170801/dutch-ud-2.0-170801.udpipe", "dutch-ud-2.0-170801.udpipe", mode = "wb")
utils::download.file("https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.0/master/inst/udpipe-ud-2.0-170801/dutch-ud-2.0-170801.udpipe", "dutch-ud-2.0-170801.udpipe", mode = "wb", method = "internal")
utils::download.file("https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.0/master/inst/udpipe-ud-2.0-170801/dutch-ud-2.0-170801.udpipe", "dutch-ud-2.0-170801.udpipe", mode = "wb", method = "wininet")
utils::download.file("https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.0/master/inst/udpipe-ud-2.0-170801/dutch-ud-2.0-170801.udpipe", "dutch-ud-2.0-170801.udpipe", mode = "wb", method = "libcurl")
utils::download.file("https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.0/master/inst/udpipe-ud-2.0-170801/dutch-ud-2.0-170801.udpipe", "dutch-ud-2.0-170801.udpipe", mode = "wb", method = "curl")
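
The same comparison can be run in one pass, recording the size of whatever each method downloads. This is a sketch only: try_methods is a hypothetical helper, and its `download` argument exists purely so the function can be exercised without network access. The method names are the ones used in the commands above; "wininet" is only available on Windows.

```r
# Hypothetical helper: try each download method and return the size in bytes
# of the resulting file, or NA when the method raised an error.
try_methods <- function(url, method_names, download = utils::download.file) {
  vapply(method_names, function(m) {
    dest <- tempfile(fileext = ".udpipe")
    ok <- tryCatch({
      download(url, dest, mode = "wb", method = m)
      TRUE
    }, error = function(e) FALSE)
    if (ok) file.size(dest) else NA_real_
  }, numeric(1))
}

# Usage (needs network access, so it is left commented out):
# try_methods(
#   "https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.0/master/inst/udpipe-ud-2.0-170801/dutch-ud-2.0-170801.udpipe",
#   c("internal", "wininet", "libcurl", "curl")
# )
```

A method that returns a few hundred bytes instead of roughly 19 MB is the one that does not follow the redirect.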
jwijffels commented 7 years ago

Note to myself. I think the fix to this might be to replace:

https://github.com/jwijffels/udpipe.models.ud.2.0/raw/master with https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.0/master in the code of udpipe_download_model
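
The replacement amounts to a simple URL rewrite. The sketch below illustrates the mapping only; the helper to_raw_url is hypothetical and is not the code that udpipe_download_model uses.

```r
# Hypothetical helper: map a github.com ".../raw/<branch>/..." URL onto the
# raw.githubusercontent.com host it redirects to.
to_raw_url <- function(url) {
  sub("^https://github\\.com/([^/]+)/([^/]+)/raw/",
      "https://raw.githubusercontent.com/\\1/\\2/",
      url)
}

old_url <- "https://github.com/jwijffels/udpipe.models.ud.2.0/raw/master/inst/udpipe-ud-2.0-170801/dutch-ud-2.0-170801.udpipe"
to_raw_url(old_url)
```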

jwijffels commented 7 years ago

I've updated the package to download models from https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.0/master instead of https://github.com/jwijffels/udpipe.models.ud.2.0/raw/master, which was apparently being redirected to https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.0/master

Can you check on your machine whether the latest version now works and downloads the model, which should be several MB in size?

devtools::install_github("bnosac/udpipe", build_vignettes = TRUE)
library(udpipe)
dl <- udpipe_download_model(language = "dutch")

olichose123 commented 5 years ago

Leaving a note here for others to find: I had the exact same problem, except the error was thrown after I had been using the model for hundreds of thousands of calls. It had clearly downloaded correctly, but simply stopped working. Redownloading and reloading the model fixed the problem:

library(udpipe)

# Download the model for `lang` and load it into the global variable `udmodel`.
init_model <- function(lang = 'french')
{
  dl <- udpipe_download_model(language = lang)
  udmodel <<- udpipe_load_model(file = dl$file_model)
}

and I was able to get on with the other hundred-thousand set of sentences in my project.

jwijffels commented 5 years ago

You probably restarted your R session without knowing it. The udpipe models are pointers to a file on your hard disk. If you restart your R session, that pointer is lost, which is why you need to reload the model using udmodel <- udpipe_load_model(file = "/path/to/the/model")

olichose123 commented 5 years ago

That has to be it. Strangely though, the model was still in global memory, as the R session, if it crashed, reloaded to a similar state.

To explain: I had executed a long-running loop overnight. In the morning, my RStudio Server web page had crashed, but after reloading it everything looked fine: the code had finished successfully, the console's content was there, and everything in the global environment was in memory. Usually, if the R session has to restart, there is a message mentioning it in the console. That wasn't the case here. I tried to launch my loop again for the couple of thousand remaining cases, and that is when I saw the error. The working directory hadn't changed, and the model file was still in the same place.

This might be an R problem, and not a udpipe problem. Something might have happened to the in-memory data server-side, so that it needed a manual reload.

jwijffels commented 5 years ago

Like I said, udpipe models are Rcpp pointers to files on disk. If you restart your R session these pointers are lost, no matter how you restarted (from a crash, just a regular restart, automatically as RStudio does or by reloading an .RData file at startup). You always need to reload a model from disk with udmodel <- udpipe_load_model(file = "/path/to/the/model") if you restart R.
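
One way to guard a long-running job against a lost pointer is to test the pointer before each batch and reload when it is no longer valid. The sketch below rests on several assumptions: the helper names are hypothetical, the model object is assumed to keep its pointer in `$model` (as the error message at the top of this thread suggests) and its path in `$file`, and a restored pointer is assumed to compare identical to a fresh empty external pointer.

```r
# TRUE when `ptr` is an external pointer with a NULL address, the state
# such pointers are in after being saved and restored (e.g. via .RData).
is_nil_pointer <- function(ptr) {
  identical(ptr, methods::new("externalptr"))
}

# Hypothetical helper: reload the model from disk when its pointer is stale.
# `loader` defaults to udpipe_load_model and can be replaced for testing.
ensure_model <- function(model, loader = udpipe::udpipe_load_model) {
  if (is_nil_pointer(model$model)) {
    model <- loader(file = model$file)
  }
  model
}

# Usage inside a loop (assumes udpipe is installed and `udmodel` was loaded):
# udmodel <- ensure_model(udmodel)
# x <- udpipe_annotate(udmodel, x = "some text")
```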

olichose123 commented 5 years ago

My bad, I did not understand that by pointer you meant actual Rcpp pointers. I forgot that udpipe is C++ under the hood! Thanks a lot for the help!