bnosac / udpipe

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
https://bnosac.github.io/udpipe/en
Mozilla Public License 2.0

problem - Error in udp_tokenise_tag_parse #95

Closed: sebacom8 closed this issue 3 years ago

sebacom8 commented 3 years ago

Dear,

this problem suddenly appeared while I was working on a project and I'm unable to solve it. Can you please help me?

eco1 <- eco_2003 %>%
  udpipe(object = "italian")

This looks like you restarted your R session which has invalidated the model object, trying now to reload the model again from the file at C:/Seba/Documenti/Università/Erasmus/2021 - Mosca/Linguistic data/Final work/Seba_proj/italian-isdt-ud-2.5-191206.udpipe in order to do the annotation.
Error in udp_tokenise_tag_parse(object$model, x, doc_id, tokenizer, tagger,  : 
  external pointer is not valid

Libraries installed: tidyverse, udpipe, rdracor

jwijffels commented 3 years ago

Download the model again with udpipe_download_model("italian") and restart a clean R session.

sebacom8 commented 3 years ago

Already done that, it doesn't work...

sebacom8 commented 3 years ago

I've also created new projects and new files in different folders, and uninstalled and reinstalled R too. Nothing changed.

jwijffels commented 3 years ago

What is the output of this code on your system?

dl <- udpipe_download_model("italian")
dl
model <- udpipe_load_model(dl$file_model)
eco1 <- eco_2003 %>% udpipe(object = model)

sebacom8 commented 3 years ago

> dl <- udpipe_download_model("italian")
Downloading udpipe model from https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.5/master/inst/udpipe-ud-2.5-191206/italian-isdt-ud-2.5-191206.udpipe to C:/Seba/Documenti/Università/Erasmus/2021 - Mosca/Linguistic data/Final work/Seba_proj/italian-isdt-ud-2.5-191206.udpipe
 - This model has been trained on version 2.5 of data from https://universaldependencies.org
 - The model is distributed under the CC-BY-SA-NC license: https://creativecommons.org/licenses/by-nc-sa/4.0
 - Visit https://github.com/jwijffels/udpipe.models.ud.2.5 for model license details.
 - For a list of all models and their licenses (most models you can download with this package have either a CC-BY-SA or a CC-BY-SA-NC license) read the documentation at ?udpipe_download_model. For building your own models: visit the documentation by typing vignette('udpipe-train', package = 'udpipe')
trying URL 'https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.5/master/inst/udpipe-ud-2.5-191206/italian-isdt-ud-2.5-191206.udpipe'
Content type 'application/octet-stream' length 19298761 bytes (18.4 MB)
downloaded 18.4 MB

Downloading finished, model stored at 'C:/Seba/Documenti/Università/Erasmus/2021 - Mosca/Linguistic data/Final work/Seba_proj/italian-isdt-ud-2.5-191206.udpipe'
> dl
      language
1 italian-isdt
                                                                                                                file_model
1 C:/Seba/Documenti/Università/Erasmus/2021 - Mosca/Linguistic data/Final work/Seba_proj/italian-isdt-ud-2.5-191206.udpipe
                                                                                                                                  url
1 https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.5/master/inst/udpipe-ud-2.5-191206/italian-isdt-ud-2.5-191206.udpipe
  download_failed download_message
1           FALSE               OK
> model <- udpipe_load_model(dl$file_model)
> eco1 <- eco_2003 %>% udpipe(object = model)
This looks like you restarted your R session which has invalidated the model object, trying now to reload the model again from the file at C:/Seba/Documenti/Università/Erasmus/2021 - Mosca/Linguistic data/Final work/Seba_proj/italian-isdt-ud-2.5-191206.udpipe in order to do the annotation.
Error in udp_tokenise_tag_parse(object$model, x, doc_id, tokenizer, tagger,  : 
  external pointer is not valid

jwijffels commented 3 years ago

I can't reproduce this behaviour. My advice is to start from a clean R session that does not already have the udpipe package loaded. For whatever reason, your setup seems to carry state over from an earlier session, probably because you restarted your R session (maybe without even noticing). Make sure there is no .RData file that loads unexpected objects from something you did before, and make sure your eco_2003 data is a data.frame with columns doc_id and text.

> library(udpipe)
> library(magrittr)
> data(brussels_reviews, package = "udpipe")
> eco_2003 <- data.frame(doc_id = brussels_reviews$id, 
+                        text = brussels_reviews$feedback, stringsAsFactors = FALSE)
> eco_2003 <- head(eco_2003, n = 10)
> 
> dl    <- udpipe_download_model("italian")
Downloading udpipe model from https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.5/master/inst/udpipe-ud-2.5-191206/italian-isdt-ud-2.5-191206.udpipe to C:/Users/Jan/Dropbox/Work/RForgeBNOSAC/VUB/HTR-tests/italian-isdt-ud-2.5-191206.udpipe
 - This model has been trained on version 2.5 of data from https://universaldependencies.org
 - The model is distributed under the CC-BY-SA-NC license: https://creativecommons.org/licenses/by-nc-sa/4.0
 - Visit https://github.com/jwijffels/udpipe.models.ud.2.5 for model license details.
 - For a list of all models and their licenses (most models you can download with this package have either a CC-BY-SA or a CC-BY-SA-NC license) read the documentation at ?udpipe_download_model. For building your own models: visit the documentation by typing vignette('udpipe-train', package = 'udpipe')
trying URL 'https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.5/master/inst/udpipe-ud-2.5-191206/italian-isdt-ud-2.5-191206.udpipe'
Content type 'application/octet-stream' length 19298761 bytes (18.4 MB)
downloaded 18.4 MB

Downloading finished, model stored at 'C:/Users/Jan/Dropbox/Work/RForgeBNOSAC/VUB/HTR-tests/italian-isdt-ud-2.5-191206.udpipe'
> str(dl)
'data.frame':   1 obs. of  5 variables:
 $ language        : chr "italian-isdt"
 $ file_model      : chr "C:/Users/Jan/Dropbox/Work/RForgeBNOSAC/VUB/HTR-tests/italian-isdt-ud-2.5-191206.udpipe"
 $ url             : chr "https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.5/master/inst/udpipe-ud-2.5-191206/italian-isdt-"| __truncated__
 $ download_failed : logi FALSE
 $ download_message: chr "OK"
> model <- udpipe_load_model(dl$file_model)
> model
$file
[1] "C:/Users/Jan/Dropbox/Work/RForgeBNOSAC/VUB/HTR-tests/italian-isdt-ud-2.5-191206.udpipe"

$model
<pointer: 0x000000a4398f6490>

attr(,"class")
[1] "udpipe_model"
> eco1  <- eco_2003 %>% udpipe(object = model)
> str(eco1)
'data.frame':   692 obs. of  17 variables:
 $ doc_id       : chr  "32198807" "32198807" "32198807" "32198807" ...
 $ paragraph_id : int  1 1 1 1 1 1 1 1 1 1 ...
 $ sentence_id  : int  1 1 1 1 1 1 2 2 2 2 ...
 $ sentence     : chr  "Gwen fue una magnifica anfitriona." "Gwen fue una magnifica anfitriona." "Gwen fue una magnifica anfitriona." "Gwen fue una magnifica anfitriona." ...
 $ start        : int  1 6 10 14 24 34 36 39 46 49 ...
 $ end          : int  4 8 12 22 33 34 37 44 47 50 ...
 $ term_id      : int  1 2 3 4 5 6 7 8 9 10 ...
 $ token_id     : chr  "1" "2" "3" "4" ...
 $ token        : chr  "Gwen" "fue" "una" "magnifica" ...
 $ lemma        : chr  "Gwen" "fuire" "uno" "magnifica" ...
 $ upos         : chr  "INTJ" "ADV" "DET" "NOUN" ...
 $ xpos         : chr  "I" "B" "RI" "S" ...
 $ feats        : chr  NA NA "Definite=Ind|Gender=Fem|Number=Sing|PronType=Art" "Gender=Fem|Number=Sing" ...
 $ head_token_id: chr  "4" "4" "4" "0" ...
 $ dep_rel      : chr  "case" "advmod" "det" "root" ...
 $ deps         : chr  NA NA NA NA ...
 $ misc         : chr  NA NA NA NA ...
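A minimal base-R sketch of the two checks suggested above: that the input is a data.frame with doc_id and text columns, and that the model's external pointer is still valid. `is_null_xptr()` and `check_udpipe_input()` are hypothetical helper names, not part of udpipe's API. The pointer test uses the common idiom of comparing against `new("externalptr")`, which is a null pointer; external pointers do not survive being saved and restored across R sessions, so a model object restored from an .RData file comes back with a NULL address. udpipe already attempts a reload in that case (see its message earlier in this thread), so these checks mainly help tell which condition is failing.

```r
# Hypothetical helpers, not part of the udpipe package.
library(methods)

# TRUE when `ptr` is an external pointer whose address is NULL,
# e.g. a model pointer restored from .RData in a new R session.
is_null_xptr <- function(ptr) {
  typeof(ptr) == "externalptr" && identical(ptr, new("externalptr"))
}

# Stops with an error unless `x` is a data.frame with doc_id and text columns.
check_udpipe_input <- function(x) {
  if (!is.data.frame(x)) stop("input must be a data.frame")
  missing_cols <- setdiff(c("doc_id", "text"), colnames(x))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Usage sketch (needs udpipe and a downloaded model, so not run here):
# model <- udpipe::udpipe_load_model(dl$file_model)
# if (is_null_xptr(model$model)) model <- udpipe::udpipe_load_model(model$file)
# check_udpipe_input(eco_2003)
# eco1 <- udpipe::udpipe(eco_2003, object = model)
```

If both checks pass and the error persists, the remaining suspect is the session itself, which matches the advice to restart R without restoring a workspace.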

sebacom8 commented 3 years ago

OK, I will. What I cannot understand is why it doesn't work on my laptop but does on others (e.g. my professor's).

jwijffels commented 3 years ago

Right, I'm baffled by this behaviour as well.

sebacom8 commented 3 years ago

Many thanks!!!

IT WORKS NOW!!!