bnosac / udpipe

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
https://bnosac.github.io/udpipe/en
Mozilla Public License 2.0

How to read a large annotation file into a data.frame? #57

Closed findgit123 closed 5 years ago

findgit123 commented 5 years ago

Dear Jan Wijffels and BNOSAC,

udpipe is great, but I cannot read a big annotation file into a data.frame. As the example below shows, I can read the RDS file but cannot coerce the result to a data.frame. Sometimes the coercion to data.frame fails, and sometimes the machine freezes. Can you help me read a large file into a data.frame?

Example code:

The annotation file is large, about 146 MB. Reading it into R takes about 3 minutes, but it works:

s <- readRDS(file = "F:/annotate.rds")

Coercing it to a data.frame does not succeed when the file is that large (e.g. 146 MB). If the annotation file is below 80 MB, the coercion may work, but not always. If the file is below 50 MB, it takes about 5 minutes to get it into a data.frame.

x <- data.frame(s)

I am on Windows 7 with a 2.30 GHz CPU and 6 GB of RAM, running R 3.6.0, RStudio version 1.2.1335, and udpipe 0.8.3.

Can you let me know how to get a big file through x <- data.frame(s)?

Thank you very much.

jwijffels commented 5 years ago

What is in that rds file? What are its class and dimensions, i.e. what do class(s) and str(s) show?
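The checks asked for here can be run along these lines. This is only a sketch: the small data.frame stands in for the real object, which in the thread would come from readRDS(), and the column values are made up for illustration.

```r
# Stand-in object (assumption); in the thread this would be
# s <- readRDS("F:/annotate.rds")
s <- data.frame(doc_id = c("d1", "d1"),
                token  = c("Hello", "world"),
                stringsAsFactors = FALSE)

class(s)                               # "data.frame" means no coercion is needed
dim(s)                                 # rows and columns; NULL for a non-tabular object
str(s, max.level = 1)                  # top-level structure only
format(object.size(s), units = "MB")   # size of the object in memory
```

The in-memory size can be much larger than the size of the compressed rds file on disk, which matters on a machine with 6 GB of RAM.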

jwijffels commented 5 years ago

Closing as there has been no further response. The class of your object s is probably already data.frame, so it does not make sense to call data.frame(s) on it. My advice would be: get a computer with more RAM, keep only the columns you need from the annotation object, put the data in a database, or use data.table. This is not particularly related to this R package but to your own R usage.
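A minimal sketch of the "keep only the columns you need" advice, assuming the object is already a data.frame. The toy data and the choice of columns to keep are illustrative; doc_id, token, lemma, upos and feats are column names that appear in udpipe annotation output.

```r
# Toy stand-in for a large annotation data.frame (assumption)
s <- data.frame(doc_id = c("d1", "d1"),
                token  = c("Dogs", "bark"),
                lemma  = c("dog", "bark"),
                upos   = c("NOUN", "VERB"),
                feats  = c("Number=Plur", "Tense=Pres"),
                stringsAsFactors = FALSE)

# If s is already a data.frame, data.frame(s) only makes another copy
stopifnot(is.data.frame(s))

# Keep only the columns needed downstream (illustrative selection)
keep <- c("doc_id", "token", "lemma", "upos")
x <- s[, intersect(keep, names(s)), drop = FALSE]

# Optional: convert to a data.table in place if that package is installed
if (requireNamespace("data.table", quietly = TRUE)) {
  data.table::setDT(x)
}
```

Dropping unused columns before any further processing reduces the memory needed for every later copy of the object.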

rboga commented 4 years ago

Hi, I am facing the same problem. Basically, a udpipe connlu object that has been saved to an rds file cannot be read back from the rds file and then coerced into a data.frame. To end up with a data.frame, we have to convert the udpipe object to a data.frame before saving it. Is there any way to convert an rds file containing a udpipe object into a data.frame?
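One possible way to approach this is sketched below; it is not a confirmed answer from the thread. The helper name read_annotation is hypothetical. The sketch assumes that the stored object has class "udpipe_connlu" (as returned by udpipe_annotate()) and that the udpipe package provides an as.data.frame() method for that class, so that as.data.frame(s), rather than data.frame(s), does the conversion once the package is loaded.

```r
# Hypothetical helper: read an rds file and return the annotation as a data.frame
read_annotation <- function(path) {
  s <- readRDS(path)
  if (is.data.frame(s)) {
    # Already tabular: nothing to coerce
    return(s)
  }
  if (inherits(s, "udpipe_connlu")) {
    # Assumption: udpipe registers as.data.frame() for this class, so the
    # package namespace must be loaded before the generic is called
    if (!requireNamespace("udpipe", quietly = TRUE)) {
      stop("The udpipe package is needed to convert this object")
    }
    return(as.data.frame(s))
  }
  stop("Unexpected class: ", paste(class(s), collapse = ", "))
}

# Usage with a small data.frame written to a temporary file
tmp <- tempfile(fileext = ".rds")
saveRDS(data.frame(doc_id = "d1", token = "Hello"), tmp)
x <- read_annotation(tmp)
```

Whether this works on a given file depends on what was actually saved, which is why the class check comes first.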

jwijffels commented 4 years ago

Can you provide a reproducible example of your issue?