angelhsu05 opened this issue 4 years ago
Hi,
I also have the same problem. Here is what I got when trying to run this code:
sotu_firsts_nouns <- PrepText(textdata = sotu_firsts, groupvar = "president",
                              textvar = "sotu_text", node_type = "groups",
                              tokenizer = "words", pos = "nouns",
                              remove_stop_words = TRUE, compound_nouns = TRUE)
Downloading udpipe model from https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.4/master/inst/udpipe-ud-2.4-190531/english-ewt-ud-2.4-190531.udpipe to /Users/vuongdat/OneDrive - Aarhus universitet/Applied Data Science/Share Data Folder/2. Raw Data/Reddit Data/english-ewt-ud-2.4-190531.udpipe
Visit https://github.com/jwijffels/udpipe.models.ud.2.4 for model license details
trying URL 'https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.4/master/inst/udpipe-ud-2.4-190531/english-ewt-ud-2.4-190531.udpipe'
Content type 'application/octet-stream' length 16477964 bytes (15.7 MB)
==================================================
downloaded 15.7 MB
Error in check_input(x) :
Input must be a character vector of any length or a list of character
vectors, each of which has a length of 1.
In addition: Warning message:
'unnest_tokens_' is deprecated.
Use 'unnest_tokens' instead.
See help("Deprecated")
Here is my session information.
Thank you in advance,
─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 3.6.1 (2019-07-05)
os macOS Catalina 10.15.3
system x86_64, darwin15.6.0
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/Copenhagen
date 2020-05-03
─ Packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
backports 1.1.6 2020-04-05 [1] CRAN (R 3.6.2)
callr 3.4.3 2020-03-28 [1] CRAN (R 3.6.2)
cli 2.0.2 2020-02-28 [1] CRAN (R 3.6.0)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.0)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
curl 4.3 2019-12-02 [1] CRAN (R 3.6.0)
data.table 1.12.8 2019-12-09 [1] CRAN (R 3.6.0)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
devtools * 2.2.1 2019-09-24 [1] CRAN (R 3.6.0)
digest 0.6.25 2020-02-23 [1] CRAN (R 3.6.0)
dplyr * 0.8.5 2020-03-07 [1] CRAN (R 3.6.0)
ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.0)
fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.0)
farver 2.0.3 2020-01-16 [1] CRAN (R 3.6.0)
fs 1.4.1 2020-04-04 [1] CRAN (R 3.6.2)
generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0)
ggforce 0.3.1 2019-08-20 [1] CRAN (R 3.6.0)
ggplot2 * 3.3.0 2020-03-05 [1] CRAN (R 3.6.0)
ggraph * 2.0.2 2020-03-17 [1] CRAN (R 3.6.0)
ggrepel 0.8.2 2020-03-08 [1] CRAN (R 3.6.0)
glue 1.4.0 2020-04-03 [1] CRAN (R 3.6.2)
graphlayouts 0.7.0 2020-04-25 [1] CRAN (R 3.6.2)
gridExtra 2.3 2017-09-09 [1] CRAN (R 3.6.0)
gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.0)
htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.0)
htmlwidgets 1.5.1 2019-10-08 [1] CRAN (R 3.6.1)
igraph 1.2.5 2020-03-19 [1] CRAN (R 3.6.0)
janeaustenr 0.1.5 2017-06-10 [1] CRAN (R 3.6.0)
lattice 0.20-38 2018-11-04 [1] CRAN (R 3.6.1)
lifecycle 0.2.0 2020-03-06 [1] CRAN (R 3.6.0)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
MASS 7.3-51.4 2019-03-31 [1] CRAN (R 3.6.1)
Matrix 1.2-17 2019-03-22 [1] CRAN (R 3.6.1)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.0)
networkD3 * 0.4 2017-03-18 [1] CRAN (R 3.6.0)
pillar 1.4.3 2019-12-20 [1] CRAN (R 3.6.1)
pkgbuild 1.0.7 2020-04-25 [1] CRAN (R 3.6.2)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.1)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
plyr 1.8.6 2020-03-03 [1] CRAN (R 3.6.0)
polyclip 1.10-0 2019-03-14 [1] CRAN (R 3.6.0)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.1)
processx 3.4.2 2020-02-09 [1] CRAN (R 3.6.0)
ps 1.3.2 2020-02-13 [1] CRAN (R 3.6.0)
purrr 0.3.4 2020-04-17 [1] CRAN (R 3.6.2)
R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.1)
Rcpp 1.0.4 2020-03-17 [1] CRAN (R 3.6.0)
remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
reshape2 1.4.4 2020-04-09 [1] CRAN (R 3.6.2)
rlang 0.4.5 2020-03-01 [1] CRAN (R 3.6.0)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
rstudioapi 0.11 2020-02-07 [1] CRAN (R 3.6.0)
rversions 2.0.1 2019-12-03 [1] CRAN (R 3.6.1)
scales 1.1.0 2019-11-18 [1] CRAN (R 3.6.1)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
SnowballC 0.7.0 2020-04-01 [1] CRAN (R 3.6.2)
stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.0)
stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
testthat 2.3.2 2020-03-02 [1] CRAN (R 3.6.0)
textnets * 0.1.1 2020-05-03 [1] Github (cbail/textnets@bc688a8)
tibble 3.0.1 2020-04-20 [1] CRAN (R 3.6.2)
tidygraph 1.1.2 2019-02-18 [1] CRAN (R 3.6.0)
tidyr 1.0.2 2020-01-24 [1] CRAN (R 3.6.0)
tidyselect 1.0.0 2020-01-27 [1] CRAN (R 3.6.0)
tidytext * 0.2.4 2020-04-17 [1] CRAN (R 3.6.2)
tokenizers 0.2.1 2018-03-29 [1] CRAN (R 3.6.0)
tweenr 1.0.1 2018-12-14 [1] CRAN (R 3.6.0)
udpipe * 0.8.3 2019-07-05 [1] CRAN (R 3.6.0)
usethis * 1.6.1 2020-04-29 [1] CRAN (R 3.6.2)
vctrs 0.2.4 2020-03-10 [1] CRAN (R 3.6.0)
viridis 0.5.1 2018-03-29 [1] CRAN (R 3.6.0)
viridisLite 0.3.0 2018-02-01 [1] CRAN (R 3.6.0)
withr 2.2.0 2020-04-20 [1] CRAN (R 3.6.2)
xml2 1.2.2 2019-08-09 [1] CRAN (R 3.6.0)
Same here: even though the data are character vectors or lists of character vectors, the same error persists regardless of which dataset I use.
It seems to be an issue with unnest_tokens within the function. There are some help pages, but it's not clear how to resolve it. Still looking into it...
@Dat-Vuong07 @angelhsu05
Did you use a dataset from the package?
If you are using your own textual data, first try loading it with read_csv from the tidyverse instead of base R's read.csv; read_csv identifies the data types automatically. Even so, you may want to double-check that your textvar column is of type character.
If that is not the problem, please turn compound nouns off with compound_nouns = FALSE.
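As a concrete sanity check, something like the following can be run before calling PrepText (the file name my_data.csv and column my_text are hypothetical, just for illustration):

```r
library(readr)  # read_csv(), from the tidyverse
library(dplyr)

# read_csv() infers column types automatically; base read.csv() (before R 4.0)
# converted strings to factors by default, which breaks tokenization
my_data <- read_csv("my_data.csv")

# the column passed as textvar must be a plain character vector
stopifnot(is.character(my_data$my_text))

# if it arrived as a factor (e.g. via read.csv), coerce it back
my_data <- my_data %>% mutate(my_text = as.character(my_text))
```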
I guess you are probably using a language other than English. The package parses text with the udpipe library, which applies a pre-trained language model, produces output in CoNLL-U format, and returns a dep_rel column storing the dependency relation of each token. Some language models do not produce the compound relation at all.
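To see what that output looks like, here is a minimal udpipe sketch (the English EWT model is used as an example; substitute your own language):

```r
library(udpipe)

# download and load a pre-trained model (saved in the working directory)
model_info <- udpipe_download_model(language = "english-ewt")
model <- udpipe_load_model(model_info$file_model)

# annotate a sentence and convert the CoNLL-U output to a data frame
anno <- as.data.frame(udpipe_annotate(model, x = "The ice cream truck arrived."))

# dep_rel holds the dependency relation per token; compound nouns show up
# as "compound" -- a relation some language models never emit
anno[, c("token", "upos", "dep_rel")]
```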
Yes, I was using the 'sotu' dataset from the package, so I am not reading the data in from a CSV but loading it directly. Changing compound_nouns = FALSE doesn't work either.
I can use the tidytext and udpipe libraries instead of the PrepText function to build the tidy text object that the other functions need, but it would be great to get PrepText working, as I'm new to network analysis.
I am also getting a message that unnest_tokens_ is deprecated and that I should use unnest_tokens instead, so I updated PrepText accordingly, but I'm still getting the same error about the data not being a character vector or a list of character vectors, which it clearly is in the sotu dataset.
Have you checked the data type of the sotu object once loaded, particularly the textvar column you are going to use?
Hi @yl17124 ,
I also have the same problem.
Here is my code; I have already set compound_nouns = FALSE:
library(textnets)
data("sotu")
str(sotu)
sotu_firsts <- sotu %>% group_by(president) %>% slice(1L)
sotu_firsts_nouns <- PrepText(sotu_firsts, groupvar = "president",
textvar = "sotu_text",
node_type = "groups",
tokenizer = "words",
pos = "nouns",
remove_stop_words = TRUE, compound_nouns = FALSE)
And these are the results:
1. The sotu input looks fine, with the variable sotu_text as character:
> str(sotu)
'data.frame': 236 obs. of 6 variables:
$ sotu_text : chr "Fellow-Citizens of the Senate and House of Representatives: \n\nI embrace with great satisfaction the opportuni"| __truncated__ "\n\n Fellow-Citizens of the Senate and House of Representatives: \n\nIn meeting you again I feel much satisfact"| __truncated__ "\n\n Fellow-Citizens of the Senate and House of Representatives: \n\n \"In vain may we expect peace with the In"| __truncated__ "Fellow-Citizens of the Senate and House of Representatives: \n\nIt is some abatement of the satisfaction with w"| __truncated__ ...
$ president : chr "George Washington" "George Washington" "George Washington" "George Washington" ...
$ year : int 1790 1790 1791 1792 1793 1794 1795 1796 1797 1798 ...
$ years_active: chr "1789-1793" "1789-1793" "1789-1793" "1789-1793" ...
$ party : chr "Nonpartisan" "Nonpartisan" "Nonpartisan" "Nonpartisan" ...
$ sotu_type : chr "speech" "speech" "speech" "speech" ...
2. However, the PrepText function fails to run:
Downloading udpipe model from https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.4/master/inst/udpipe-ud-2.4-190531/english-ewt-ud-2.4-190531.udpipe to /Users/vuongdat/OneDrive - Aarhus universitet/Applied Data Science/Share Data Folder/2. Raw Data/Reddit Data/english-ewt-ud-2.4-190531.udpipe
Visit https://github.com/jwijffels/udpipe.models.ud.2.4 for model license details
trying URL 'https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.4/master/inst/udpipe-ud-2.4-190531/english-ewt-ud-2.4-190531.udpipe'
Content type 'application/octet-stream' length 16477964 bytes (15.7 MB)
==================================================
downloaded 15.7 MB
Error in check_input(x) :
Input must be a character vector of any length or a list of character
vectors, each of which has a length of 1.
In addition: Warning message:
'unnest_tokens_' is deprecated.
Use 'unnest_tokens' instead.
See help("Deprecated")
Below is my session information:
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.3
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] textnets_0.1.1 networkD3_0.4 ggraph_2.0.2 ggplot2_3.3.0 udpipe_0.8.3 dplyr_0.8.5
loaded via a namespace (and not attached):
[1] Rcpp_1.0.4 plyr_1.8.6 pillar_1.4.3 compiler_3.6.1 tokenizers_0.2.1 viridis_0.5.1
[7] tools_3.6.1 digest_0.6.25 viridisLite_0.3.0 lifecycle_0.2.0 tibble_3.0.1 gtable_0.3.0
[13] lattice_0.20-38 pkgconfig_2.0.3 rlang_0.4.5 Matrix_1.2-17 tidygraph_1.1.2 igraph_1.2.5
[19] rstudioapi_0.11 ggrepel_0.8.2 gridExtra_2.3 stringr_1.4.0 janeaustenr_0.1.5 withr_2.2.0
[25] generics_0.0.2 htmlwidgets_1.5.1 graphlayouts_0.7.0 vctrs_0.2.4 grid_3.6.1 tidyselect_1.0.0
[31] glue_1.4.0 data.table_1.12.8 R6_2.4.1 polyclip_1.10-0 reshape2_1.4.4 purrr_0.3.4
[37] tidyr_1.0.2 tweenr_1.0.1 farver_2.0.3 magrittr_1.5 SnowballC_0.7.0 htmltools_0.4.0
[43] scales_1.1.0 ellipsis_0.3.0 MASS_7.3-51.4 tidytext_0.2.4 assertthat_0.2.1 ggforce_0.3.1
[49] colorspace_1.4-1 stringi_1.4.6 munsell_0.5.0 crayon_1.3.4
Yes, the textvar for the sotu example is of type character.
You need to rename the "sotu_text" variable to "textvar" to make the function work.
I had the same issue, and it wasn't until I looked under the hood of the PrepText function at how the input was evaluated in the unnest_tokens_ function (which is deprecated, by the way) that I understood why.
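For context, the two interfaces differ in how the input column is named: the deprecated unnest_tokens_ took column names as strings, while unnest_tokens takes bare column names (toy data frame here, just for illustration):

```r
library(tidytext)
library(tibble)

df <- tibble(id = 1, sotu_text = "Fellow citizens of the Senate")

# deprecated standard-evaluation interface: columns as strings
# unnest_tokens_(df, output = "word", input = "sotu_text")

# current tidy-evaluation interface: bare column names
unnest_tokens(df, output = word, input = sotu_text)
```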
Hi folks, thanks for pitching in to answer this question. I'm sorry I was not available; I am busy planning a major educational event this summer (the Summer Institutes in Computational Social Science: https://compsocialscience.github.io/summer-institute/). OK to close this now, @angelhsu05? Or is it still not working?
Yes, that was the trick, thanks @sotork! Perhaps it's worth updating the example code snippet below:
names(sotu_first_speeches)[1] <- "textvar"
prepped_sotu <- PrepText(sotu_first_speeches, groupvar = "president", textvar = "textvar", node_type = "groups", tokenizer = "words", pos = "nouns", remove_stop_words = TRUE, compound_nouns = TRUE)
@angelhsu05 perhaps also avoid naming your text variable text, so it does not coincide with the text variable name used by unnest_tokens in tidytext. It's worth looking through this.
Hi all, I'm unable to reproduce this error. I want to make sure I am following the solution correctly: did you wind up setting textvar = "textvar" to make it work? Does that mean you renamed the variable in the sotu dataset?
Yes, Chris... I renamed it. It was "sotu_text"; it needs to be "textvar" to make the function work.
I was having the exact same issue as everyone else on this thread. Renaming sotu_text to textvar solved it for some mysterious reason, as others have discovered. It sounds like the underlying code needs to be updated to use unnest_tokens (current) instead of unnest_tokens_ (deprecated).
This fails:
library(textnets)
data("sotu")
sotu_first_speeches <- sotu %>%
group_by(president) %>%
slice(1L)
prepped_sotu <- PrepText(sotu_first_speeches,
groupvar = "president",
textvar = "textvar",
node_type = "groups",
tokenizer = "words",
pos = "nouns",
remove_stop_words = TRUE,
compound_nouns = TRUE)
This works:
library(textnets)
data("sotu")
sotu_first_speeches <- sotu %>%
group_by(president) %>%
slice(1L) %>%
ungroup() %>%
rename(textvar = sotu_text)
prepped_sotu <- PrepText(sotu_first_speeches,
groupvar = "president",
textvar = "textvar",
node_type = "groups",
tokenizer = "words",
pos = "nouns",
remove_stop_words = TRUE,
compound_nouns = TRUE)
Hi all: I just pushed a fix for this (it turned out to be an issue with variable indirection introduced by R 4.0). Could one or more of you please try a fresh install (and rerun the example code) to verify that you no longer need to rename the textvar column as @kelseygonzalez did above? Thank you!
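For anyone curious, the usual way to resolve a column name held in a string under tidy evaluation looks roughly like this (a simplified sketch of the pattern, not the package's actual code):

```r
library(tidytext)
library(tibble)
library(rlang)  # sym()

df <- tibble(president = "George Washington",
             sotu_text = "Fellow citizens of the Senate")
textvar <- "sotu_text"  # the column name arrives as a string, as in PrepText()

# !!sym() converts the string into a column symbol at evaluation time,
# so the caller's text column can keep any name
tidy_df <- unnest_tokens(df, output = word, input = !!sym(textvar))
```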
It runs without renaming the textvar column now!
I'm getting a "check_input" error when I try to run PrepText on the sotu example, even though the text is of type character. I tried creating my own tf-idf data frame with tidytext so I could still use the visualization functions in this package, but I wasn't sure what the outputs of PrepText and CreateTextnet should look like for troubleshooting. Thanks for your help!