Open jessiejyen opened 7 years ago
If it is of any help: it seems the first error stems from the stringdist package and the second one from the stringi package, so this is not really an issue with this package.
I see you tried to remedy your AccountsList1, but the problem might still be in AccountsList2. Have you tried the same there?
Hi,
I am getting the same error, but I do not have any NA values in the two tibbles I am using as inputs to the function. Did you solve it?
I can try to help figure out what's going on if someone could post a minimal reproducible example to work with.
I hit this error today. I'm pretty sure it is caused by some outdated dependency packages.
I tried it on two platforms.
This works (did not hit error)
> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics
[3] grDevices utils
[5] datasets methods
[7] base
other attached packages:
[1] fuzzyjoin_0.1.3
loaded via a namespace (and not attached):
[1] compiler_3.4.1
[2] magrittr_1.5
[3] assertthat_0.2.0
[4] R6_2.2.2
[5] tools_3.4.1
[6] bindrcpp_0.2
[7] glue_1.1.1
[8] dplyr_0.7.4
[9] tibble_1.3.3
[10] yaml_2.1.16
[11] Rcpp_0.12.14
[12] pkgconfig_2.0.1
[13] rlang_0.1.4
[14] bindr_0.1
This does not work (hits error)
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS release 6.7 (Final)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] fuzzyjoin_0.1.4 here_0.1 dplyr_0.7.4 datzen_0.1.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.13 rprojroot_1.2 assertthat_0.2.0 R6_2.2.2 backports_1.1.1
[6] magrittr_1.5 rlang_0.1.2 bindrcpp_0.2 tools_3.2.3 glue_1.1.1
[11] rsconnect_0.8 pkgconfig_2.0.1 bindr_0.1 tibble_1.3.4
I am hitting the same error on large lists (small examples work fine):
Error in dists[include] <- stringdist::stringdist(v1[include], v2[include], : NAs are not allowed in subscripted assignments
[UPDATE]
Some exploration led to the discovery of NAs in one of the arguments passed to stringdist_inner_join. They can be located with which(is.na(v2)) and replaced with df_cons[is.na(v2)] <- "empty_string".
After this, the error disappears.
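As a rough sketch of the NA check described above, using base R only: the data frames accounts1 and accounts2 and their contents are made up for illustration, and the column name Accounts is borrowed from the original question. It shows locating the missing values and two ways of handling them before the join.

```r
# Hypothetical stand-ins for the two tables being joined.
accounts1 <- data.frame(Accounts = c("Acme Corp", "Globex", NA, "Initech"),
                        stringsAsFactors = FALSE)
accounts2 <- data.frame(Accounts = c("Acme Corporation", NA, "Initech Inc"),
                        stringsAsFactors = FALSE)

# Row positions of the missing values in each join column.
na_rows1 <- which(is.na(accounts1$Accounts))
na_rows2 <- which(is.na(accounts2$Accounts))

# Option A: drop the rows whose join key is missing.
accounts1_clean <- accounts1[!is.na(accounts1$Accounts), , drop = FALSE]
accounts2_clean <- accounts2[!is.na(accounts2$Accounts), , drop = FALSE]

# Option B: keep the rows and replace NA with a placeholder string,
# as suggested in the comment above.
accounts2_filled <- accounts2
accounts2_filled$Accounts[is.na(accounts2_filled$Accounts)] <- "empty_string"
```

Dropping the rows is probably the safer choice when the same placeholder would otherwise appear in both tables, since placeholder rows would then match each other in the join.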
@statsccpr I also faced a similar issue with one large dataset: it worked for me on a Windows PC but did not work on Mac OSX. I tried two different computers (Linux and OSX) with the exact same R, RStudio, and package versions.
Error in dists[include] <- stringdist::stringdist(v1[include], v2[include], :
NAs are not allowed in subscripted assignments
Later I changed the encoding of the Excel file I was using to "UTF-8" and reloaded it into an R data frame, and this time it worked. It seems to be an OS encoding issue: Windows and OSX/Linux treat Excel/CSV files differently.
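If re-saving the file is not convenient, a similar check can be done inside R. The following is a minimal sketch using base R's validUTF8() and iconv(). The vector company_names is made up, and the source encoding "latin1" is an assumption: the real encoding depends on how the file was exported.

```r
# Hypothetical company names; the first one holds the single byte 0xE9
# ("e" with acute accent in Latin-1), which is not valid UTF-8.
company_names <- c(rawToChar(as.raw(c(0x43, 0x61, 0x66, 0xe9))), "Acme")

# Flag entries that are not valid UTF-8.
is_valid <- validUTF8(company_names)
invalid_positions <- which(!is_valid)

# Convert only the invalid entries, ASSUMING they are Latin-1 encoded.
fixed <- company_names
fixed[!is_valid] <- iconv(fixed[!is_valid], from = "latin1", to = "UTF-8")
```

When the data comes from a CSV file, read.csv() also accepts a fileEncoding argument, which may remove the need for a conversion after loading.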
Hello! Please excuse any obvious or dumb errors - I am new to R and tried looking up the answers beforehand, I promise!
I am trying to fuzzy join two tables of company names using the R "fuzzyjoin" package. I have one data frame of 5000 company names and one data frame of 1600 company names. There are no other columns besides the company names.
Using the package, I have: NewTable <- AccountsList1 %>% stringdist_inner_join(AccountsList2, by = NULL)
However, I got two errors:
Joining by: "Accounts"
Error in dists[include] <- stringdist::stringdist(v1[include], v2[include], : NAs are not allowed in subscripted assignments
and
50: In stri_length(string) : invalid UTF-8 byte sequence detected. perhaps you should try calling stri_enc_toutf8()
So then I removed NAs via [!is.na(AccountsList1)] and forced UTF-8 via stri_enc_toutf8(AccountsList1, is_unknown_8bit = FALSE, validate = FALSE)
However, when I rerun I get the exact same errors... Does anyone have any advice? Thank you!
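One possible reason the errors persist (an assumption, since the full code is not shown) is that the cleaning was applied to the whole data frame rather than to the Accounts column, was not assigned back, or was applied to only one of the two tables. Below is a hedged sketch of a helper that cleans the join column and returns the modified data frame. The function clean_join_column, the sample data, and the "latin1" source encoding are all hypothetical.

```r
# Hypothetical helper: returns a copy of `df` in which column `col`
# has no NA values and contains only valid UTF-8 strings.
clean_join_column <- function(df, col, from = "latin1") {
  # Drop rows whose join key is missing.
  df <- df[!is.na(df[[col]]), , drop = FALSE]
  # Make sure the column is character (it may have been read as a factor).
  df[[col]] <- as.character(df[[col]])
  # Re-encode entries that are not valid UTF-8, ASSUMING they are in `from`.
  bad <- !validUTF8(df[[col]])
  df[[col]][bad] <- iconv(df[[col]][bad], from = from, to = "UTF-8")
  df
}

# Made-up data with one NA and one Latin-1 encoded name in each table.
latin1_name <- rawToChar(as.raw(c(0x43, 0x61, 0x66, 0xe9)))
AccountsList1 <- data.frame(Accounts = c("Acme", NA, latin1_name),
                            stringsAsFactors = FALSE)
AccountsList2 <- data.frame(Accounts = c(latin1_name, "Acme Inc", NA),
                            stringsAsFactors = FALSE)

# Clean BOTH tables and assign the results back before joining.
AccountsList1 <- clean_join_column(AccountsList1, "Accounts")
AccountsList2 <- clean_join_column(AccountsList2, "Accounts")

# The join itself needs the fuzzyjoin package, so it is left commented out:
# NewTable <- fuzzyjoin::stringdist_inner_join(AccountsList1, AccountsList2,
#                                              by = "Accounts")
```

Checking anyNA() and validUTF8() on both join columns right before the join is a cheap way to confirm the cleaning actually took effect.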