JBGruber closed this issue 5 years ago
I moved this here since it seems to be a bigger issue. I'm not sure what's happening. Can you post your sessionInfo()? I'll think about it in the meantime.
I assume packageDate("LexisNexisTools") returns "2019-07-30"? If so, can you try:
library(LexisNexisTools)
data <- lnt_read("~/Files(10).DOCX", author_keyword = "^Byline:", verbose = FALSE)
data@meta$Author
Yes, package date is the same. I will try the code later.
packageDate("LexisNexisTools")
[1] "2019-07-30"
sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] LexisNexisTools_0.2.3.9000
loaded via a namespace (and not attached):
[1] Rcpp_1.0.2 quanteda_1.5.1 pillar_1.4.2 compiler_3.5.3
[5] prettyunits_1.0.2 remotes_2.1.0 tools_3.5.3 stopwords_1.0
[9] pkgbuild_1.0.4 lubridate_1.7.4 tibble_2.1.3 gtable_0.3.0
[13] lattice_0.20-38 pkgconfig_2.0.2 rlang_0.4.0 Matrix_1.2-15
[17] fastmatch_1.1-0 cli_1.1.0 rstudioapi_0.10 curl_3.3
[21] parallel_3.5.3 xml2_1.2.2 withr_2.1.2 dplyr_0.8.0.1
[25] stringr_1.4.0 rprojroot_1.3-2 grid_3.5.3 tidyselect_0.2.5
[29] glue_1.3.1 data.table_1.12.2 R6_2.4.0 processx_3.3.0
[33] pbapply_1.4-1 callr_3.2.0 ggplot2_3.2.1 purrr_0.3.2
[37] spacyr_1.2 magrittr_1.5 backports_1.1.4 ps_1.3.0
[41] scales_1.0.0 stringdist_0.9.5.2 assertthat_0.2.1 colorspace_1.4-1
[45] striprtf_0.5.2 stringi_1.4.3 lazyeval_0.2.2 RcppParallel_4.4.3
[49] munsell_0.5.0 crayon_1.3.4
Here's what was returned after running the code.
library(LexisNexisTools)
data <- lnt_read("~/Files(10).DOCX", author_keyword = "^Byline:", verbose = FALSE)
Reading DOCX files from Nexis Uni is experimental. Please report any problems in this issue: https://github.com/JBGruber/LexisNexisTools/issues/7
data@meta$Author
[1] " Christopher Flavelle Highlight: The world's land is being exploited at an “unprecedented” rate, a United Nations report on climate change warns, putting pressure on food production and amplifying the risk of mass migration."
[2] " Rod Schoonover Highlight: Politics intruded on science and intelligence. That’s why I quit my job as an analyst for the State Department."
[3] " By NATHANIEL RICH Nathaniel Rich is a writer at large for The New York Times Magazine, for which he has written about immortal jellyfish, a 47-hour train ride between New Orleans and Los Angeles and a lawyer's campaign to expose DuPont's profligate use of a toxic chemical. He is the author of three novels, including ''King Zeno,'' which was published in January. George Steinmetz is a photographer who specializes in aerial imagery. He has won numerous awards including three prizes from World Press Photo and the Environmental Vision Award for his work on large-scale agriculture. He has published four books of photography, including his latest, ''New York Air: The View From Above.'' With additional reporting by Jaime Lowe, who is a frequent contributor to the magazine and the author of ''Mental: Lithium, Love and Losing My Mind.'' She previously wrote a feature about the incarcerated women who fight California wildfires."
[4] " By ALAN SANO Body"
[5] NA
[6] " By KENDRA PIERRE-LOUIS Body"
[7] " THE LEARNING NETWORK Highlight: A special Earth Day guest lesson, written with NASA’s Goddard Institute for Space Studies, a leader in global climate change research, and the Columbia University Earth Institute. It offers resources for teaching about this issue, while addressing important 21st-century literacy skills."
[8] " Kendra Pierre-Louis Highlight: The average number of heat waves in 50 major American cities has tripled since the 1960s."
[9] " Kendra Pierre-Louis Highlight: The average number of heat waves in 50 major American cities has tripled since the 1960s."
[10] " Henry Fountain Highlight: The four-day hot spell was rare for France and the Netherlands, researchers say, but it used to be a lot rarer."
Thanks. It looks to me like the file was read in with a slightly different encoding on your machine. I have no idea why that might happen tbh. I changed the relevant code and made it explicit that UTF-8 should be used while reading the file in. Please install the newest version and let me know if the behaviour changes.
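To illustrate what "explicit UTF-8" means here, the following is a minimal base R sketch and not the actual change made in LexisNexisTools. The helper name read_utf8() is hypothetical, and the sample text is copied from the output posted above. It declares the encoding while reading instead of relying on the session locale, then strips the byline keyword with the same pattern passed as author_keyword.

```r
# Illustration only: base R, not the actual LexisNexisTools code.
# read_utf8() is a hypothetical helper name.
read_utf8 <- function(path) {
  # Declare the input as UTF-8 instead of relying on the session locale.
  readLines(path, encoding = "UTF-8", warn = FALSE)
}

# Write a small UTF-8 test file; the text is taken from the output above.
tmp <- tempfile(fileext = ".txt")
writeLines(
  enc2utf8(c(
    "Byline: Christopher Flavelle",
    "Highlight: The world's land is being exploited at an \u201cunprecedented\u201d rate"
  )),
  tmp,
  useBytes = TRUE
)

lines <- read_utf8(tmp)
# Same pattern as the author_keyword argument used above.
author <- sub("^Byline:\\s*", "", lines[grepl("^Byline:", lines)])
print(author)
```

If the encoding is left implicit, the same bytes can be interpreted differently depending on the locale, which is one plausible way a keyword match could behave differently between machines.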
Did you have the chance to test this with the new version (packageDate("LexisNexisTools") should return "2019-08-16")?
Hmm, I am getting all NAs for Author. Not sure why.
Originally posted by @tommyxie in https://github.com/JBGruber/LexisNexisTools/issues/7#issuecomment-520876120