Open standap opened 3 years ago
Oops, my apologies. It looks like I neglected to update the pkgdown pages with the vignettes when the rest of the package was bumped from 1.0 to 2.0. Let me see what I can do about that.
OK, the website is updated to 2.0 with an additional vignette that shows the use of quanteda functions on Hathi wordcounts. But it looks like there was also a missing merge from the dev branch fixing a conflict between the old id name ("id") and the new one ("htid"). So if you reinstall from GitHub, it should work now.
Thank you for looking into this. That was super fast.
I have reinstalled the package, but I am still not getting the dataframe.
The json files are downloaded into the local directory as expected, but the "unknown or uninitialised column: htid." warning persists.
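(For context, that message is tibble's standard warning when code references a column that doesn't exist in the data frame, which is consistent with the "id"/"htid" rename mentioned above. A minimal reproduction, using a made-up two-row tibble:)

```r
library(tibble)

# A tibble with an "id" column but no "htid" column,
# mimicking data produced before the column rename.
df <- tibble(id = c("a", "b"))

# Accessing the missing column triggers the warning seen in the issue:
df$htid
#> Warning: Unknown or uninitialised column: `htid`.
```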
Hmm, weird. What kind of system is this? Those feather files should not be zero bytes; you're right to flag it. Maybe try deleting the cache directory Desktop/hathiTrust_intro/hathi-features/, restarting R, and trying again; or run

gibbon_books = hathi_counts(gibbon, cols = c("page", "token"), cache = FALSE) %>% inner_join(gibbon_vols)

which should run, but substantially slower than with the feather caching.

Thanks, Ben, for your quick response and the pointers. I am on Ubuntu 21.04; R version 4.0.4 (2021-02-15); RStudio version 1.4.1106.
It seems that it can all be linked to the arrow package. After I ran arrow::arrow_info(), all the compression methods were set to FALSE, so I reinstalled the package with install_arrow(binary = FALSE, minimal = FALSE), following https://stackoverflow.com/questions/63096059/how-to-get-the-arrow-package-for-r-with-lz4-support. Once I reinstalled the arrow package, everything works and the feather files have non-zero sizes:
├── [ 3059234 Jun 8 09:19] nyp.33433081597191.feather
├── [ 236415 Jun 8 09:19] nyp.33433081597191.json.bz2
├── [ 2409754 Jun 8 09:19] nyp.33433081597290.feather
└── [ 193192 Jun 8 09:19] nyp.33433081597290.json.bz2
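(For anyone hitting the same problem, a minimal sketch of the diagnostic described above. The capabilities element is what arrow::arrow_info() reports in recent arrow releases; exact field names may vary by version.)

```r
# Inspect which optional features the installed arrow build supports.
# If the compression codecs (e.g. lz4, zstd) report FALSE, feather files
# written with compression cannot be read or written correctly, which
# matches the zero-byte cache files seen in this thread.
info <- arrow::arrow_info()
print(info$capabilities)

# If codecs are missing, rebuild arrow from source with full features,
# as suggested in the linked Stack Overflow answer:
# arrow::install_arrow(binary = FALSE, minimal = FALSE)
```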
Hello Ben, I followed your vignette at https://humanitiesdataanalysis.github.io/hathidy/articles/Hathidy.html, but when I tried to pull the counts for all of Gibbon's books with
gibbon_books = hathi_counts(gibbon, cols = c("page", "token")) %>% inner_join(gibbon_vols)
I got an error. I was able to work with your script on an individual item, "nyp.33433081597290", but not on the whole set.