wgmmaas closed this issue 7 months ago.
I am not familiar with this warning, but I suspect it comes from the readr package and is probably related to column data types.
See: https://github.com/tidyverse/readr/issues/1477
Or it could come from dplyr when the disaggregated data frames are stacked.
I suspect it's harmless: for example, mixing integers and doubles affects how values are stored in memory in R, but would not change how the data appear once written to a CSV file.
But please let me know if you discover otherwise.
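To illustrate the harmless case described above, here is a minimal sketch, assuming dplyr is installed. `part_a` and `part_b` are hypothetical stand-ins for two of the disaggregated data frames, with the same column stored as integer in one and double in the other. `dplyr::bind_rows()` promotes the column to double, and the CSV text is the same as if it had stayed integer.

```r
# Hypothetical pieces of a disaggregated table: same column, different numeric types
part_a <- data.frame(id = c(1L, 2L))   # integer
part_b <- data.frame(id = c(3, 4))     # double

# Stacking promotes the integer column to double without an error
stacked <- dplyr::bind_rows(part_a, part_b)
class(stacked$id)   # "numeric"

# Write the stacked (double) version and an all-integer version to CSV
tmp <- tempfile(fileext = ".csv")
write.csv(stacked, tmp, row.names = FALSE)

tmp_int <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:4), tmp_int, row.names = FALSE)

# The file contents are identical: the in-memory type does not show up in the CSV
identical(readLines(tmp), readLines(tmp_int))   # TRUE
```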
Thanks, Jesse, you are correct. It is a parsing problem in readr: it guesses the type of the "LegalDomicileCountry" column incorrectly, expecting a logical value (1/0/T/F/TRUE/FALSE) but later finding country codes such as CA and UK (see below). Because this does not affect the rest of my application, I will ignore it. Thanks.
# Index file for tax year 2019
URL <- paste0("https://nccs-efile.s3.us-east-1.amazonaws.com/index/data-commons-efile-index-", 2019, ".csv")
d <- readr::read_csv(URL, show_col_types = FALSE)
# List the cells readr could not parse with the column types it guessed
parsing_problems <- readr::problems(d)
if (nrow(parsing_problems) > 0) {
  print(parsing_problems)
}
> print(parsing_problems)
# A tibble: 181 x 5
     row   col expected           actual file
   <int> <int> <chr>              <chr>  <chr>
 1  1819    13 1/0/T/F/TRUE/FALSE CA     ""
 2  3225    13 1/0/T/F/TRUE/FALSE NI     ""
 3  5076    13 1/0/T/F/TRUE/FALSE CA     ""
 4  5078    13 1/0/T/F/TRUE/FALSE CJ     ""
 5  5502    13 1/0/T/F/TRUE/FALSE CA     ""
 6  7666    13 1/0/T/F/TRUE/FALSE HO     ""
 7  8408    13 1/0/T/F/TRUE/FALSE CA     ""
 8  9305    13 1/0/T/F/TRUE/FALSE UK     ""
 9 14025    13 1/0/T/F/TRUE/FALSE AU     ""
10 21681    13 1/0/T/F/TRUE/FALSE BD     ""
# i 171 more rows
# i Use `print(n = ...)` to see more rows
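A minimal sketch of one workaround, assuming readr 2.x: tell `read_csv()` to read every column as character with `col_types = cols(.default = col_character())`, so a mostly empty column like LegalDomicileCountry is never guessed as logical. The file below is a hypothetical stand-in written to a temp file, not the real index. Raising `guess_max` is another option, but it only helps if the rows readr inspects happen to include a non-empty value.

```r
library(readr)

# Hypothetical stand-in for the index: the country column is empty for the
# first 1,500 rows and only then holds country codes
tmp <- tempfile(fileext = ".csv")
write_csv(
  data.frame(
    id = 1:2000,
    LegalDomicileCountry = c(rep(NA, 1500), rep("CA", 500))
  ),
  tmp,
  na = ""   # write missing values as empty fields
)

# Read every column as character so readr never has to guess a type
d <- read_csv(tmp, col_types = cols(.default = col_character()))

nrow(problems(d))               # 0 parsing problems
table(d$LegalDomicileCountry)   # CA: 500 (empty fields are read as NA)
```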
Edit: I updated to the newest version, which uses data.table, and I no longer get the warning. Thanks!
Ok, great. And yes, I updated the build_index() function so that all columns are loaded as strings (character vectors). Glad it worked!
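The thread does not show how the updated `build_index()` loads the index. As a rough sketch only, assuming data.table's `fread()` is used, passing a single `colClasses = "character"` reads every column as character; the data below is again a hypothetical stand-in written to a temp file.

```r
library(data.table)

# Hypothetical stand-in for the index file (missing values are written as empty fields)
tmp <- tempfile(fileext = ".csv")
fwrite(
  data.table(
    id = 1:2000,
    LegalDomicileCountry = c(rep(NA_character_, 1500), rep("CA", 500))
  ),
  tmp
)

# One class given without names: every column is read as character
dt <- fread(tmp, colClasses = "character")

sapply(dt, class)   # both columns "character"
```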
Hi @lecy et al.,
Thanks for your work on this package. I get a warning message that I did not get before:
> index <- build_index(tax.years = 2019)
What could be the reason for the warning? And is it safe to ignore it, given that I end up with an index of 523,999 observations (only two short of the 524,001 the README lists for 2019)?
Thanks, Wim