Closed by engineerchange 4 years ago
Interesting. Any ideas what is going on or how to fix the issues?
I'm able to replicate with a fresh download from the MonetDB-R website (click on "VOC dataset" about halfway down the page).
My interpretation above is a bit wrong: read_tsv guesses that the column is logical from the first 1000 rows, but the column in fact holds character values, which causes the error.
If I adjust guess_max, I can resolve this parsing error, but then other parsing errors surface, and the error on row 2841 suggested in Section 5.3.1 does not appear.
In fact, I don't see an error on row 2841 at all from the start.
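To illustrate the guessing behaviour described above, here is a minimal Python sketch of a toy type guesser. It is not readr's actual algorithm: the `guess_type` and `parse_problems` helpers, the token set, and the sample column are all hypothetical and only model the idea that a type inferred from the first `guess_max` rows can be contradicted by a later row.

```python
# Toy model of column type guessing; NOT readr's real implementation.
# The empty string stands in for a missing value.
LOGICAL_TOKENS = {"TRUE", "FALSE", "T", "F", ""}


def guess_type(values):
    """Return 'logical' if every value looks boolean or missing, else 'character'."""
    return "logical" if all(v in LOGICAL_TOKENS for v in values) else "character"


def parse_problems(column, guess_max):
    """Guess the type from the first guess_max rows, then list the rows that break it."""
    guessed = guess_type(column[:guess_max])
    if guessed != "logical":
        return guessed, []
    # 1-based row numbers of values that cannot be parsed as logical
    bad_rows = [i + 1 for i, v in enumerate(column) if v not in LOGICAL_TOKENS]
    return guessed, bad_rows


# Hypothetical column: 1000 missing values, then a character value on row 1001.
column = [""] * 1000 + ["abc"] + ["TRUE"]

small = parse_problems(column, guess_max=1000)
large = parse_problems(column, guess_max=len(column))
print(small)  # ('logical', [1001])
print(large)  # ('character', [])
```

With the smaller window the guess is logical and row 1001 is reported as a problem. With a window covering the whole column the guess falls back to character and that particular problem disappears, which is consistent with other errors then surfacing elsewhere.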
@Robinlovelace are you able to replicate? I don't see an immediate solution, but this portion of the section likely needs to be rewritten. It's unclear why the behaviour would differ from years ago when reading the same file.
Hmm. Interesting. Do you know what the intended classes of the data frames were? It's an excellent example of a tricky and large dataset to read in, and it makes me wonder how other packages such as vroom and data.table would handle it. I have not had a chance to look and am also not sure how this worked years ago, but I welcome any suggestions. It's a nice dataset for testing, that's for sure!
Many thanks for flagging this btw.
@Robinlovelace I too recently came across this issue when I was going through this really helpful book, so I tried rewriting this section, with changes in #294. Please take a look.
Regarding the intended classes for the voyages data frame that you raised above, I found this article that may be of interest.
Many thanks for the re-write @alwaysandeep, looks good to me! Awaiting feedback from @csgillespie on this.
Very elegant fix! 🔥
Running code in Section 5.3.1:
I get these unexpected warnings that differ from the "expected warning" on row 2841 in this section.
Looking at the voc_voyages.tsv file within the extdata directory of the efficient package, we can see there are some unexpected tab separators in these affected rows (e.g., 1023 and 1025); in particular, there are 3 tabs in these rows (as opposed to 2) preceding the bought field, which throws the numeric column bought into the logical column hired.
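One way to locate rows like these is to count the tab-separated fields on each line and flag those that differ from the most common count. The sketch below is a minimal illustration of that idea. The `rows_with_unexpected_fields` helper, the `id` column, and the sample rows are hypothetical and are not taken from voc_voyages.tsv. It also assumes no field contains a quoted tab.

```python
import io
from collections import Counter


def rows_with_unexpected_fields(lines, sep="\t"):
    """Return (line_number, field_count) for lines whose field count is unusual."""
    # Strip only the newline so that trailing tabs still count as separators.
    counts = [line.rstrip("\n").count(sep) + 1 for line in lines]
    # Treat the most common field count as the expected one.
    expected = Counter(counts).most_common(1)[0][0]
    return [(i + 1, n) for i, n in enumerate(counts) if n != expected]


# Made-up sample: line 3 has one extra tab before the bought value.
sample = io.StringIO(
    "id\thired\tbought\n"
    "1\t\t100\n"
    "2\t\t\t250\n"
    "3\t\t75\n"
)

flagged = rows_with_unexpected_fields(sample)
print(flagged)  # [(3, 4)]
```

Line numbers here count the header as line 1, so they would need shifting by one to match data row numbers reported by a parser. To run it against a real file, pass an open file handle instead of the `io.StringIO` sample.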