Open MichaelChirico opened 4 years ago
I think this one can be closed...
The help page for read.csv references RFC 4180,
https://tools.ietf.org/html/rfc4180
which states that "Spaces are considered part of a field and should not be ignored."
In going from na1.csv to na2.csv, you didn't just remove the first column; you also trimmed a leading space from the second column. So as far as I can tell, the behavior is exactly in line with the documented specification.
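A minimal sketch of the behavior described above and of the usual workaround. It assumes R >= 4.0 (so `stringsAsFactors = FALSE` by default) and reproduces the two files pasted in the report as inline text via the `text` argument. The variable names are illustrative only. `strip.white = TRUE` is a documented `read.table`/`read.csv` argument that strips leading and trailing blanks outside quotes. With it, the leading space in front of `"b"` and `"NA"` is dropped, so both files yield the same strings.

```r
# Minimal sketch (assumes R >= 4.0, where stringsAsFactors = FALSE by default).
# The two files from the report, one element per line.
na1_lines <- c('a, b, c',
               '1, "b", 1',
               '2, "", 2',
               ', "b", 3',
               '4, , 4',
               '5, "NA", 5')
na2_lines <- c('b, c',
               '"b", 1',
               '"", 2',
               '"b", 3',
               ', 4',
               '"NA", 5')

# Default: the space after each comma is part of the field (RFC 4180),
# so the quotes are removed but the leading space survives, e.g. " b".
raw1 <- read.csv(text = na1_lines)
print(raw1$b)

# strip.white = TRUE drops blanks outside the quotes, so " NA" becomes "NA",
# the same string na2.csv produces, and na.strings treats both files alike.
clean1 <- read.csv(text = na1_lines, strip.white = TRUE)
clean2 <- read.csv(text = na2_lines, strip.white = TRUE)

# With the whitespace stripped, na1 minus its first column matches na2.
print(all.equal(clean1[-1], clean2))
```

The fix is applied at read time rather than by post-processing with `trimws()`. This way, `na.strings` matching sees the stripped value.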
Created attachment 1657: tiny csv file 1
I attach a tiny .csv file, na1.csv. I created na2.csv by editing out the first column of na1.csv. (I can only attach one file, but I have pasted the contents below.)
na1.csv
====================
a, b, c
1, "b", 1
2, "", 2
, "b", 3
4, , 4
5, "NA", 5
===========================
na2.csv
===================
b, c
"b", 1
"", 2
"b", 3
, 4
"NA", 5
==========================
Here is what I get when I read them into data frames:
   a   b c
1  1   b 1
2  2     2
3 NA   b 3
4  4     4
5  5  NA 5
     b c
1    b 1
2      2
3    b 3
4      4
5 <NA> 5
Error in Ops.factor(df1$b, df2$b) : level sets of factors are different
[1] " " " " " b" " NA"
[1] "" " " "b"
If I read them with as.is=TRUE, I again get the extra spaces in df1$b. Also, again, df1$b[5] is " NA" rather than NA.
I can't see why this would be "correct" behavior. I apologize if I've missed something here.
Thanks for your great work on R!
Best regards,
Joe Ritter