Closed jeregrine closed 1 year ago
Yes, it makes sense. File.stream! is line based (and afaik not configurable).
@josevalim Just an FYI: if I am reading the File.stream! and IO.binread docs correctly, I can iterate by number of bytes too, and I get the same error:
File.stream!("/tmp/mprop.csv", [], 100_000)
|> CSV.parse_stream()
|> Enum.take(1)
** (NimbleCSV.ParseError) unexpected escape character " in "MAP_EXT\"\r\"0000005005\",\"\",\"2022\",\"\",\"40\",\"\",\"2263\",\"2263\",\"\",\"N\",\"LAKE\",\"DR\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",\"0\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",\"0\",\"\",\"\",\"\",\"\",\"0.00\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"000000000\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"N\",\"N\",\"0\",\"0.00000\",\"0\",\"\",\"0\",\"0\",\"0\",\"0\",\"0\",\"0\",\"\",\"\",\"\",\"\",\"RM4\",\"0\",\"0\",\"\",\"187000\",\"1000\",\"53202\",\"3\",\"1\",\"1\",\"\",\"C2\",\"\",\"1\",\"\",\"\",\"\",\"\",\"\",\"82932\",\"99999\",\"XXXX\",\"9\",\"\",\"\"\r\"0000005010\",\"\",\"2022\",\"\",\"40\",\"\",\"9164\",\"9164\",\"\",\"N\",\"70TH\",\"ST\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",\"0\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",\"0\",\"\",\"\",\"\",\"\",\"0.00\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"000000000\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"N\",\"N\",\"0\",\"0.00000\",\"0\",\"\",\"0\",\"0\",\"0\",\"0\",\"0\",\"0\",\"\",\"\",\"\",\"\",\"PD\",\"0\",\"0\",\"\",\"101\",\"3000\",\"53223\",\"9\",\"4\",\"6\",\"\",\"N1\",\"\",\"0\",\"\",\"\",\"\",\"\",\"\",\"121602\",\"99999\",\"XXXX\",\"9\",\"\",\"\"\r\"0000005020\",\"\",\"2022\",\"\",\"40\",\"\",\"8919\",\"8919\",\"\",\"N\",\"70TH\",\"ST\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",\"0\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",\"0\",\"\",\"\",\"\",\"\",\"0.00\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"000000000\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"N\",\"N\",\"0\",\"0.00000\",\"0\",\"\",\"0\",\"0\",\"0\",\"0\",\"0\",\"0\",\"\",\"\",\"\",\"\",\"PD\",\"0\",\"0\",\"\",\"101\",\"3001\",\"53223\",\"9\",\"4\",\"6\",\"\",\"N1\",\"\",\"0\",\"\",\"\",\"\",\"\",\"\",\"91341\",\"99999\",\"XXXX\",\"9\",\"\",\"\"\r\"0000005030\",\"\",\"2022\",\"\",\"40\",\"\",\"9036\",\"9036\",\"\",\"N\",\"70TH\",\"ST\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",\"0\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",
\"0\",\"\",\"\",\"\",\"\",\"0.00\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"000000000\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"N\",\"N\",\"0\",\"0.00000\",\"0\",\"\",\"0\",\"0\",\"0\",\"0\",\"0\",\"0\",\"\",\"\",\"\",\"\",\"PD\",\"0\",\"0\",\"\",\"101\",\"3000\",\"53223\",\"9\",\"4\",\"6\",\"\",\"N1\",\"\",\"0\",\"\",\"\",\"\",\"\",\"\",\"77221\",\"99999\",\"XXXX\",\"9\",\"\",\"\"\r\"0000005048\",\"\",\"2022\",\"\",\"40\",\"\",\"9083\",\"9083\",\"\",\"N\",\"85TH\",\"ST\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",\"0\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",\"0\",\"\",\"\",\"\",\"\",\"0.00\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"000000000\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"N\",\"N\",\"0\",\"0.00000\",\"0\",\"\",\"0\",\"0\",\"0\",\"0\",\"0\",\"0\",\"\",\"\",\"\",\"\",\"RM1\",\"0\",\"0\",\"\",\"101\",\"2006\",\"53224\",\"9\",\"4\",\"6\",\"\",\"N1\",\"\",\"0\",\"\",\"\",\"\",\"\",\"\",\"108931\",\"99999\",\"XXXX\",\"9\",\"\",\"\"\r\"0000005049\",\"\",\"2022\",\"\",\"40\",\"\",\"8425\",\"8425\",\"\",\"W\",\"ALLYN\",\"CT\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",\"0\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",\"0\",\"\",\"\",\"\",\"\",\"0.00\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"000000000\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"N\",\"N\",\"0\",\"0.00000\",\"0\",\"\",\"0\",\"0\",\"0\",\"0\",\"0\",\"0\",\"\",\"\",\"\",\"\",\"RM1\",\"0\",\"0\",\"\",\"101\",\"2003\",\"53224\",\"9\",\"4\",\"6\",\"\",\"N1\",\"\",\"0\",\"\",\"\",\"\",\"\",\"\",\"45569\",\"99999\",\"XXXX\",\"9\",\"\",\"\"\r\"0000005050\",\"\",\"2022\",\"\",\"40\",\"\",\"9060\",\"9060\",\"\",\"N\",\"85TH\",\"ST\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",\"0\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",\"0\",\"\",\"\",\"\",\"\",\"0.00\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"000000000\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"N\",\"N\",\"0\",\"0.00000\",\"0\",\"\",\"0\",\"0\",\"0\",\"0\",\"0\",\"0\",\"\",\"\",\"\",\"\",\"RM1\",\"0\",\"0\",\"\",\"101\",\"2003\",\"
53224\",\"9\",\"4\",\"6\",\"\",\"N1\",\"\",\"0\",\"\",\"\",\"\",\"\",\"\",\"115504\",\"99999\",\"XXXX\",\"9\",\"\",\"\"\r\"0000005056\",\"\",\"2022\",\"\",\"40\",\"\",\"8643\",\"8643\",\"\",\"W\",\"GREENBROOK\",\"DR\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",\"0\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",\"0\",\"\",\"\",\"\",\"\",\"0.00\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"000000000\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"N\",\"N\",\"0\",\"0.00000\",\"0\",\"\",\"0\",\"0\",\"0\",\"0\",\"0\",\"0\",\"\",\"\",\"\",\"\",\"RT1\",\"0\",\"0\",\"\",\"101\",\"2004\",\"53224\",\"9\",\"4\",\"6\",\"\",\"N1\",\"\",\"0\",\"\",\"\",\"\",\"\",\"\",\"16581\",\"99999\",\"XXXX\",\"9\",\"\",\"\"\r\"0000005060\",\"\",\"2022\",\"\",\"40\",\"\",\"9076\",\"9076\",\"\",\"N\",\"95TH\",\"ST\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",\"0\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",\"0\",\"\",\"\",\"\",\"\",\"0.00\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"000000000\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"N\",\"N\",\"0\",\"0.00000\",\"0\",\"\",\"0\",\"0\",\"0\",\"0\",\"0\",\"0\",\"\",\"\",\"\",\"\",\"PD\",\"0\",\"0\",\"\",\"201\",\"2000\",\"53224\",\"9\",\"4\",\"6\",\"\",\"N1\",\"\",\"0\",\"\",\"\",\"\",\"\",\"\",\"15449\",\"99999\",\"XXXX\",\"9\",\"\",\"\"\r\"0000005065\",\"\",\"2022\",\"\",\"40\",\"\",\"10880\",\"10880\",\"\",\"W\",\"DONNA\",\"DR\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",\"0\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",\"0\",\"\",\"\",\"\",\"\",\"0.00\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"000000000\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"N\",\"N\",\"0\",\"0.00000\",\"0\",\"\",\"0\",\"0\",\"0\",\"0\",\"0\",\"0\",\"\",\"\",\"\",\"\",\"PD\",\"0\",\"0\",\"\",\"201\",\"1010\",\"53224\",\"9\",\"4\",\"6\",\"\",\"N1\",\"\",\"0\",\"\",\"\",\"\",\"\",\"\",\"40657\",\"99999\",\"XXXX\",\"9\",\"\",\"\"\r\"0000005068\",\"\",\"2022\",\"\",\"40\",\"\",\"8674\",\"8674\",\"\",\"N\",\"SERVITE\",\"DR\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"
0\",\"0\",\"\",\"CCA\",\"0\",\"0\",\"0\",\"\",\"0\",\"0\",\"0\",\"\",\"" <> ...
/Users/jasonstiebs/Library/Caches/mix/installs/elixir-1.14.3-erts-13.2/9c525472ea870d261cb65ad0319c4a4d/deps/nimble_csv/lib/nimble_csv.ex:583: CSV.escape/6
/Users/jasonstiebs/Library/Caches/mix/installs/elixir-1.14.3-erts-13.2/9c525472ea870d261cb65ad0319c4a4d/deps/nimble_csv/lib/nimble_csv.ex:453: anonymous fn/4 in CSV.parse_stream/2
(elixir 1.14.3) lib/stream.ex:989: Stream.do_transform_user/6
(elixir 1.14.3) lib/enum.ex:3448: Enum.take/2
Yes, but the stream in NimbleCSV expects to receive lines separated by newlines. I don't think we can stream by "\r".
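For anyone landing here with the same file: one possible workaround is to re-chunk the byte stream into lines yourself before handing it to the parser. The sketch below is not part of NimbleCSV. The module name CRLines and the function to_lf_lines/1 are made up for illustration. It assumes every "\r" in the file is a record separator, so it would break on quoted fields that contain a literal "\r" and on "\r\n" files.

```elixir
defmodule CRLines do
  # Hypothetical helper, not a NimbleCSV API.
  # Takes a stream of arbitrarily sized binary chunks and emits one
  # binary per record, replacing each "\r" terminator with "\n".
  # Assumption: "\r" never appears inside a quoted field.
  def to_lf_lines(chunks) do
    chunks
    # Append a sentinel so the final partial line can be flushed.
    |> Stream.concat([:eof])
    |> Stream.transform("", fn
      :eof, "" ->
        {[], ""}

      :eof, rest ->
        # Last line had no trailing "\r"; emit it with a "\n" as well.
        {[rest <> "\n"], ""}

      chunk, acc ->
        # Everything before the last "\r" is complete; the tail is carried over.
        {lines, [rest]} = (acc <> chunk) |> String.split("\r") |> Enum.split(-1)
        {Enum.map(lines, &(&1 <> "\n")), rest}
    end)
  end
end
```

Under those assumptions, something like File.stream!("/tmp/mprop.csv", [], 100_000) |> CRLines.to_lf_lines() |> CSV.parse_stream() should see ordinary "\n"-terminated lines, though I have not run it against the original file.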
Sounds good, thanks for your patience with me. <3
I have found a strange CSV in the wild with \r as the newline separator that fails to parse when using parse_stream, but seems to work with parse_string. I attempted a fix but I had some trouble following the logic/macros.
(aside) While trying to narrow down a smaller CSV test case I struggled. I don't know if I'm dumb at using sed, but every time I did a sed -i 's/\r/\n/g' /tmp/mprop.csv, then head /tmp/mprop.csv > /tmp/mprop-smol.csv, then sed -i 's/\n/\r/g' /tmp/mprop-smol.csv, it would produce different file line endings than the original. Same with Elixir and String.replace. I am probably very tired and doing something wrong and dumb, or maybe it's a special secret \r that I cannot reproduce. (/aside)
Anyways, here is the test case. The data is public and a ~90 MB CSV, so this should give you a case to work with.
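A note on the aside: one likely reason the round trip did not restore the original endings is that sed works line by line and strips the trailing newline before applying the expression, so s/\n/\r/g has nothing to match. A smaller sample that keeps the "\r" separators can be cut directly in Elixir instead. This is only a sketch: the module name SampleCR and the function head/2 are hypothetical, and it assumes the file is small enough to read into memory and has more than n lines.

```elixir
defmodule SampleCR do
  # Hypothetical helper: returns the first `n` "\r"-terminated lines of
  # `data`, keeping "\r" as the line terminator.
  # Assumption: `data` contains more than `n` lines.
  def head(data, n) do
    data
    |> String.split("\r")
    |> Enum.take(n)
    |> Enum.map_join(&(&1 <> "\r"))
  end
end
```

For example, File.write!("/tmp/mprop-smol.csv", SampleCR.head(File.read!("/tmp/mprop.csv"), 10)) would write the first ten records without touching the separators.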