Closed pera closed 4 years ago
Can you please include your Elixir and OTP versions as well as a sample project with a CSV file that we can try directly? Thank you.
Ah sorry I forgot the most important: https://drive.google.com/file/d/1Hajl4sGbbdiMj2uTw9Ux3hyLng6cEow2/view?usp=sharing
Erlang/OTP 22 [erts-10.6.4] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]
Elixir 1.10.1 (compiled with Erlang/OTP 22)
Thank you!
...and here is everything as a project (the previous link was just the csv): https://drive.google.com/file/d/1QMvVb1pw_r7DiW08oQgNU_gMjMHNeOdX/view?usp=sharing
This is happening because you are keeping the whole reference to the large binary as you iterate. This means a lot of allocations in memory, which translates to more time spent doing GC, etc. For example, if you change Enum.map
to Enum.each
, you will see constant times. But if you change |> Enum.map(fn x -> Enum.at(x, 31) end)
to |> Enum.map(fn x -> :binary.copy(Enum.at(x, 31)) end),
so you no longer keep references to the large binary, then it is both much faster and consistent even in the Enum.map
case. We do talk about :binary.copy
in our docs, but perhaps we should make it clearer.
I have changed the docs to use copy by default.
Thanks a lot @josevalim, now I have an excuse to learn about Erlang's GC :slightly_smiling_face:
Hi! I'm experiencing an odd issue when parsing large CSV files (in the order of hundreds of megabytes): basically if I call f/0 repeatedly, each new call is slower than the previous one
In addition, the iex session also seems slower: if I call
recompile
it takes a couple of seconds instead of a few milliseconds to finish (and since this library is relatively simple it makes me wonder if this could actually be a problem with BEAM).