diasbruno closed this pull request 8 months ago.
I'm moving this to "Ready for review" just so we can discuss if this is an ok idea.
Docs and tests using streams are still pending.
I am not sure this is how I would go. My suggestion is for the `parse_stream` function to return a `NimbleCSV.Struct` that implements the Enumerable protocol, including the `count` function, with the `count` function itself optimized. :)
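Something along these lines (just a sketch; the struct layout and fields are illustrative, not the actual API):

```elixir
# Illustrative only: a struct wrapping parsed rows that implements the
# Enumerable protocol, so Enum.count/1 can use a precomputed size instead
# of traversing the whole enumerable.
defmodule NimbleCSV.Struct do
  defstruct [:rows, :count]

  defimpl Enumerable do
    # Optimized count: answer from the stored value when it is known.
    def count(%{count: count}) when is_integer(count), do: {:ok, count}
    def count(_struct), do: {:error, __MODULE__}

    def member?(_struct, _value), do: {:error, __MODULE__}
    def slice(_struct), do: {:error, __MODULE__}

    # Fall back to the underlying rows for everything else.
    def reduce(%{rows: rows}, acc, fun), do: Enumerable.reduce(rows, acc, fun)
  end
end
```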
"Rebranded" this PR to be more general than just count. But, counting can be a functionality baked in using this NimbleCSV.Struct
you are suggesting.
So, I'll leave this PR for the user state and open a new one for the count.
For this PR, I'll pass the current parsed row to the `state_transform` function and add the remaining tests and documentation.
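Roughly something like this (the exact signature is just a sketch, not what the patch currently does):

```elixir
# Receives the user state, the parser state (:header or :line) and the
# parsed row, and returns the next user state.
state_transform = fn
  user_state, :header, _row -> user_state
  user_state, :line, row -> [hd(row) | user_state]
end
```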
Thanks, @josevalim.
Generally speaking, I don't want to expose the parser state or its transformation. I am concerned about both performance and the possibility that we will expose too much of the internals, making it impossible to refactor later without breaking user code.
It doesn't introduce much overhead (and it would be optional).
It just enhances the state with a user state (a pair of `{internal_state, user_state}`), but the user is not allowed to modify the internal state.
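Conceptually it would work like this (an illustration of the pair idea, not the actual internals):

```elixir
# Illustration of the {internal_state, user_state} pair: the user's
# function sees and updates only the user half while parsing proceeds.
rows = [["john", "27"], ["mary", "31"]]

{_internal_state, user_state} =
  Enum.reduce(rows, {:line, 0}, fn _row, {internal, user} ->
    {internal, user + 1}
  end)

user_state
#=> 2
```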
This patch allows the package to stream each row, so the user can run aggregates and other operations as the parser streams lines. It can be an optimization in some cases.
I'll have a look at this patch this weekend to finish the remaining things (docs, tests...).
The patch doesn't receive the parsed row because it was written for the counter issue (#12). It must be generalized to perform not only counting but any aggregating operation.
Hi, I understand the user is not allowed to modify the internal state, but the issue is that if we expose it, users are going to read it, and potentially modify it. And I'm not comfortable going down this route. If this is essential to your solution, then I believe it is not a good fit for this library, which was purposely designed to have a small API surface.
No problem, @josevalim. I understand your point.
I'll close this for now and keep working on a fork of this repo where I'll pursue this streaming-line idea. Later on, if it makes sense, we can merge the libraries (I don't want to keep a separate library just for this).
Thanks for taking the time to answer.
This PR allows the user to define and control some state (separate from the parse state) while parsing each line of the CSV.
Users can pass an initial state and a function that runs for each row.
New options are:

- `init_user_state` - the initial user state that goes along with the parse state
- `state_transform_function` - a function with arity 2, receiving the user state and the parse state (`:line` or `:header`), that should return the next user state

To avoid a breaking change, for the moment, `parse_string_with_state` was introduced so that the previous `parse_*` functions keep running exactly as they were.
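For example, counting data rows could look roughly like this (only the option names come from this PR; the parser module and the exact call shape are just a sketch):

```elixir
# Sketch only: counting rows while parsing, using the options proposed in
# this PR. NimbleCSV.RFC4180 is the bundled parser; the return shape of
# parse_string_with_state is not pinned down here.
alias NimbleCSV.RFC4180, as: CSV

csv = "name,age\njohn,27\nmary,31\n"

CSV.parse_string_with_state(csv,
  init_user_state: 0,
  state_transform_function: fn
    count, :header -> count
    count, :line -> count + 1
  end
)
# the user state after parsing would be 2
```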