Closed dcmoura closed 1 year ago
Using js, python or ruby?
Whatever would be more efficient :-) My question is: if you have to calculate an average from a large file (1GB-10GB of JSON lines), which option would you recommend for the best processing time without running out of memory?
Something like fx data.json 'sum' is OK.
Let's say we want the average of the overall property.
With jq I would do one of the following:
jq -n '[inputs.overall] | add/length' data.json
or
jq -n 'def sigma(s): reduce s as $x([0,0]; [.[0]+$x, .[1]+1]); sigma(inputs | .overall) | .[0] / .[1]' data.json
So it’s an array where each element contains an overall field?
JSON lines, e.g.
{"overall": 2.0, "another": "bla"}
{"overall": 4.0, "another": "bla bla"}
{"overall": 1.0}
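For input of this shape, the constant-memory idea behind the second jq command (keep only a running sum and count) can be sketched in plain Node.js, independent of fx. This is only a sketch: the names step, meanOfLines and meanOfFile are hypothetical, and it assumes every non-blank line is a JSON object. Lines where the field is missing or not a number are skipped.

```javascript
const fs = require('fs');
const readline = require('readline');

// Fold one JSON line into the [sum, count] state, like the jq reduce above.
function step(state, line, field) {
  if (line.trim() === '') return state;         // ignore blank lines
  const value = JSON.parse(line)[field];        // assumes the line is a JSON object
  if (typeof value !== 'number') return state;  // skip missing or non-numeric values
  return [state[0] + value, state[1] + 1];
}

// Mean over any iterable of lines (useful for small inputs and tests).
function meanOfLines(lines, field) {
  let state = [0, 0];
  for (const line of lines) state = step(state, line, field);
  return state[1] === 0 ? NaN : state[0] / state[1];
}

// Mean over a file, read line by line so memory use stays roughly constant.
async function meanOfFile(path, field) {
  const rl = readline.createInterface({
    input: fs.createReadStream(path),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  let state = [0, 0];
  for await (const line of rl) state = step(state, line, field);
  return state[1] === 0 ? NaN : state[0] / state[1];
}
```

It could be called as meanOfFile('data.json', 'overall').then(console.log); for the three sample lines above the result is about 2.33.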
I see. This feature hasn’t been ported from the Node.js version to the Go version yet. Right now fx works only on a single JSON value.
OK, thank you
Actually I have created a PR to update the documentation concerning reducers. It describes how you can add your own functions/global data to the scope and namespace of the reducers.
You could then define a global variable to store your result, and specify a reducer that updates that variable as it maps over (or iterates each element of) the JSON data. You could specify the reducer at the command line, or define a dedicated function in the .fxrc.js file, which is loaded for you and can operate on the data.
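A rough sketch of that idea follows. It assumes .fxrc.js can attach values to Node's global object and that command-line reducers can see them; the reducers.md in the PR is the place to check the actual mechanism. The names stats, addSample and mean are hypothetical and not part of fx.

```javascript
// Hypothetical contents for a .fxrc.js-style preload file (not fx's own API).

// Global accumulator that survives across reducer calls.
global.stats = { sum: 0, count: 0 };

// Add one record's field to the accumulator and pass the record through.
global.addSample = function (record, field) {
  const value = record[field];
  if (typeof value === 'number') {
    global.stats.sum += value;
    global.stats.count += 1;
  }
  return record;
};

// Current mean of everything seen so far (NaN if nothing was added).
global.mean = function () {
  return global.stats.count === 0
    ? NaN
    : global.stats.sum / global.stats.count;
};
```

A reducer would then call addSample on each record and read mean() at the end; only two numbers are kept in memory regardless of input size.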
PR #203 contains the updated reducers.md and examples. I do not know when it will be committed so you can use the PR link to see the file if it hasn't been updated yet.
Actually, fx currently lacks support for aggregation.
I’m planning to add support via a jq-style --slurp argument.
@dcmoura @antonmedv it lacks aggregation only in the sense that there are no specific arguments or options that make fx aware of it.
It does support it through reducers of the node language type, because that's ... NodeJS ... which can operate on the data however the hell you want.
You can preload the Node.js namespace/scope via .fxrc.js with any modules, libraries, custom code, or data.
The documentation update in PR #203 actually details using .fxrc.js to provide more functionality to the command line reducer scripts, as well as additional data sources that can be included in the namespace used at the command line.
See example usages
I’m also thinking of extending .fxrc.js to JS reducers as well.
Done!
What would be the most efficient way of calculating the average of a JSON property using fx? Thanks