Closed movy closed 8 months ago
You could try CSV/TSV formats for selects and do your own row parsing after calling row.text
.
row.text
does not call JSON.parse
, it is for row.json()
only: https://github.com/ClickHouse/clickhouse-js/blob/main/packages/client-node/src/result_set.ts#L77-L81
I am also considering adding RowBinary
to supported formats, at least for selects.
Thanks for your advice. I wrote a simple datatype-aware parser to split TSV and convert "numbers" to numbers, and speed improved by x10 at least, amazing. Just a thought: as the structure of JSON is known based on columns types, maybe something like simdjson can be used?
@movy, I think RowBinary implementation could yield even more significant gains. See https://github.com/luizperes/simdjson_nodejs#benchmarks - simdjson is 1.5-2x faster than JSON.parse. It is probably worth adding as an optional switch, I will eventually look at what's possible here.
I am considering, however, adding RowBinary alongside the third "get data" method to the Row object: something like array()
, maybe? That will get all values of a particular row in the correct order, utilizing the metadata about the columns, which we will also receive. And the overhead should be pretty minimal.
Yes, array()
would be great as long as the type conversion is not done in JS.
In my case I receive an array of ~20 values via TabSeparated
and as most of them are expected to be numbers, hence I need to parse them. Parsing the whole array is even slower than JSON.parse()
, so I selectively parse 3-5 values needed for a particular event and the speed gains are very noticeable. Would be nice to avoid parsing altogether for the sake of code simplicity and for performance as well.
@movy, I am curious to run some internal tests on this (JSON vs. Text + "manual" numbers parsing vs. RowBinary + row as an array). Can you please contact me in the community Slack and share the code snippets and, maybe, a sample dataset (or its definitions) for your scenarios?
@slvrtrn @movy running into a similar scenario where JSON.parse isn't efficient enough for the result set I'm working with. Played around with the options described in this thread and have measured some improvement (particularly using simdjson instead of JSON.parse) but was curious to see how your testing is going/if there's an optimized strategy for these cases.
I wanted to post an update here, as this issue looks stale while it is not.
I've been studying RowBinary format and writing an experimental implementation of it with select streaming support out of the box for quite some time already; current progress is here: https://github.com/ClickHouse/clickhouse-js/compare/1.0.0...row_binary
At the moment of writing this, that branch supports decoding a minimal set of useful data types:
number
number
Currently in progress, as I consider the following types should be supported straight away (there are drafts decoders/placeholders for some already): Decimal, Enum, UUID, FixedString, DateTime(64), Array, Map, Tuple.
NB: under the hood, Date is just a UInt16 (number of days since Unix Epoch), and Date32 is just an Int32 (also days, but before/after). JS Date is chosen only as the default option for these types; there will be a mapping option to convert the "days" numeric value to any other object.
For example, consider the Date type. The mapper is just:
<T>(daysSinceEpoch: number) => T
And, by default, if no custom mapper is provided, it is something like this:
const defaultDateMapper = (daysSinceEpoch: number) => new Date(daysSinceEpoch * DayMillis)
Similarly, Date32, DateTime, DateTime64 (especially DateTime(9) as it requires proper nanoseconds support), and Decimal will have optional mappers support from their decoded "raw" representation.
@movy @chrisjh (and @cjk, I saw your reaction, too!), if you are still interested, I'd be curious to learn a few things:
RowBinary
in the client so that it could already be helpful in your scenario? Let's continue tracking this issue here: https://github.com/ClickHouse/clickhouse-js/issues/216; please don't hesitate to comment. Cheers.
@chrisjh, for your request, I created this issue to track: https://github.com/ClickHouse/clickhouse-js/issues/246
Hello, I continuously stream massive amounts of market data from ClickHouse and after profiling my app I discovered that most ticks are spent in stream parsing, i..e:
I briefly checked the module source code, and looks like quite expensive
JSON.parse()
is used internally even when we calltext()
.My question would be is there a faster way to parse incoming stream maybe by using some raw output format? Or am I using reading streams wrong?