Closed: markuscraig closed this issue 7 years ago
Thanks, I'll take a look. As a note, the default timeout for the Node.js driver looks to be 15 seconds, whereas gocql's is 600ms.
Changing the gocql timeout helped, but write performance is still just as slow.
I'm wondering if the statements aren't being automatically prepared by gocql for some reason?
When I disable prepared statements in the Node.js version, I see similarly slow write performance.
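For anyone landing here, the timeout is set on the cluster config before creating a session. A minimal sketch, assuming gocql's ClusterConfig.Timeout field and a hypothetical local node at 127.0.0.1 (not benchmarked here):

```go
cluster := gocql.NewCluster("127.0.0.1")
cluster.Timeout = 15 * time.Second // raise from the 600ms default to match the node driver
session, err := cluster.CreateSession()
if err != nil {
	log.Fatal(err)
}
defer session.Close()
```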
Statements in gocql are automatically prepared; however, preparedness shouldn't matter in your case, since you aren't using any placeholders.
Can you test the performance using an obscene number of workers rather than your default 8? For example:
for job := range c.jobs {
	// fmt.Printf("EXECUTE BATCH QUERY: entries = %d\n", job.count)
	// execute the batch query asynchronously
	go func(batch *gocql.Batch) {
		err := c.session.ExecuteBatch(batch)
		if err != nil {
			log.Printf("Couldn't execute batch: %s", err)
		}
	}(job.batch)
}
My hunch is that the async nature of the node code may be executing more than 8 batches at once.
@markuscraig I looked through the Node.js code; it is probably just gaining an advantage by doing all the queries async, as @nemothekid says.
Ok, thanks guys, much appreciated!
Are you sure your node script is running all the queries? You call "process.exit(0)" when the "final" batch completes, but nothing guarantees that the final batch isn't actually the first one to run. Do you see the message "Batch data update successful" printed out 5000 times? If not, you need to add code that waits for all the batches to invoke their callback before exiting the process.
I think you would want many more than 8 worker goroutines (at least 100, say). There are other settings that could improve the go script's performance, like setting ProtoVersion to 3 if your c* version is new enough.
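A sketch of those settings, assuming gocql's ClusterConfig.ProtoVersion field (Cassandra 2.1 or newer speaks protocol v3); numWorkers is just an illustrative constant, not a gocql option:

```go
const numWorkers = 100 // well above the original 8

cluster := gocql.NewCluster("127.0.0.1")
cluster.ProtoVersion = 3 // requires Cassandra 2.1+
```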
Hey Crew,
I'd like to use Go with Cassandra, but I'm getting much better write performance with the Node.js driver (almost 5x write performance with no timeouts). I feel like I must be doing something really wrong.
My test code below simply updates counters millions of times using batch queries (1000 entries per batch in both the Go and Node versions), but I'm seeing dramatically different performance...
Here is the simple Cassandra schema I'm using for the test...
Just for reference, here is the Node.js code that writes 5 million updates in 15 seconds. I never get any Cassandra timeouts when using Node.js, even if I add way too many entries per batch...
Here is the Go code that takes a little over 1 minute to complete. I am using goroutines and channels to hopefully send the batch data in parallel (8 workers). Also, I get timeouts if I set the number of workers > 8, so I feel like I have to manually tune the code when using the Go driver...
Anything jump out at you guys? Any back-pressure from Cassandra that I should be handling? Any thoughts or pointers appreciated...
Thanks! Mark