bchavez / RethinkDb.Driver

:headphones: A NoSQL C#/.NET RethinkDB database driver with 100% ReQL API coverage.
http://rethinkdb.com/api/java

Investigate ReGrid Performance vs Node #96

Open bchavez opened 8 years ago

bchavez commented 8 years ago

Seems like Node ReGrid can get about 3x more writes than the .NET ReGrid, yielding a faster upload wall-clock time. See image below (credits @buskila):

[Image: pasted_image_at_2016_08_24_13_03]

Test setup

Upload only:
File Size: 1 GB
Server: RethinkDB / Linux / Ubuntu 14, 3 nodes
Client: .NET Core / Linux

Chunk Size: Default
Batch Size: Default 8 -> 32

They tried single connection and connection pooling. No difference.

Using Stream IO:

// Upload a file using an IO stream
Guid uploadId;
using( var fileStream = File.Open("C:\\video.mp4", FileMode.Open) )
using( var uploadStream = bucket.OpenUploadStream("/video.mp4") )
{
    uploadId = uploadStream.FileInfo.Id;
    fileStream.CopyTo(uploadStream);
}

Suspicion

Too much chunk calculation in stream upload code. Try to parallelize / simplify some of this, especially when given byte[].
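
One way to picture the "simplify when given byte[]" idea: if the whole payload is already in memory, chunk boundaries can be computed from offsets alone, with no intermediate stream buffering or copying. This is only a sketch under that assumption; `ChunkMath` and `Split` are hypothetical names, not part of RethinkDb.Driver or ReGrid.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkMath
{
    // Hypothetical helper: yields views over the original array, one per chunk.
    // No bytes are copied; each ArraySegment only records an offset and a count.
    // Note: because this is an iterator, argument checks run when enumeration starts.
    public static IEnumerable<ArraySegment<byte>> Split(byte[] data, int chunkSize)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        // long offset avoids int overflow when offset + chunkSize exceeds int.MaxValue.
        for (long offset = 0; offset < data.Length; offset += chunkSize)
        {
            var count = (int)Math.Min(chunkSize, data.Length - offset);
            yield return new ArraySegment<byte>(data, (int)offset, count);
        }
    }
}
```

Since each segment is independent of the others, the segments could be handed to separate writers, which is what would make parallelizing the byte[] path straightforward.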

Node's ReGrid upload code is here: https://github.com/internalfx/regrid/blob/master/lib/upload.js

Other notes

This should come after #77 is done.

After some discussion with @internalfx (thanks a bunch), it turns out the Node upload code is using Node streams. Node streams info via @buskila:

Using .pipe() has other benefits too, like handling backpressure automatically so that
node won't buffer chunks into memory needlessly when the remote client 
is on a really slow or high-latency connection.

https://github.com/substack/stream-handbook

Currently, @internalfx's upload code keeps 10 network requests in flight at any given time. Even in a scenario with infinite network latency, once those 10 are outstanding Node won't write to the ReGrid API again until at least one network request completes.

Cool. I think we could do the same with 10 async tasks laying down bytes over a connection pool, then come back and read more bytes as network requests complete.
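
A rough sketch of what that could look like in .NET, assuming a `SemaphoreSlim` is used as the "10 in flight" gate. `BoundedUploader`, `UploadAsync`, and the `writeChunkAsync` delegate are hypothetical names; the delegate stands in for whatever actually inserts one chunk over the connection pool, and none of this is the driver's current upload code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedUploader
{
    // Reads chunks from source and keeps at most maxInFlight writes outstanding.
    // Returns the number of chunks written.
    public static async Task<int> UploadAsync(
        Stream source,
        Func<int, byte[], Task> writeChunkAsync, // hypothetical: writes one chunk
        int chunkSize,
        int maxInFlight = 10)
    {
        var gate = new SemaphoreSlim(maxInFlight);
        var pending = new List<Task>();
        var chunkIndex = 0;

        while (true)
        {
            // Take a slot BEFORE reading, so no more bytes are pulled into
            // memory while maxInFlight writes are still outstanding.
            await gate.WaitAsync();

            var buffer = new byte[chunkSize];
            var read = await FillAsync(source, buffer);
            if (read == 0)
            {
                gate.Release(); // end of stream: give the unused slot back
                break;
            }
            if (read < buffer.Length) Array.Resize(ref buffer, read);

            pending.Add(WriteThenRelease(chunkIndex++, buffer, writeChunkAsync, gate));
        }

        await Task.WhenAll(pending); // surfaces any write failure
        return chunkIndex;
    }

    // Stream.ReadAsync may return fewer bytes than requested, so loop
    // until the buffer is full or the stream ends.
    private static async Task<int> FillAsync(Stream source, byte[] buffer)
    {
        var total = 0;
        while (total < buffer.Length)
        {
            var n = await source.ReadAsync(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }

    private static async Task WriteThenRelease(
        int index, byte[] chunk, Func<int, byte[], Task> write, SemaphoreSlim gate)
    {
        try { await write(index, chunk); }
        finally { gate.Release(); } // free the slot even if the write fails
    }
}
```

Waiting on the gate before each read is what gives roughly the same backpressure behavior as `.pipe()`: with all slots taken, the reader simply stops until a write completes.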


Other Research Findings

RethinkDB Limitations