Closed: ghost closed this 7 years ago.
@bwmcadams was supposed to be bringing MongoDB support for Slick using Hammersmith. Unfortunately, Hammersmith is still a bit too young to use in production. Wrapping the MongoDB calls in a Future, or pushing them to an actor, both running on a different Dispatcher, only shifts the problem to another place.
I think it's time to see how Hammersmith would fit into Scalad, though.
scalad's API is non-blocking in the sense that it returns immediately, and you can perform foreach, map, etc., which will only be evaluated when each element is available: the response type is Iterator, and the underlying producer/consumer can be configured to use various blocking/caching strategies (to avoid OOMs). Iterating through the iterator may block, depending on how fast you're processing the results.
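A minimal sketch of the producer/consumer idea, assuming a bounded java.util.concurrent queue as the "blocking strategy". BoundedResults and its iterator method are hypothetical names, not scalad's actual API. A background thread pushes results into the queue and blocks when it is full, while the consumer's hasNext blocks when it is empty:

```scala
import java.util.concurrent.ArrayBlockingQueue

object BoundedResults {
  // Hypothetical helper (not scalad's API): drains `source` on a background thread
  // into a bounded queue and exposes the results to the consumer as an Iterator.
  def iterator[A](source: Iterator[A], capacity: Int): Iterator[A] = {
    // Some(a) carries a result; None marks the end of the stream.
    val queue = new ArrayBlockingQueue[Option[A]](capacity)

    val producer = new Thread(() => {
      source.foreach(a => queue.put(Some(a))) // blocks while the queue is full
      queue.put(None)
    })
    producer.setDaemon(true)
    producer.start()

    new Iterator[A] {
      private var current: Option[A] = None
      private var fetched = false

      // Blocks until the producer has delivered the next element (or the end marker).
      def hasNext: Boolean = {
        if (!fetched) {
          current = queue.take()
          fetched = true
        }
        current.isDefined
      }

      def next(): A = {
        if (!hasNext) throw new NoSuchElementException("no more results")
        fetched = false
        current.get
      }
    }
  }
}
```

A small capacity caps memory use at the cost of stalling the producer. The sketch does not propagate exceptions from the producer thread, so a failing source would leave the consumer waiting forever.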
There is no such thing as non-blocking I/O. There is always blocking at some point. Claims of non-blocking I/O in any library are just smoke and mirrors.
What non-blocking middleware really means is "an API that allows the end user to define callbacks that will be called when the data is available", i.e. asynchronous. If you want that, then just do your processing in a foreach or map and, like Jan says, shove anything else into a Future. That's pretty much what every single "non-blocking" library does anyway, and I have no time for them: give me good old paged results any day, so I can control the memory usage.
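For the paged-results approach, a sketch assuming the driver offers some skip/limit style query. Here fetchPage and PagedResults are hypothetical, and an in-memory Vector stands in for the collection. Pages are requested lazily, so roughly one page is held in memory at a time:

```scala
object PagedResults {
  // Hypothetical pager: `fetchPage(skip, limit)` stands in for one driver query.
  // Pages are requested lazily, only when the consumer needs more rows.
  def paged[A](pageSize: Int)(fetchPage: (Int, Int) => Seq[A]): Iterator[A] =
    Iterator.from(0)
      .map(page => fetchPage(page * pageSize, pageSize))
      .takeWhile(_.nonEmpty) // an empty page means there is nothing left
      .flatMap(_.iterator)

  // Demo against an in-memory "collection" of 100 rows, counting the queries issued.
  def demo(): (List[Int], Int) = {
    val table = Vector.range(0, 100)
    var fetches = 0
    val rows = paged[Int](pageSize = 2) { (skip, limit) =>
      fetches += 1
      table.slice(skip, skip + limit)
    }
    val firstFive = rows.take(5).toList
    (firstFive, fetches)
  }
}
```

Taking five rows with a page size of two issues only three page requests; the rest of the 100-row table is never fetched.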
btw, @janm399, check out https://groups.google.com/d/msg/scala-user/2wprKWyHAUo/3n5vInjVadAJ which is easily converted to a ParIterator for doing parallel processing of responses as they come in from scalad.
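ParIterator lives in Scala's parallel collections (a separate module since Scala 2.13), so this sketch swaps in a different technique: plain Futures over fixed-size chunks of the Iterator. ChunkedParallel and parMapChunks are hypothetical names. Elements within a chunk are mapped in parallel, while chunks are pulled one at a time, so memory stays bounded and output order is preserved:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ChunkedParallel {
  // Hypothetical helper: maps each element of `it` with `f`, running the elements of
  // one chunk concurrently and waiting for that chunk before pulling the next one.
  def parMapChunks[A, B](it: Iterator[A], chunkSize: Int)(f: A => B)(
      implicit ec: ExecutionContext): Iterator[B] =
    it.grouped(chunkSize).flatMap { chunk =>
      val futures = chunk.map(a => Future(f(a)))
      Await.result(Future.sequence(futures), 1.minute) // blocks the consuming thread
    }

  def demo(): List[Int] = {
    val pool = Executors.newFixedThreadPool(4)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try parMapChunks(Iterator.range(0, 10), chunkSize = 3)(x => x * x).toList
    finally pool.shutdown()
  }
}
```

The chunk size is the knob: larger chunks give more parallelism per step but hold more results in memory at once.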
"async I/O" is exactly the point I am making: calling it "non-blocking" is a complete misnomer because "non-blocking" really shouldn't block but "async I/O" will always block at some layer. ScalaD is partially asynchronous in this regard:
mongo.find(query).map{_.thing}
will return instantly, and the map will be called asynchronously (but one element at a time). Only when you try to obtain all the results will you experience blocking in your code, e.g. with a .reduce or .toList. You can use my code (above) to get a parallel iterator which will run map in parallel, just like List.par. (NOTE: I am not sure about foreach; it might block... it depends on the Scala implementation.)
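A small check of the laziness being described, using a plain Scala Iterator as a stand-in for scalad's result type (LazyMap is a hypothetical name). map returns immediately without running the function, and the function only runs when the iterator is consumed, one element at a time, on the consuming thread:

```scala
object LazyMap {
  // Returns (calls before consuming, calls after consuming, collected results).
  def demo(): (Int, Int, List[Int]) = {
    var calls = 0
    val results = Iterator(1, 2, 3).map { x =>
      calls += 1 // count how often the mapping function actually runs
      x * 10
    }
    val callsBeforeConsuming = calls // map has returned, but nothing has run yet
    val collected = results.toList   // consuming the iterator runs the function
    (callsBeforeConsuming, calls, collected)
  }
}
```

For a plain Iterator, foreach is not lazy: it consumes every element on the calling thread, so it returns only once the iterator is exhausted.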
Like Jan says, if you want a purely async API, then just do
Future {actor ! mongo.find(query).toList}
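A sketch of the "just wrap it in a Future" suggestion, without Akka. blockingFind is a hypothetical stand-in for mongo.find(query).toList, and the dedicated fixed pool is an assumption about how you might isolate blocking calls. The caller gets a Future back straight away, but a pool thread is still blocked for the duration of the call, which is the "only shifts the problem to another place" point:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object AsyncFind {
  // Hypothetical stand-in for a blocking driver call such as mongo.find(query).toList.
  def blockingFind(query: String): List[String] = {
    Thread.sleep(50) // simulated network wait: this thread is blocked meanwhile
    List(s"$query-1", s"$query-2")
  }

  // Returns at once; the blocking work happens on a thread from `ec`.
  def findAsync(query: String)(implicit ec: ExecutionContext): Future[List[String]] =
    Future(blockingFind(query))

  def demo(): List[String] = {
    // A dedicated pool, so blocking calls do not starve other work.
    val pool = Executors.newFixedThreadPool(2)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      // Callback-style transformation; registering it does not block the caller.
      val upper = findAsync("users").map(_.map(_.toUpperCase))
      Await.result(upper, 5.seconds) // only the demo waits, to check the result
    } finally pool.shutdown()
  }
}
```

In an actor setting the Future's result would be sent to the actor instead of awaited.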
Back to "non-blocking": so you want libuv to block waiting for the network results to arrive, using up a pthread instead of a Java Thread? That requires JNI, since this isn't provided by the JVM, and OS-specific native code, since TCP/IP is not part of the C or C++ standards (although threads are now part of C++11). It will also incur rather a lot of data array copying unless you use GetPrimitiveArrayCritical (and I'm not sure that works in that direction).
I'd love to see performance comparisons: that's all that matters. Beyond that, it's pure coding style. If there is a performance advantage, I'm very interested, but I'd like to see the experiment and run it myself. In order to justify JNI, the performance advantage needs to be incredible.
The other fundamental flaw in async I/O is that it throws data at you: the best way to get an OOM. You need a pull-based data source (like MongoDB, or a paged JdbcTemplate) in order to avoid that problem, or be damn sure that you never ask for more than a few rows.
(you hit a nerve :-P... I'm sick of the "non-blocking" I/O hype)
hmm, I appear to stand corrected! :-) Although I don't understand how the response time for a SAFE INSERT can be consistently shorter than the time it takes for a PING: 50 micros vs 250 micros.
@janm399 it would appear that MAD is not so mad after all
@partycoder this ping thing is sticking with me as concerning... something stinks with the test.
Also, prompted by your recommendation to look into Java 7 NIO Async further (which I totally missed, btw, so thanks for pointing it out!), I'm not seeing any great performance benefits on the scale you're talking about. This is the highest hitting google search on the subject: http://vanillajava.blogspot.co.uk/2011/08/comparing-java-7-async-nio-with-nio.html
And IBM also point out that magic operating system support for "non-blocking IO" is not always used: http://www.ibm.com/developerworks/java/library/j-nio2-1/
"Each asynchronous channel constructed belongs to a channel group that shares a pool of Java threads, which are used for handling the completion of initiated asynchronous I/O operations. This might sound like a bit of a cheat, because you could implement most of the asynchronous functionality yourself in Java threads to get the same behaviour, and you'd hope that NIO.2 could be implemented purely using the operating system's asynchronous I/O capabilities for better performance. However, in some cases, it's necessary to use Java threads: for instance, the completion-handler methods are guaranteed to be executed on threads from the pool."
So, basically, IBM are admitting that sometimes a Java Thread is used, which effectively means a pthread once it leaves the JVM. So there is still blocking somewhere :-P
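To see the IBM point in practice, a sketch using AsynchronousFileChannel (a file rather than a socket, purely so the example is self-contained). The object name, thread name and temp file are assumptions for illustration. The channel is opened with our own ExecutorService, and its javadoc guarantees that completion handlers run on threads from that pool, which the example records:

```scala
import java.nio.ByteBuffer
import java.nio.channels.{AsynchronousFileChannel, CompletionHandler}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, StandardOpenOption}
import java.util.Collections
import java.util.concurrent.{Executors, ThreadFactory}
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

object AsyncChannelThreads {
  // Every pool thread gets a recognisable name (and is a daemon, so it cannot keep the JVM alive).
  private val namedThreads: ThreadFactory = (r: Runnable) => {
    val t = new Thread(r, "nio2-pool")
    t.setDaemon(true)
    t
  }

  // Reads a small temp file asynchronously; returns (handler thread name, text read).
  def readOnPool(): (String, String) = {
    val file = Files.createTempFile("nio2-demo", ".txt")
    Files.write(file, "hello".getBytes(StandardCharsets.UTF_8))

    val pool = Executors.newFixedThreadPool(2, namedThreads)
    val options: java.util.Set[StandardOpenOption] = Collections.singleton(StandardOpenOption.READ)
    val channel = AsynchronousFileChannel.open(file, options, pool)

    val result = Promise[(String, String)]()
    val buffer = ByteBuffer.allocate(16)
    channel.read(buffer, 0L, buffer, new CompletionHandler[Integer, ByteBuffer] {
      def completed(bytesRead: Integer, buf: ByteBuffer): Unit = {
        buf.flip()
        val text = StandardCharsets.UTF_8.decode(buf).toString
        result.success((Thread.currentThread().getName, text)) // runs on a pool thread
      }
      def failed(exc: Throwable, buf: ByteBuffer): Unit = result.failure(exc)
    })

    try Await.result(result.future, 5.seconds)
    finally {
      channel.close() // close the channel before shutting its pool down
      pool.shutdown()
      Files.delete(file)
    }
  }
}
```

The read is "asynchronous" from the caller's point of view, but its completion is still handled by an ordinary Java Thread taken from the pool.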
Also, I've seen these sorts of things before: http://www.techempower.com/benchmarks/ but this is more a framework comparison than a "sync vs async" performance test. Ideally, one wants to test a clean PING without the framework (as the author does above).
The async hype does appear to still be mostly hype, with a little bit of promise, but if your performance charts are anything to go by: the MongoDB driver just really stinks! (we had our suspicions from the code quality of the BSON layer, if we're being honest).
@partycoder I know how to set the durability level, what is concerning is that the response time for a high durability level is much quicker than PING, so can these results really be believed?
@partycoder I think this should remain open as an RFE to swap to MAD.
But I still don't trust these perf tests: a SAFE INSERT in a fifth of the time of a PING? Something is wrong there: maybe he meant micros for the ping, not millis.
I'll certainly have to investigate further to understand the real benefits of the new Java 7 async I/O. Thanks for bringing it up!
I had completely forgotten that this library existed, so you're almost certainly better off going with whatever the latest and greatest is. scalad was always just a wrapper layer anyway, not a driver replacement, and I've heard ReactiveMongo is very good. I never did believe those MAD numbers...
Not really a bug, but more a question.
Isn't the official Mongo driver blocking? Why not use an async API instead (like MAD)?
[updated title: fommil]