apa512 / clj-rethinkdb

Eclipse Public License 1.0

realize results to array #105

Closed funston closed 8 years ago

funston commented 8 years ago

This might be more of a question, so apologies, but I'm trying to use the driver to obtain a set of results.

With the JavaScript driver, the result is an array (i.e., fully realized). Here I get a cursor back, but I need to do some computation on all the values in the result set (about 800 items, small objects, about 5 fields each).

If I essentially do an into [], i.e. (let [result (into [] get_rethink_data)] ...), it takes about 20 seconds to get all the data.

In JavaScript, all the data is returned in about 1 second, fully realized. Is there some way to achieve the same result directly in the Clojure driver?

danielcompton commented 8 years ago

Hey @rschiavi can you share what queries you're running in JS, and in Clojure?

danielcompton commented 8 years ago

I think you're running into Thread/sleep delays in the Cursor. We're going to work around that with the new core.async-based interface. You can use that in the meantime, or coerce your query to an array on the RethinkDB side.

funston commented 8 years ago

@danielcompton thanks.

It's a pretty basic test setup: a collection of simple docs with an age and a height field, queried with a between and a filter. With this Clojure library, how does one coerce RethinkDB to return an array?

js:

r.db("test").table('people')
  .between(20, 30, {index: "age"})
  .filter(r.and(r.row('height').gt(6), r.row('height').lt(7)))
  .run(conn, function(err, cursor) {
    cursor.toArray(function(err, result) {
      // result is a fully realized array here
    });
  });

clj:

(-> (r/db "test")
    (r/table "people")
    (r/between 30 40  {:index "age" :left-bound :open :right-bound :open})
    (r/filter (r/fn [row]
                (r/and (r/ge (r/get-field row :height) 52) (r/le (r/get-field row :height) 62))))
    (r/run conn))

danielcompton commented 8 years ago

You should be able to use make-array. Keep in mind that this will realise all of your results on the RethinkDB server which will raise memory usage. This is probably fine if you've got a small data set (<20,000 records), but you don't want to rely on this long term.
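
As a rough illustration of the server-side coercion, here is a minimal sketch built on the query posted above. It assumes the driver exposes ReQL's coerceTo term as r/coerce-to in the rethinkdb.query namespace; that name is not confirmed in this thread (which mentions make-array), so check it against the driver version you have. The people-in-range function and the example.people namespace are hypothetical, and conn is assumed to be an already open connection.

```clojure
(ns example.people
  (:require [rethinkdb.query :as r]))

(defn people-in-range
  "Runs the between/filter query from the question and asks the server
  to return the matches as one array instead of a cursor."
  [conn]
  (-> (r/db "test")
      (r/table "people")
      (r/between 30 40 {:index "age" :left-bound :open :right-bound :open})
      (r/filter (r/fn [row]
                  (r/and (r/ge (r/get-field row :height) 52)
                         (r/le (r/get-field row :height) 62))))
      ;; Assumption: r/coerce-to wraps ReQL's coerceTo term, so the
      ;; server builds the whole array before replying.
      (r/coerce-to "array")
      (r/run conn)))
```

Because the whole result is assembled on the server, this only suits small result sets such as the roughly 800 documents described in the question.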

The longer-term fix is to use the core.async driver (still in a PR, but you could use it now), or the synchronous version, which will be rewritten on top of the core.async driver and won't have a bunch of Thread/sleep calls built in.

danielcompton commented 8 years ago

I'll close this for now, feel free to open another issue if you're still having problems.