thxmike closed this issue 10 years ago.
I need to update the docs to answer your questions, but until then: the default buffer size is 10. It can be read and changed with:
var datapumps = require('datapumps');
console.log("Current default buffer size is: " + datapumps.Buffer.defaultBufferSize()); // => 10
datapumps.Buffer.defaultBufferSize(10000);
console.log("Current default buffer size is: " + datapumps.Buffer.defaultBufferSize()); // => 10000
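The getter/setter convention can be illustrated without the library. Below is a minimal sketch, assuming a hypothetical `ToyBuffer` class (not the real datapumps.Buffer): calling the static method with no argument reads the default, and calling it with one argument sets it. Whether the real library applies a changed default to buffers that already exist is not stated here. In this sketch, only buffers created after the call pick up the new value.

```javascript
// Hypothetical stand-in for datapumps.Buffer, showing only the
// "no argument reads, one argument sets" convention of defaultBufferSize().
class ToyBuffer {
  static defaultBufferSize(size) {
    if (size === undefined) {
      return ToyBuffer._defaultBufferSize; // getter form
    }
    ToyBuffer._defaultBufferSize = size;   // setter form
  }

  constructor() {
    // In this sketch a new buffer takes whatever default is current.
    this.size = ToyBuffer.defaultBufferSize();
  }
}
ToyBuffer._defaultBufferSize = 10; // the default of 10 mentioned above

const before = new ToyBuffer();
ToyBuffer.defaultBufferSize(10000);
const after = new ToyBuffer();
console.log(before.size, after.size); // 10 10000
```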
You can also set the buffer size when creating it:
var datapumps = require('datapumps');
var comic_data_pump = new datapumps.Pump();
comic_data_pump
  .mixin(datapumps.mixin.RestMixin)
  .from(comic_data_pump.createBuffer({
    size: 10000 // you may set this to the page size of the REST service
  }));
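Sizing the input buffer to the REST page size means one whole page fits without a write ever hitting a full buffer. The sketch below shows the intended semantics with hypothetical `ToyBuffer`/`ToyPump` classes loosely modelled on `pump.createBuffer({ size })`: a size passed at creation time wins, otherwise the default of 10 applies. The page size of 100 is an assumption for illustration.

```javascript
// Hypothetical sketch, not the datapumps implementation:
// a size given at creation overrides the default.
const DEFAULT_BUFFER_SIZE = 10; // mirrors the default mentioned above

class ToyBuffer {
  constructor(options = {}) {
    this.size = options.size !== undefined ? options.size : DEFAULT_BUFFER_SIZE;
    this.content = [];
  }
  isFull() {
    return this.content.length >= this.size;
  }
}

class ToyPump {
  // Loosely modelled on pump.createBuffer({ size: ... }).
  createBuffer(options) {
    return new ToyBuffer(options);
  }
}

const pump = new ToyPump();
const pageSized = pump.createBuffer({ size: 100 }); // assumed REST page size
const defaultSized = pump.createBuffer();
console.log(pageSized.size, defaultSized.size); // 100 10
```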
If you call the REST service multiple times, I'd do:
var handleRestResult = function(result) {
  var finished = ... // determine if this was our last call or not.
  // fill the buffer from result
  if (finished) {
    // seal when we don't want to read more
    comic_data_pump.from().seal();
  } else {
    // otherwise read from the REST service again, and fill the buffer once it becomes empty
    comic_data_pump.get(...)
      .then(function(result) {
        if (comic_data_pump.from().isEmpty()) {
          handleRestResult(result);
        } else {
          comic_data_pump.from().once('empty', function() {
            handleRestResult(result);
          });
        }
      });
  }
};
comic_data_pump.get(...)
  .then(handleRestResult);
This is not exactly what you asked for, but better flow control needs writeAsync. I'll write an example later if you still need it.
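To see the fetch-then-wait-for-empty pattern run end to end, here is a self-contained simulation. It assumes a hypothetical `ToyBuffer` (not datapumps.Buffer), a fake `fetchPage()` standing in for `comic_data_pump.get(...)`, and a polling `consume()` standing in for the downstream pump. Here `finished` is decided by the page index; a real service would signal its last page in its own way.

```javascript
const EventEmitter = require('events');

// Hypothetical minimal buffer (not datapumps.Buffer): write() throws when the
// buffer is full or sealed, and read() emits 'empty' when the last item is taken.
class ToyBuffer extends EventEmitter {
  constructor(size) {
    super();
    this.size = size;
    this.content = [];
    this.sealed = false;
  }
  isEmpty() {
    return this.content.length === 0;
  }
  write(item) {
    if (this.sealed) throw new Error('Cannot write sealed buffer');
    if (this.content.length >= this.size) throw new Error('Buffer is full');
    this.content.push(item);
  }
  read() {
    const item = this.content.shift();
    if (this.isEmpty()) this.emit('empty');
    return item;
  }
  seal() {
    this.sealed = true;
  }
}

// Hypothetical paginated service standing in for comic_data_pump.get(...).
const PAGES = [['a', 'b'], ['c', 'd'], ['e', 'f']];
const fetchPage = (n) => Promise.resolve({ page: n, items: PAGES[n] });

const buffer = new ToyBuffer(2); // sized to one page
const received = [];

function handleRestResult(result) {
  result.items.forEach((item) => buffer.write(item)); // fill the buffer from result
  const finished = result.page === PAGES.length - 1;  // was this the last page?
  if (finished) {
    buffer.seal(); // we don't want to write more
    return;
  }
  // Fetch the next page now, but only write it once the buffer has drained.
  fetchPage(result.page + 1).then((next) => {
    if (buffer.isEmpty()) {
      handleRestResult(next);
    } else {
      buffer.once('empty', () => handleRestResult(next));
    }
  });
}

// Stand-in for the downstream pump: takes one item per event-loop turn.
function consume() {
  if (!buffer.isEmpty()) received.push(buffer.read());
  if (buffer.sealed && buffer.isEmpty()) {
    console.log(received.join('')); // abcdef
    return;
  }
  setImmediate(consume);
}

fetchPage(0).then(handleRestResult);
consume();
```

Because each page is written only when the buffer is empty, `write()` never sees a full buffer here; that guarantee comes from sizing the buffer to exactly one page.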
The datapumps.Buffer emits several events, for example empty (used in the code above) and end (see below).
Writing to a Buffer after it has been sealed is not allowed. So you should seal the input buffer that is filled from the REST service once you don't want to write to it again. All the other buffers are sealed by the pumps, and the process finishes when all buffers have emitted the end event.
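The sealing rules can be illustrated with a small hypothetical `ToyBuffer` (not the datapumps implementation; its error message is made up): a write after `seal()` is rejected, and `end` fires only once the buffer is both sealed and drained. Sealing therefore does not discard items that are already in the buffer.

```javascript
const EventEmitter = require('events');

// Hypothetical toy buffer illustrating the sealing rules described above.
class ToyBuffer extends EventEmitter {
  constructor() {
    super();
    this.content = [];
    this.sealed = false;
    this.ended = false;
  }
  write(item) {
    if (this.sealed) throw new Error('Cannot write sealed buffer');
    this.content.push(item);
  }
  read() {
    const item = this.content.shift();
    this._checkEnd();
    return item;
  }
  seal() {
    this.sealed = true;
    this._checkEnd();
  }
  // 'end' fires once, when the buffer is both sealed and drained.
  _checkEnd() {
    if (this.sealed && this.content.length === 0 && !this.ended) {
      this.ended = true;
      this.emit('end');
    }
  }
}

const input = new ToyBuffer();
const events = [];
input.on('end', () => events.push('end'));

input.write('page-1 item');
input.seal();               // no more REST results will come
try {
  input.write('late item'); // rejected: the buffer is sealed
} catch (err) {
  events.push(err.message);
}
input.read();               // consumer drains the last item, so 'end' fires
console.log(events.join(', ')); // Cannot write sealed buffer, end
```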
Buffer.write(): synchronous write to the buffer. It throws an error when the buffer is full or not writable (sealed).
Buffer.writeAsync(): asynchronous write. It returns a Promise (see https://github.com/petkaantonov/bluebird) that is fulfilled when the data has been written to the buffer.
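The difference between the two calls can be sketched with a hypothetical `ToyBuffer` (not datapumps.Buffer). Native Promises stand in for Bluebird, and the `release` event name used to signal a freed slot is an assumption of this sketch. `write()` on a full buffer throws immediately, while `writeAsync()` parks the item and completes after a read frees a slot.

```javascript
const EventEmitter = require('events');

// Hypothetical toy buffer (not datapumps.Buffer) contrasting write() and writeAsync().
// Native Promises stand in for the Bluebird promises mentioned above.
class ToyBuffer extends EventEmitter {
  constructor(size) {
    super();
    this.size = size;
    this.content = [];
  }
  isFull() {
    return this.content.length >= this.size;
  }
  // Synchronous: either writes immediately or throws.
  write(item) {
    if (this.isFull()) throw new Error('Buffer is full');
    this.content.push(item);
  }
  // Asynchronous: resolves once the item has actually been written.
  writeAsync(item) {
    if (!this.isFull()) {
      this.write(item);
      return Promise.resolve();
    }
    return new Promise((resolve) => {
      // Retry when a read frees a slot.
      this.once('release', () => resolve(this.writeAsync(item)));
    });
  }
  read() {
    const item = this.content.shift();
    this.emit('release');
    return item;
  }
}

const buffer = new ToyBuffer(1);
const log = [];

buffer.write('a');
try {
  buffer.write('b'); // throws: the buffer is full
} catch (err) {
  log.push('write failed: ' + err.message);
}

buffer.writeAsync('b').then(() => {
  log.push('b written');
  console.log(log.join('; ')); // write failed: Buffer is full; read a; b written
});
log.push('read ' + buffer.read()); // frees the slot, so the pending writeAsync can complete
```

In short, `write()` leaves it to the caller to check for room, while `writeAsync()` turns "buffer full" into waiting. That is why it is the better fit for flow control.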
Thanks for the information. If you don't mind, please go ahead and write an example of using writeAsync for better flow control. Thanks again.
I started writing a .writeAsync-based example for accessing a paginated REST service, but I ended up adding that code to RestMixin in 782274b33146dd7620a9c255d90c3864fa00347c.
Using RestMixin to fill the input of a pump is much easier now (at least IMHO):
var datapumps = require('datapumps');
var RestMixin = datapumps.mixin.RestMixin;
var pump = new datapumps.Pump();
pump
  .mixin(RestMixin)
  .fromRest({
    query: function() {
      return this.get('http://www.comicvine.com/api/volumes/', {
        "multipart": false,
        "query": {
          "api_key": "xxxxxxxxx",
          "format": "json",
          "filter": "name:spider-man",
          "field_list": "id,name,start_year,description,publisher,image"
        }
      });
    },
    resultMapping: function(volumes) { return volumes.results; }
  })
  .process(function(volume) {
    console.log(volume);
  });
See the updated docs for details: http://agmen-hu.github.io/node-datapumps/docs/mixin/RestMixin.html
.fromRest will do the flow control internally and will also create the input buffer for you, so there is no need for .from(pump.createBuffer()).
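For intuition about what "flow control internally" can mean, here is a sketch of a writeAsync-based paginated loader. It is not RestMixin's actual code: `ToyBuffer`, `fillFromPaginatedService()` and the fake `fetchPage()` (returning `{ items, last }`) are all hypothetical. The loader awaits `writeAsync()` for every item, so a full buffer pauses fetching, and it seals the buffer after the last page.

```javascript
const EventEmitter = require('events');

// Hypothetical toy buffer with writeAsync() and seal(); not datapumps.Buffer.
class ToyBuffer extends EventEmitter {
  constructor(size) {
    super();
    this.size = size;
    this.content = [];
    this.sealed = false;
  }
  writeAsync(item) {
    if (this.sealed) return Promise.reject(new Error('Cannot write sealed buffer'));
    if (this.content.length < this.size) {
      this.content.push(item);
      return Promise.resolve();
    }
    // Full: wait until a read frees a slot, then retry.
    return new Promise((resolve) => {
      this.once('release', () => resolve(this.writeAsync(item)));
    });
  }
  read() {
    const item = this.content.shift();
    this.emit('release');
    return item;
  }
  seal() {
    this.sealed = true;
  }
}

// Hypothetical loader: page by page, awaiting each write, sealing at the end.
async function fillFromPaginatedService(buffer, fetchPage) {
  for (let page = 0; ; page++) {
    const { items, last } = await fetchPage(page);
    for (const item of items) {
      await buffer.writeAsync(item); // back-pressure: waits while the buffer is full
    }
    if (last) break;
  }
  buffer.seal(); // nothing more will be written
}

// Fake service: 3 pages of 2 volumes each.
const PAGES = [['v1', 'v2'], ['v3', 'v4'], ['v5', 'v6']];
const fetchPage = (n) => Promise.resolve({ items: PAGES[n], last: n === PAGES.length - 1 });

const buffer = new ToyBuffer(3);
const processed = [];

// Stand-in for the downstream pump: takes one item per event-loop turn.
function consume() {
  if (buffer.content.length > 0) processed.push(buffer.read());
  if (buffer.sealed && buffer.content.length === 0) {
    console.log(processed.join(',')); // v1,v2,v3,v4,v5,v6
    return;
  }
  setImmediate(consume);
}

fillFromPaginatedService(buffer, fetchPage);
consume();
```

With this approach the buffer size no longer has to match the page size: a page larger than the buffer simply makes the loader wait on `writeAsync()` until the consumer catches up.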
We have some questions, and any information you can provide would be helpful. What is the right way to handle running out of buffer space?
It is not clear to us whether we should manage the buffers ourselves, especially when we exceed the number of items the buffer can hold. Can you provide examples or information that can help us?
We would like to know the difference between write and writeAsync, and how sealing the buffer plays into this.
We would also like to know when we should seal a buffer and when we should not.