Closed by flores 10 years ago
This is a great point. I think it's going to have to wait until 3.1, however. The full version of this involves a new API, which I'm calling the Dataset API.
Essentially, dealing with large amounts of input data is something engulf can't yet do. In a number of places it's assumed that a 'job' includes all needed data, and that the data is relatively small. The full version involves uploading a large dataset in one step, then executing a job against that data in a second step.
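For illustration, here's a minimal sketch of what that two-step flow might look like over HTTP. The endpoint paths, the `dataset_id` handle, and the job fields are all hypothetical, not part of any existing engulf API:

```python
import requests

MASTER = "http://localhost:8000"  # hypothetical master address

# Step 1: upload the dataset once; the master returns a handle for it.
with open("urls.txt", "rb") as f:
    resp = requests.post(f"{MASTER}/datasets", data=f)
dataset_id = resp.json()["dataset_id"]

# Step 2: submit a job that references the stored dataset by handle,
# instead of embedding all the data in the job body itself.
job = {"concurrency": 100, "dataset": dataset_id}
requests.post(f"{MASTER}/jobs", json=job)
```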
Presumably this will involve making datasets storable on the master and retrievable by workers on demand. This also means there will need to be a 'warmup' phase for workers, during which they can retrieve any missing dataset files (workers should cache files, but not store them permanently).
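A rough sketch of that worker-side warmup, under the same assumptions as above (hypothetical `/datasets/{id}` endpoint on the master). The cache lives in the OS temp directory, so files are reused across jobs but not stored permanently:

```python
import os
import tempfile
import requests

MASTER = "http://localhost:8000"  # hypothetical master address
CACHE_DIR = os.path.join(tempfile.gettempdir(), "engulf-dataset-cache")

def warmup(dataset_id):
    """Ensure the dataset is present locally before the job starts."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, dataset_id)
    if os.path.exists(path):
        return path  # cache hit: nothing to fetch
    # Cache miss: stream the file down from the master during warmup.
    resp = requests.get(f"{MASTER}/datasets/{dataset_id}", stream=True)
    resp.raise_for_status()
    with open(path, "wb") as f:
        for chunk in resp.iter_content(chunk_size=65536):
            f.write(chunk)
    return path
```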
I'm going to make this a requirement for the 3.1 release, and nix RhinoJS as a 3.1 goal, as I think support for large datasets is more useful and will ultimately make JS support more valuable as well.
andrewvc commented 2 days ago: I'm gonna revise the spec. Instead of uploading a file, you'll be able to specify the URL of a file which workers can download and run against.
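Under that revised spec, the job submission would just carry a URL rather than an uploaded dataset. A hedged sketch, with the `dataset_url` field name and the line-per-entry file format both assumed:

```python
import requests

# Client side: the job names a URL that every worker can reach.
job = {"concurrency": 100, "dataset_url": "http://example.com/urls.txt"}
requests.post("http://localhost:8000/jobs", json=job)

# Worker side: download the file and run against each line of it.
resp = requests.get(job["dataset_url"])
resp.raise_for_status()
for line in resp.text.splitlines():
    pass  # issue a request against `line` here
```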
Nice, love it. Sometimes copy-pasting a huge list is just too hard.