Closed by flores 10 years ago
This is a great point. I think it's going to have to wait until 3.1, however. The full version of this involves a new API, which I'm calling the Dataset API.
Essentially, dealing with large amounts of input data is something engulf can't yet do. In a number of places it's assumed that a 'job' includes all needed data, and that the data is relatively small. The full version involves uploading a large dataset in one step, then executing a job against that data in a second step.
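For illustration, here's a minimal sketch of what that two-step flow might look like over HTTP. The endpoint paths, the `dataset_id` handle, and the job fields are all hypothetical, not part of any existing engulf API:

```python
import requests

MASTER = "http://localhost:8000"  # hypothetical master address

# Step 1: upload the dataset once; the master returns a handle for it.
with open("urls.txt", "rb") as f:
    resp = requests.post(f"{MASTER}/datasets", data=f)
dataset_id = resp.json()["dataset_id"]

# Step 2: submit a job that references the stored dataset by handle,
# instead of embedding all the data in the job body itself.
job = {"concurrency": 100, "dataset": dataset_id}
requests.post(f"{MASTER}/jobs", json=job)
```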
Presumably this will involve making datasets storable on the master and retrievable by workers on demand. This also means there will need to be a 'warmup' phase for workers, during which they can retrieve any missing dataset files (workers should cache files, but not store them permanently).
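A rough sketch of that worker-side warmup, under the same assumptions as above (hypothetical `/datasets/{id}` endpoint on the master). The cache lives in the OS temp directory, so files are reused across jobs but not stored permanently:

```python
import os
import tempfile
import requests

MASTER = "http://localhost:8000"  # hypothetical master address
CACHE_DIR = os.path.join(tempfile.gettempdir(), "engulf-dataset-cache")

def warmup(dataset_id):
    """Ensure the dataset is present locally before the job starts."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, dataset_id)
    if os.path.exists(path):
        return path  # cache hit: nothing to fetch
    # Cache miss: stream the file down from the master during warmup.
    resp = requests.get(f"{MASTER}/datasets/{dataset_id}", stream=True)
    resp.raise_for_status()
    with open(path, "wb") as f:
        for chunk in resp.iter_content(chunk_size=65536):
            f.write(chunk)
    return path
```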
I'm going to make this a requirement for the 3.1 release, and nix RhinoJS as a 3.1 goal, as I think support for large datasets is more useful and will ultimately make JS support more valuable as well.
andrewvc commented 2 days ago: I'm gonna revise the spec. Instead of uploading a file, you'll be able to specify the URL of a file which workers can download and run against.
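Under that revised spec, the job submission would just carry a URL rather than an uploaded dataset. A hedged sketch, with the `dataset_url` field name and the line-per-entry file format both assumed:

```python
import requests

# Client side: the job names a URL that every worker can reach.
job = {"concurrency": 100, "dataset_url": "http://example.com/urls.txt"}
requests.post("http://localhost:8000/jobs", json=job)

# Worker side: download the file and run against each line of it.
resp = requests.get(job["dataset_url"])
resp.raise_for_status()
for line in resp.text.splitlines():
    pass  # issue a request against `line` here
```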
Nice, love it. Sometimes copy-pasting a huge list is just too hard.