rossjones opened 11 years ago
Really like this, and this was in fact the original idea (arbitrary transforms). How do we pass these in? My guess would be to allow users to point to a JS file on the web (e.g. a gist) which contains the code to run.
So an API along the lines of:
http://localhost:5000/csv/head/html/transform http://google.com/some.js/?url=http://static.london.gov.uk/gla/expenditure/docs/2012-13-P12-250.csv
Something like requiring the file to define a transform function in CommonJS[1] module style, to which we will pass two arguments: the row and the index.
Returning null will exclude the row from further transformations and move on to the next row in the stream.
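A minimal sketch of that contract, assuming the host fetches the script as text and evaluates it with its own module/exports objects. loadTransform, applyTransform and the sample rows are hypothetical, and evaluating with new Function is not a sandbox:

```javascript
// Hypothetical user script, as it might be fetched from a gist: it defines
// transform(row, index) CommonJS-style by assigning to module.exports.
const userSource = `
  module.exports.transform = function (row, index) {
    // Drop rows with an empty amount; upper-case the supplier name.
    if (row[2] === "") return null;
    return [row[0], row[1].toUpperCase(), row[2]];
  };
`;

// Minimal CommonJS-style loader (assumption): evaluates the script text with
// a fresh module/exports pair. This provides NO isolation from the host.
function loadTransform(source) {
  const module = { exports: {} };
  new Function("module", "exports", source)(module, module.exports);
  if (typeof module.exports.transform !== "function") {
    throw new Error("script must export a transform(row, index) function");
  }
  return module.exports.transform;
}

// Call transform(row, index) for every row; a null result excludes the row
// and the stream simply moves on to the next one.
function applyTransform(transform, rows) {
  const out = [];
  rows.forEach((row, index) => {
    const result = transform(row, index);
    if (result !== null) out.push(result);
  });
  return out;
}

const rows = [
  ["2012-04-01", "acme ltd", "250"],
  ["2012-04-02", "foo plc", ""],
  ["2012-04-03", "bar co", "1200"],
];
const result = applyTransform(loadTransform(userSource), rows);
console.log(result);
```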
It would be even more awesome if I could post the script to an endpoint (/install, perhaps) with
{
"language": "javascript",
"name": "transform",
"code": "....."
}
so that I can then do
This'd let me 'install' code for re-use by myself and others.
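To make the /install idea concrete, here is a rough sketch of a handler for that JSON body. It assumes an in-memory Map in place of real storage; install and lookup are hypothetical names. Because names live in one shared registry, it also shows the namespace problem raised further down:

```javascript
// Hypothetical handler for the proposed POST /install body. A Map stands in
// for storage; names are global, so two users cannot both own "transform".
const registry = new Map();

function install(body) {
  const payload = typeof body === "string" ? JSON.parse(body) : body;
  // All three fields from the proposed body are required strings.
  for (const key of ["language", "name", "code"]) {
    if (typeof payload[key] !== "string" || payload[key] === "") {
      throw new Error("missing or empty field: " + key);
    }
  }
  if (payload.language !== "javascript") {
    throw new Error("unsupported language: " + payload.language);
  }
  if (registry.has(payload.name)) {
    throw new Error("name already taken: " + payload.name);
  }
  registry.set(payload.name, payload.code);
  return payload.name;
}

// Later requests would resolve an installed script by name.
function lookup(name) {
  return registry.get(name);
}
```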
Agree that having a function signature to be implemented, as @davidmiller suggests, would be good. I suspect it'll also need a cookie/user var to allow scripts to maintain state without globals (for instance, to store the tail buffer). Perhaps something like ...
// Called before the first row is sent; expected to return some indication
// that it wants to continue (or perhaps that it should be skipped). The cookie
// provided here holds state and will be passed to all other funcs. It should
// also be passed the URL args, which it can then store in the cookie.
function start(cookie, args) {}
// Called on each row
function transform(cookie, idx, row) {}
// Called after all rows have finished. In some cases (tail, perhaps) this is
// where the actual data will come from, but normally I'd expect the result of
// transform to be what is piped across to the next function.
function end(cookie) {}
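A sketch of how a tail op could be written against those three signatures, with all state in the cookie. The runScript driver is a hypothetical host. It assumes URL args arrive as a plain object and that a start() returning false means "skip this op" (rows pass through unchanged):

```javascript
// Hypothetical "tail" script using start/transform/end; no globals, all
// state lives in the cookie that the host threads through each call.
const tailScript = {
  start(cookie, args) {
    cookie.n = Number(args.n || 10); // URL arg stored in the cookie
    cookie.buffer = [];
    return true; // true = "I want to continue"
  },
  transform(cookie, idx, row) {
    // Keep only the last n rows; emit nothing per row.
    cookie.buffer.push(row);
    if (cookie.buffer.length > cookie.n) cookie.buffer.shift();
    return null;
  },
  end(cookie) {
    // For tail, the real output only exists once every row has been seen.
    return cookie.buffer;
  },
};

// Minimal host-side driver (assumption, not an agreed API): one cookie per run.
function runScript(script, rows, args) {
  const cookie = {};
  if (script.start && script.start(cookie, args) === false) return rows; // skipped
  const out = [];
  rows.forEach((row, idx) => {
    const result = script.transform(cookie, idx, row);
    if (result !== null && result !== undefined) out.push(result);
  });
  const tail = script.end ? script.end(cookie) : null;
  return tail ? out.concat(tail) : out;
}

const rows = [["a"], ["b"], ["c"], ["d"]];
console.log(runScript(tailScript, rows, { n: "2" }));
```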
It would be even more awesome if I could post the script to an endpoint
Oh, you mean ScraperWiki 1.5? ;)
That increases complexity by an order of magnitude (we'd have to manage namespaces, accounts and a global function registry) while only increasing utility a bit, e.g. it's a nicer API (which it totally is).
OTOH, running from a URL in a sandbox is significantly easier to implement, and lets us find out whether people really use the feature.
POSTing to a gist/pastebin/(your publish-text-on-the-internet service here) sounds like a simplest-thing-that-could-work halfway house to me.
@rossjones I also thought about the install stuff ;-) The issue is that we then need login and storage somewhere, though that's not that difficult (I'd do my usual thing at the moment: GitHub login + storing in a gist).
However, my concerns were similar to @davidmiller's, namely the increase in complexity compared to the increase in benefit. Given the KISS principle, I think a first pass would be to not allow storing scripts, i.e. it's up to the user to store them somewhere.
This seems pretty straightforward to implement and would be pretty awesome ;-)
a cookie/user var to allow scripts to maintain state without globals
So AMD gives you closured globals if/when you need 'em, but there's some manual care you'd have to take...
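A sketch of what "closured globals" look like, and of the manual care involved. The define below is a hypothetical stand-in that just calls the factory once; a real AMD loader such as RequireJS does much more. Because the factory runs only once, module-level state has to be reset by hand between runs:

```javascript
// Stand-in for an AMD loader's define(): it simply calls the factory once.
function define(factory) {
  return factory();
}

const tailModule = define(function () {
  // "Closured globals": private to this module, shared by its functions.
  let buffer = [];
  const n = 2;
  return {
    start() {
      // The manual care: without this reset, rows from one request would
      // leak into the next, because the factory never runs again.
      buffer = [];
    },
    transform(row) {
      buffer.push(row);
      if (buffer.length > n) buffer.shift();
      return null;
    },
    end() {
      return buffer.slice();
    },
  };
});

// Two separate "requests" reusing the same loaded module.
function run(mod, rows) {
  mod.start();
  rows.forEach((row) => mod.transform(row));
  return mod.end();
}

const first = run(tailModule, [[1], [2], [3]]);
const second = run(tailModule, [[4]]);
console.log(first, second);
```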
Passing around (and us keeping track of) each function's state/scope object (hereafter known as "The Angular.js Pattern") is, you know, a bit of a faff, with the only real benefit being dependency injection for your unit tests.
And we all expect that unit tests are going to be ubiquitous for this kind of thing rite? ;)
One alternative recipe would be to require the exported transform to be an object containing the methods above (and we force the scope of "this" for them) (hereafter known as "The Backbone.js Pattern").
The user then gets to do their own state management in a constructor, and I no longer have to care or know about it (and can't interfere) :)
Other patterns are available :) although that'd be my preference right now.
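A sketch of the "Backbone.js Pattern" as described: the user exports a class, keeps state on the instance, and the host forces "this" with .call. TailTransform and runPattern are hypothetical names, and one fresh instance per request is an assumption:

```javascript
// Hypothetical user script: state is set up in the constructor and lives
// on the instance, so the host never needs to see or manage it.
class TailTransform {
  constructor(args) {
    this.n = Number(args.n || 10);
    this.buffer = [];
  }
  transform(row, idx) {
    this.buffer.push(row);
    if (this.buffer.length > this.n) this.buffer.shift();
    return null; // nothing emitted per row
  }
  end() {
    return this.buffer;
  }
}

// Host side (assumption): a fresh instance per request; detached method
// references are invoked with .call(instance) so `this` is always forced.
function runPattern(Ctor, rows, args) {
  const instance = new Ctor(args);
  const transform = instance.transform;
  const end = instance.end;
  const out = [];
  rows.forEach((row, idx) => {
    const result = transform.call(instance, row, idx);
    if (result !== null && result !== undefined) out.push(result);
  });
  return typeof end === "function" ? out.concat(end.call(instance)) : out;
}

const output = runPattern(TailTransform, [["a"], ["b"], ["c"]], { n: 2 });
console.log(output);
```

Compared with the cookie approach, the host only has to construct and call; whatever the user keeps on "this" stays private to that instance.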
I keep forgetting most JS stuff doesn't need to be re-entrant.
Maybe worth just having a 'gist' op that takes the ID as a parameter?
@rossjones huge +1 for the simple gist op with id as parameter (super clean ...).
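A rough sketch of what a gist op might need. The endpoint and the files/filename/content fields follow GitHub's public Gists REST API. The op itself, the helper names, and the "first .js file wins" rule are assumptions:

```javascript
// Build the GitHub API URL for a gist ID (gist IDs are hex strings).
function gistApiUrl(id) {
  if (!/^[0-9a-f]+$/i.test(id)) throw new Error("invalid gist id: " + id);
  return "https://api.github.com/gists/" + id;
}

// Pick the script to run from the API response: the first file whose name
// ends in ".js" (assumed rule; a gist can contain several files).
function pickScript(gist) {
  const file = Object.values(gist.files || {}).find((f) => f.filename.endsWith(".js"));
  if (!file) throw new Error("gist contains no .js file");
  return file.content;
}

// Usage, e.g. in Node 18+ where fetch is global:
//   const gist = await (await fetch(gistApiUrl(id))).json();
//   const source = pickScript(gist);
```

Very large files can come back truncated in the API response, so a real op would probably also need to handle that case.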
@andylolz this might be the most fun thing to implement and it's super cool ;-)
V cool indeed! Will make a start on this one next. Soon!
(was: Provide JS sandbox for user-specified filter functions)
It would be great if users could provide a filter function to be executed on each row.
This would be more powerful than grep, as it could take into account values in other cells. Something similar could also map a new column onto the table using a user-specified function (for example).
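An illustration of both ideas with plain functions. The column layout and names are hypothetical: a filter that compares two cells in the same row (something a single grep pattern can't express), and a map that appends a derived column:

```javascript
// Hypothetical table layout: supplier, budget, spent.
const header = ["supplier", "budget", "spent"];

// Filter: keep rows where one cell exceeds another cell in the same row.
function overBudget(row) {
  return Number(row[2]) > Number(row[1]);
}

// Map: append a new "variance" column computed from existing cells.
function addVariance(row) {
  return row.concat(String(Number(row[2]) - Number(row[1])));
}

const rows = [
  ["acme", "100", "150"],
  ["foo", "200", "180"],
];
const filtered = rows.filter(overBudget).map(addVariance);
console.log([header.concat("variance"), ...filtered]);
```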
Something like http://gf3.github.io/sandbox/ looks like a reasonably good solution for JS. This particular one would be in-process, but I can imagine other languages being allowed to run code over the rows in a different type of sandbox.
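For comparison, a sketch of running a user filter in a separate V8 context with a time limit, using Node's built-in vm module. This is a different technique from the gf3 sandbox, and Node's documentation is explicit that vm is not a security mechanism. It only shows the shape of an in-process approach; runFilter is a hypothetical name:

```javascript
const vm = require("vm");

// Evaluate a user-supplied filter(row, index) in a fresh context with a
// timeout. CAUTION: vm does not provide security isolation.
function runFilter(source, rows, timeoutMs = 100) {
  // Pass rows in and out as JSON strings, so the user code gets plain
  // strings rather than references to the host's row arrays.
  const context = { input: JSON.stringify(rows), output: null };
  vm.runInNewContext(
    `const filter = (${source});
     output = JSON.stringify(JSON.parse(input).filter((row, i) => filter(row, i)));`,
    context,
    { timeout: timeoutMs } // throws if the script runs longer than this
  );
  return JSON.parse(context.output);
}

const rows = [
  ["acme", "100", "150"],
  ["foo", "200", "180"],
];
console.log(runFilter("(row) => Number(row[2]) > Number(row[1])", rows));
```

A runaway filter such as "() => { while (true) {} }" is stopped by the timeout with an error instead of hanging the server, but untrusted code could still escape a vm context, which is why an out-of-process sandbox is the safer choice.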