WICG / handwriting-recognition

Handwriting Recognition Web API Proposal
https://wicg.github.io/handwriting-recognition/
Other
77 stars 17 forks source link

Consider an API with fewer mutable classes #27

Open domenic opened 3 years ago

domenic commented 3 years ago

Hey there,

From the perspective of idiomatic JavaScript, I was wondering why the API has so many classes and custom add/remove/clear methods, instead of using objects and arrays.

That is, instead of the example at https://github.com/WICG/handwriting-recognition/blob/main/explainer.md#perform-recognition, I would have expected something like this:

// Create a new stroke. It's a plain JS array.
const stroke = [];

// Add a point.
const point = { x: 84, y: 34, t: 959 };

// The point dictionary is added to the stroke array
stroke.push(point)

// Modifying a point added to a stroke does have an effect.
point.x = newX    // This does work.
stroke[0].x = newX    // This also works.

// The point's value has changed
stroke[0].x === 84    // => false

// Point's time attribute is optional.
stroke.push({ x: 93, y: 54 })

// Create a new drawing. It's a JS array of strokes.
const drawing = [stroke];

// Add more points to the stroke.
stroke.push({ x: 93, y: 54, t: 1013 });

// Get predictions of the partial drawing.
// This will take into account both points that were added to the stroke.
await handwriting.getPrediction(drawing);

// The returned value is the same as for the original drawing.getPrediction() API.

// Add a new stroke.
const stroke2 = []
stroke2.push({x: 160, y: 39, t: 1761});
drawing.push(stroke2);

// Get all strokes:
drawing
// => [stroke, stroke2]

// Delete a previous stroke.
drawing.splice(0, 1);

// Get a new prediction.
await handwriting.getPrediction(drawing)

// No need to free up resources, since it's just a JS array of objects, which the GC handles normally.

I'm not sure why the current API has so many mutable classes and extra copies. All the actual work seems to happen in the async drawing.getPrediction() (or, in my version, handwriting.getPrediction(drawing)). So transforming that data into HandwritingStroke and HandwritingDrawing classes seems like extra work.

Are there implementation reasons why these mutable classes are necessary? If so, it'd be good to be clear about them in the explainer, and especially to explain why they are necessary in all implementations and not just a Chromium implementation limitation. I found https://github.com/WICG/handwriting-recognition/blob/main/explainer.md#alternative-api-design which maaaybe explains why a HandwritingDrawing class is necessary (although I had to read a lot between the lines; I'm guessing the idea is that you have some sort of side table mapping earlier versions of the drawing to recognition results, and then reuse those partial results in future calls to getPrediction()? And that's impossible to do based on normal JS objects---for some reason?). But I'm not sure it explains why the HandwritingStroke class is necessary.

wacky6 commented 3 years ago

In the current design, HandwritingDrawing and HandwritingStroke makes it possible to track drawing / stroke changes, so the implementation can reuse existing results (for better efficiency).

The number of strokes and points in a drawing could be very large (thousands, or tens of thousands of points). Subsequent predictions may only involve a small change to the drawing.

For example:

  1. User wrote 3 paragraphs of text.
  2. The first getPrediction() is called, the implementation segment text into paragraphs, and stores each paragraph's prediction internally.
  3. User adds some new strokes to the 3rd paragraph
  4. The second getPrediction() is called, the implementation knows the strokes (for the first 2 paragraphs) aren't modified and reuse their prediction.

AFAIK, built-in JS arrays can't do this efficiently.

domenic commented 3 years ago

As I said, that makes sense as to why you might need a HandwritingDrawing class, since maybe it's easier to cache getPrediction() results on a HandwritingDrawing class than on a JS array. (But see below.) But it doesn't explain why you need a HandwritingStroke class...

AFAIK, built-in JS arrays can't do this efficiently.

I think you could use either the built-in JS array, or a HandwritingDrawing instance, as the cache key for your internal store. If you used a built-in JS array, you'd have to either compute some sort of running hash of the array contents, or use the array object itself as the cache key and account for mutations of its contents during lookup. This is indeed more complicated, but maybe worth it in terms of simplifying the API for web developers.

wacky6 commented 3 years ago

Thanks for the suggestion. I agree arrays are simpler, but may not be a good choice for some use cases. Please let me show a more concrete example.

Say, the user writes cursive text "hello" in a single stroke. The web application requests a prediction midway (e.g. after "hel" are written). The code will look like this:

const drawing = new HandwritingDrawing()
const stroke = new HandwritingStroke()

// The stroke starts.
drawing.addStroke(stroke)

// Points for "hel"
stroke.addPoint({x, y, t})
drawing.getPrediction()    // => prediction is "hel"

// Points for "lo"
drawing.addPoint({x,y,t})
drawing.getPrediction()   // => prediction is "hello"

The implementation can easily determine stroke has been changed after being added to the drawing, by updating drawing's state in addPoint.

If we replace stroke with an array:

const drawing = new HandwritingDrawing()
const stroke = []

// The stroke starts.
drawing.addStroke(stroke)

// Points for "hel"
stroke.push({x, y, t})
drawing.getPrediction()

// Points for "lo"
stroke.push({x, y, t})
drawing.getPrediction()

// Or mutate the value of a point.
stroke[1].x = newX

It's very hard (or impossible?) to determine the changed strokes without inspecting all strokes:

domenic commented 3 years ago

The hashing cost isn't trivial (iterating through tens of thousands of points in the drawing) compared to the time to perform an incremental prediction (on thousands of points, could be as low as 10 ms).

When I benchmark iterating through a ten-thousand item vector<> in C++, I get numbers much lower than 10 ms. I am not a benchmarking expert though; perhaps you could show me how you are getting larger numbers for this sort of operation?

wacky6 commented 3 years ago

Discussed offline with @domenic:

The key is balance between API flexibility and easy-of-use.

Let's propose / plan a future addition to HandwritingStroke's constructor to accept Array<Point> (perhaps HandwritingDrawing as well), and convert them into addPoint calls internally.

With this addition, developers can avoid boilerplate-ish addPoint calls, and use array manipulations (which they are already familiar with) if suitable. This may not made into the first spec revision / release though.

HandwritingStroke enables easy implementation of incremental recognition (better performance if the drawing is large / contains lots of points), and future extension to per-stroke query methods (like the Recognized field in MS Ink API).