Handle postfix overrun - Githubissues

didoesdigital / typey-type

Typey Type for Stenographers is a free typing app designed specifically to help steno students practise and rapidly master stenography.

https://didoesdigital.com/typey-type

GNU Affero General Public License v3.0

90 stars 15 forks source link

Handle postfix overrun #167

Closed na2hiro closed 4 months ago

na2hiro commented 5 months ago

This is to address the issue #15 with the batch update strategy that is described in https://github.com/didoesdigital/typey-type/issues/15#issuecomment-2058007673.

1st commit: add end-to-end test around app and simulate user typing whole state update. Confirm current behavior
2nd & 3rd commit: implement the batch update strategy under a flag and update the test case accordingly

I've confirmed following with manual and end-to-end tests:

A') fails "yours" for "you" in /lessons/stories/proverbs/proverbs-starting-with-y/
- (I reused the lesson for C. It should be the same thing as French vs Frenches)
B) fails "too bads" for "too bad" as 1 word in /lessons/collections/two-word-briefs-same-beginnings/too/
C) passes "You can" for "You" and "can" as 2 words typed quickly (in 1 stroke) in /lessons/stories/proverbs/proverbs-starting-with-y/

At this point, I'd like to get your feedback if this is going to a good direction or not. Also before removing the flag and current implementation, I'd like your input on stats to cover. The entire App state is huge, so I listed up some fields to test against in getStatsState() method, but I'm not sure that covers enough.

I'd appreciate if you can try this out. Thanks as always!

didoesdigital commented 5 months ago

@na2hiro, thanks for putting together this PR. I appreciate the added tests, incremental improvements, sharing a Work In Progress PR and checking in on direction, and the feature itself!

So far it looks like it's headed in a good direction. For the stats, I think testing currentLessonStrokes and all the totalNumberOf* variables covers what we need.

It looks like the test expectations need attempts data etc. so that they will pass, which I assume you're still working on.

Is the addition of @testing-library/user-event as an explicit dependency necessary when we import userEvent from Storybook's testing library? import { userEvent } from "@storybook/testing-library";

Can you please split up your commits? It would be useful to have:

A) package.json and yarn.lock dependency changes in a separate commit to code changes so that it's easy to address any potential merge conflicts or to drop or change dependency related commits B) Formatting commits separate from code commits (e.g. the changes to imports in src/App.js)

As a side note about App state being huge, I've written up an issue about that, which I would appreciate your input on: https://github.com/didoesdigital/typey-type/issues/168

na2hiro commented 5 months ago

Thanks for checking!

It looks like the test expectations need attempts data etc. so that they will pass, which I assume you're still working on.

You're right, for the new behavior, attempts currently lacks when corrected with {backspace} but it should have something (1 element?). It was mostly empty without App change so I didn't pay much attention, but now I understood it was like that because current implementation puts those attempts to the next phrase due to the overrun. Maybe I should include extra couple inputs to complete the next phrase to verify that.

Is the addition of @testing-library/user-event as an explicit dependency necessary when we import userEvent from Storybook's testing library? import { userEvent } from "@storybook/testing-library";

I didn't notice I was using another one. It seems @storybook/testing-library just re-exports @testing-library/user-event. I'm going to revert the module install then. (Addresses commit split A)

B) Formatting commits separate from code commits (e.g. the changes to imports in src/App.js)

I tried not to format existing code but I missed my editor updated it in imports. I hope linter can be used widely so anyone can run without it! Going to try to minimize diff soon when #168 or other refactoring is done! (I have some idea on it but will focus on this first)

na2hiro commented 5 months ago

Updated:

4th commit: Added logic to keep attempts correctly. Also updated test cases with failures to type out the next phrase.
Amended commits
- 1st commit: Reverted module install
- 2nd commit: Reverted import formatting in App.js

Could you double check if attempts look correct?

One thing I'm not sure about attempts is whether we want to consider "overrun + backspaces" as misstroke (isPeak) or not, because that pattern didn't exist before with char-by-char matching. For example,

Is input "Yours{bs}{bs}" for "You" correct?
Is input "Less{bs}" for "Les" correct?
- This input looks more valid because this anyway requires two strokes (HRE/-S)

In the current implementation, finishing with backspace is considered correct and won't appear in attempts.

didoesdigital commented 5 months ago

@na2hiro, regarding this comment

I think the input "Yours{bs}{bs}" for "You" should ideally be marked as incorrect because it expects to be 1 stroke and surely has been written in 2 strokes. I think both the input "Les" via LE/-S for "Les" and the input "Less{bs}" for "Les" should ideally be marked as correct because it expects to be 2 strokes and they both took 2 strokes to write.

For the attempts, I think we want to push more detail than we currently are on this branch. Previously, "sigh", "silent" via SAOEU/HREPBT would be marked correct and show "sigh" on the Finished screen as an interim stroke and if you wrote SAOEU/HREPT/*/HREPBT to output "sigh", "sigh leapt", "sigh", "silent", it would also be marked as correct and would show "sigh leapt" on the Finished screen. As a student I would consider the first case more correct than the second case and want to see that I wrote it differently. Now on this branch I am only seeing "silent" on the Finished screen without the interim strokes.

Interim word, "sigh":

Interim word, "sigh leapt":

No interim words, only finished "silent":

I am seeing some flakey test runs. Sometimes yarn test gives me all passing tests (854), and sometimes it gives me 1, 2, or 3 failed tests.

All passing:

"accepts first word but detect misstroke in second word when input at once" sometimes fails:

accepts first word but detect misstroke in second word when input at once fails

"doesn't accept excess chars" sometimes fails:

"accepts inputs at once" sometimes fails:

Example of all 3 failing:

na2hiro commented 5 months ago

Thanks for the deeper look!

For the attempts, I think we want to push more detail than we currently are on this branch. Previously, "sigh", "silent" via SAOEU/HREPBT would be marked correct and show "sigh" on the Finished screen as an interim stroke and if you wrote SAOEU/HREPT/*/HREPBT to output "sigh", "sigh leapt", "sigh", "silent", it would also be marked as correct and would show "sigh leapt" on the Finished screen. As a student I would consider the first case more correct than the second case and want to see that I wrote it differently. Now on this branch I am only seeing "silent" on the Finished screen without the interim strokes.

I tackled this one first. I added a test case to check the current behavior 7ae68fd, then updated the new implementation so it also passes the same test case 6c02eaa. There I switched to use currentPhraseAttempts which incorporates the latest stroke instead of what was right before. I didn't really understand why it was like that by reading code and the comment

// NOTE: here is where attempts are defined before being pushed with completed phrases

, but it seems it's natural for the new implementation. (Note: due to switching from char-to-char processing to batch processing, it stopped getting called with "sigh", "sig", "si", "sil", "sile", "silen") Let me know if I'm missing a big issue here.

Also I checked accuracy and attempts for all test cases and added TODO(na2hiro) comment for questionable cases. In 4380b8d I fixed some false positives for misstrokes.

I think the input "Yours{bs}{bs}" for "You" should ideally be marked as incorrect because it expects to be 1 stroke and surely has been written in 2 strokes. I think both the input "Les" via LE/-S for "Les" and the input "Less{bs}" for "Les" should ideally be marked as correct because it expects to be 2 strokes and they both took 2 strokes to write.

At last I made it work like that 0e721cb. At this point "yours" was already included in attempts but not flagged inaccurate because this is technically a 2-stroke word: "KPA/U". I created a heuristic to get observable target stroke counts to calculate target for "KPA/U" as one and make it fail, while "less{bs}" for "LE/-S" is considered accurate.

I am seeing some flakey test runs. Sometimes yarn test gives me all passing tests (854), and sometimes it gives me 1, 2, or 3 failed tests.

I couldn't reproduce this so far. I wonder if this still persists with the latest code. Maybe we need some waits between inputs to stably reproduce intended input batches.

didoesdigital commented 5 months ago

@na2hiro nice work! The behaviour for showing interim strokes in the chart is looking good now. The improvement to get target observable stroke count is really nice.

I'm still getting flakey test runs where sometimes they all pass and sometimes there are a few, random failures…

Example 1:

App › a lesson page (spacePlacement=spaceBeforeOutput) › lesson with `you can` › doesn't accept excess chars

    expect(element).toHaveTextContent()

    Expected element to have text content:
      You
    Received:
      can

App › a lesson page (spacePlacement=spaceOff) › lesson with `you can` › accepts first word but detect misstroke in second word when input at once

    expect(element).toHaveTextContent()

    Expected element to have text content:
      can
    Received:
      lead

Example 2:

App › a lesson page (spacePlacement=spaceBeforeOutput) › lesson with `silent` › doesn't count 'less{bs}' as misstroke
    expect(element).toHaveValue( less)
    Expected the element to have value:
      less
    Received:
      s

App › a lesson page (spacePlacement=spaceBeforeOutput) › lesson with `you can` › doesn't accept excess chars
    expect(element).toHaveTextContent()
    Expected element to have text content:
      You
    Received:
      can

App › a lesson page (spacePlacement=spaceOff) › lesson with `you can` › doesn't accept excess chars
    expect(element).toHaveTextContent()

    Expected element to have text content:
      You
    Received:
      can

Example 3:

App › a lesson page (spacePlacement=spaceOff) › lesson with `too bad` › accepts inputs at once

    expect(element).toHaveValue( too bads)

    Expected the element to have value:
      too bads
    Received:
      s

It seems like running yarn test --runInBand passes more frequently, so maybe we need to review test set up/tear down and make sure things like document.cookie = "batchUpdate=1"; is set in all the right places? I don't really have good ideas on what's going on with the flakey tests.

The test suite is quite long (about 30–50sec) so we might also split out the slow UI test suite from unit tests in package.json like:

"test": "react-scripts test --transformIgnorePatterns \"node_modules/(?!d3-array)/\" --env=jsdom",
"test:unit": "yarn test --testMatch \"**/*.test.{js,ts}\"",
"test:ui": "yarn test --testMatch \"**/*.test.{jsx,tsx}\"",

na2hiro commented 5 months ago

Thanks for the detailed report. I think I fixed them. Could you test it on your environment?

typing a character in userEvent.type() can take more than 16ms and the input chunks can be broken up more than I intended (e.g. after inputting "too bad", it takes more than 16ms for "s" to be input). I could confirm that was the case by adding print at the beginning of updateBufferSingle and run it in my old macbook on which entire test takes over 1 minutes (I've been testing on M1 mac where it takes only 25s). I made the 16ms interval configurable and bumped to 32ms only in tests, then my old macbook also started to pass all cases. You can try to bump further if it's still flaky.
A couple of test cases were incorrect in case of exact matching ("too bads" in a buffer should proceed anyway). It was passing coincidentally probably because multiple input chunks were processed in the same updateBufferSingle call. I added awaiting inside typeIn so multiple typeIn calls won't be processed in the same batch

If this looks good and there are no more issues, I wonder if I can remove the cookie hack and switch to new behavior.

didoesdigital commented 4 months ago

Nice work @na2hiro! I'm in transit now so I can't confirm it works on my machine yet but I think it's looking good.

I think we can deploy it when I'm back with the cookie hack and get some dev/steno folk to test it in production with manually set cookies. That could help catch any possible timing issues on slower devices or different systems.

Great work on this! Thank you so much.

na2hiro commented 4 months ago

Awesome, that's a good idea! Thank you for patiently reviewing my code. I'd be happy to follow up with issues if found any

didoesdigital commented 4 months ago

@na2hiro Thanks for all your work on this! I've rebased the branch on master for a clean history and force pushed the rebased branch so that PR was up to date when I merged so GitHub marked it merged properly this time.

I've deployed the change so now when anyone has the cookie batchUpdate set to 1 and hard refreshes Typey Type, they will see that lessons won't progress when a word has extra, incorrect letters 🎉

I'll let you know if I get any reports of issues with the new behaviour and otherwise, after it's been live a while, I will remove the cookie hack and switch to new behaviour. Thanks again!

didoesdigital commented 2 months ago

@na2hiro I've switched the default behaviour in https://github.com/didoesdigital/typey-type/pull/178 — if there are no reports of issues after that's been live for a bit, I'll remove all the conditional behaviour and tests. Thanks again for this improvement!