didoesdigital / steno-dictionaries

Di's Plover-theory stenography dictionaries used by Typey Type for Stenographers.
GNU General Public License v2.0

Issue with stroke for "regain" #173

Closed: paulfioravanti closed this issue 4 years ago

paulfioravanti commented 4 years ago

In the current dict.json file, there are the following strokes for "regain", all of which are valid in Plover release weekly-v4.0.0.dev8+66.g685bd33:

"RAOE/TKPWAEUPB": "regain",
"RE/TKPWAEUP": "regain",
"RE/TKPWAEUPB": "regain",

In the top-10000-project-gutenberg-words.json dictionary, the stroke used for "regain" is:

https://github.com/didoesdigital/steno-dictionaries/blob/cdcf87902e2859a535b8f469968522a9a0cb3468/dictionaries/top-10000-project-gutenberg-words.json#L8084

"RE/TKPWAEUP": "regain",

"RE/TKPWAEUP": "regain" looks like a mis-stroke to me, as it's missing the B that, together with the P, writes the final "n" sound (-PB).

Given that:

  1. All other entries in the dictionaries that contain "regain" use the B:
"RAOE/TKPWAEUPBD": "regained",
"RAOE/TKPWAEUPBG": "regaining",
"RAOE/TKPWAEUPBS": "regains",
"RE/TKPWAEUPB/-D": "regained",
"RE/TKPWAEUPB/-G": "regaining",
"RE/TKPWAEUPB/-S": "regains",
"RE/TKPWAEUPBS": "regains",

  2. The stroke for "regained" used in top-10000-project-gutenberg-words.json also uses the B:

https://github.com/didoesdigital/steno-dictionaries/blob/cdcf87902e2859a535b8f469968522a9a0cb3468/dictionaries/top-10000-project-gutenberg-words.json#L7584

I would like to propose:

didoesdigital commented 4 years ago

Thanks @paulfioravanti, sounds good to me!

I did a recent batch of fixes like that, using various regexes to identify issues, but didn't catch this one. Are there other entries matching ".*P": ".*n" that look like misstrokes missing the B?
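For anyone wanting to repeat this kind of search, here's a minimal Python sketch. It assumes the check runs on parsed entries rather than by grepping the raw files (the thread doesn't say which tool was used), and rebuilds each entry as a one-line `"OUTLINE": "translation"` string so the same `".*P": ".*n"` pattern can be applied. `find_candidates` and the sample entries are illustrative, not part of the repo.

```python
import json
import re

# The pattern from this thread: outline ends in P, translation ends in "n".
PATTERN = re.compile(r'".*P": ".*n"')


def find_candidates(dictionary: dict) -> list:
    """Return (outline, translation) pairs whose one-line form matches PATTERN."""
    matches = []
    for outline, translation in dictionary.items():
        # Rebuild the entry the way it appears on one line of a dictionary file.
        line = f"{json.dumps(outline)}: {json.dumps(translation)}"
        if PATTERN.search(line):
            matches.append((outline, translation))
    return matches


# Illustrative entries; a real run would json.load() a file such as dict.json.
sample = {
    "RE/TKPWAEUP": "regain",
    "RE/TKPWAEUPB": "regain",
    "OEP": "open",
    "TKAR/WEUPB": "Darwin",
}

for outline, translation in find_candidates(sample):
    print(f'"{outline}": "{translation}"')
```

This also flags legitimate briefs such as "OEP": "open", so every hit still needs a human look.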

paulfioravanti commented 4 years ago

@didoesdigital Running a ".*P": ".*n" regex over the dictionaries in the current master got me the following entries. For entries that look like mis-strokes to me (and which should probably be moved to bad-habits.json), I've added what I think the more "correct" strokes should be as inline comments (even though JSON doesn't officially support comments!):

top-10000-project-gutenberg-words.json

"OEP": "open",
"HAP": "happen",
"K-FP": "kitchen",
"WEP": "weapon",
"WRUP": "whereupon",
"THRUP": "thereupon",
"TKAR/WEUP": "Darwin", // TKAR/WEUPB
"RE/TKPWAEUP": "regain", // RE/TKPWAEUPB

top-1000-words.json

"OEP": "open",
"SEUFP": "situation",

dict.json

"AS/PREUP": "aspirin", // AS/PREUPB
"EFRT/SKWREP": "estrogen", // EFRT/SKWREPB
"EUR/A*P": "Iran", // EUR/APB
"HAOEPL/TKPWHROEB/*EUP": "hemoglobin", // HAOEPL/TKPWHROEB/*EUPB
"HAOEURD/SKWREP": "hydrogen", // HAOEURD/SKWREPB
"HAP": "happen",
"HEL/AES/K-FP": "Hell's Kitchen",
"HO*EUFP": "heavy chain",
"HRO*EUFP": "light chain",
"K-FP": "kitchen",
"KAFT/AOEURP": "cast-iron", // KAFT/AOEURPB
"KALS/TOEPB/*EUP": "calcitonin", // KALS/TOEPB/*EUPB
"KHA*EURP": "Chairperson",
"KHAEURP": "chairperson",
"KO/KHAEURP": "co-chairperson",
"KWEP": "chemical weapon",
"OEP": "open",
"PERBG/TKAP": "Percodan", // PERBG/TKAPB
"PH*FP": "Ms. Chairperson",
"PHO*EU/SEUP": "myosin", // PHAOEU/S*EUPB (maybe?)
"PHR*FP": "Mr. Chairperson",
"PHR-FP": "Mr. Chairman",
"PRAOUP": "prudent person",
"PWA/HRAOP": "balloon", // PWA/HRAOPB
"PWRAEBG/SKWROUP": "breakdown", // PWRAEBG/SKWROUPB
"RAOE/OEP": "reopen", 
"RE/OEP": "reopen",
"RE/TKPWAEUP": "regain", // RE/TKPWAEUPB
"RO*EP": "repolarization",
"ROEB/TUS/*EUP": "Robitussin", // ROEB/TUS/*EUPB
"SEUT/S*EP": "citizen", // SEUT/SEPB
"SHO/TKPWREP": "Sjogren", // SHO/TKPWREPB
"SKWROPB/SO*P": "Johnson", // SKWROPB/SOPB
"SPAOEUD/*ER/PHA*P": "Spiderman", // SPAOEUD/*ER/PHA*PB
"STKAP": "accident happen",
"T*FP": "teaspoon",
"THRUP": "thereupon",
"TKAR/WEUP": "Darwin", // TKAR/WEUPB
"TKEP": "deposition",
"TKO*EP": "depolarization",
"TKOL/TPEUP": "dolphin", // TKOL/TPEUPB
"TKWEP": "deadly weapon",
"TKWREP": "dangerous weapon",
"TPH/PHEUP": "in my opinion",
"TPH/PHUP": "in my humble opinion",
"TPHR*EUFP": "literature in",
"WAOEUPBTD/TPHUP": "wind up in",
"WEP": "weapon",
"WRUP": "whereupon",

What do you think?
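Most of the suggested fixes above come down to adding a B after the final P. In steno order B sits directly after P on the right-hand side, so a final stroke that ends in P can always take a trailing B. Here's a minimal Python sketch of that first guess; `naive_b_fix` is a hypothetical helper, and its output is only a starting point, since some suggestions above also change other keys (for example "EUR/A*P" becomes "EUR/APB", dropping the asterisk).

```python
def naive_b_fix(outline: str) -> str:
    """Hypothetical helper: append B to the final stroke of an outline.

    Only valid when the final stroke ends in P; B directly follows P in
    steno order, so the result keeps the keys in order.
    """
    strokes = outline.split("/")
    if not strokes[-1].endswith("P"):
        raise ValueError(f"final stroke of {outline!r} does not end in P")
    strokes[-1] += "B"
    return "/".join(strokes)


for outline in ["TKAR/WEUP", "RE/TKPWAEUP", "EUR/A*P"]:
    print(f"{outline} -> {naive_b_fix(outline)}")
```

The last case gives "EUR/A*PB" rather than the hand-picked "EUR/APB", which is why each suggestion still gets reviewed individually.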

didoesdigital commented 4 years ago

Amazing work, @paulfioravanti! 👏 These improvements all look great.

For "myosin", I'd go with PHO*EU/SEUPB as a new condensed-strokes.json entry instead of PHAOEU/S*EUPB, because it uses the prefix entry "PHO*EU": "{myo^}" and the base word "SEUPB": "sin".
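To illustrate why PHO*EU/SEUPB reads naturally: in Plover, a translation like "{myo^}" is a prefix that attaches to the next word, so the prefix stroke followed by the base word stroke produces "myosin". Below is a heavily simplified Python sketch of that composition. `translate` is a hypothetical helper that only understands the `{text^}` form; Plover's real formatting handles many more operators than this.

```python
import re

# The two entries quoted in this comment.
ENTRIES = {
    "PHO*EU": "{myo^}",  # prefix: attaches to whatever comes next
    "SEUPB": "sin",
}

PREFIX = re.compile(r"^\{(.+)\^\}$")  # matches the "{text^}" form only


def translate(outline: str, entries: dict) -> str:
    """Hypothetical resolver: look up each stroke on its own, gluing
    {text^} prefixes onto the following word instead of adding a space."""
    output = ""
    attach_next = False
    for stroke in outline.split("/"):
        translation = entries[stroke]
        match = PREFIX.match(translation)
        text = match.group(1) if match else translation
        if output and not attach_next:
            output += " "
        output += text
        attach_next = match is not None
    return output


print(translate("PHO*EU/SEUPB", ENTRIES))
```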

Let's move the misstrokes from dict.json into bad-habits.json, add the one condensed-strokes.json entry for "myosin", and update the entries in top-10000-project-gutenberg-words.json.
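One way to do the move mechanically, sketched in Python under the assumption that each dictionary is a flat JSON object mapping outline to translation (which matches the entries quoted in this thread). `move_entries` is a hypothetical helper; in practice you'd `json.load` dict.json and bad-habits.json first and write both back with `json.dump` in whatever formatting the repo uses.

```python
def move_entries(outlines, source: dict, target: dict) -> None:
    """Move each outline and its translation from source into target.

    Raises KeyError if an outline isn't in source, and ValueError if target
    already maps that outline to a different translation.
    """
    for outline in outlines:
        translation = source[outline]
        existing = target.get(outline)
        if existing is not None and existing != translation:
            raise ValueError(f"{outline!r} already maps to {existing!r} in target")
        target[outline] = translation
        del source[outline]


# Illustrative in-memory stand-ins for dict.json and bad-habits.json.
dict_json = {"RE/TKPWAEUP": "regain", "RE/TKPWAEUPB": "regain", "OEP": "open"}
bad_habits = {}

move_entries(["RE/TKPWAEUP"], dict_json, bad_habits)
print(dict_json)
print(bad_habits)
```

Checking for a conflicting translation before deleting from the source means an outline is never silently dropped or overwritten.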

paulfioravanti commented 4 years ago

Looking at this again, I realised that we haven't yet considered the other side of this equation: the ".*[^P]B": ".*n" pattern, where a stroke meant to represent a word ending in "n" ends in B but has no P directly before it to make the "n" sound. I ran this regex over the current dictionaries and came up with the following. As before, I've added what I think the more "correct" strokes should be in inline comments:

dict.json

"A/TPER/*EUPB/TPHR-RB": "aferrin", // stroke not in Plover; it only has A/TPER/*EUPB, which would need to be added to dict.json 
"AOEBG/HREUB/RAEUGS/TPHR-RB": "equilibriation", // stroke not in Plover; it only has AOEBG/HREUB/RAEUGS, which would need to be added to dict.json
"AOUR/PAOEB": "European", // AOUR/PAOEPB
"APL/KAEUS/*EUB": "amikacin", // APL/KAEUS/*EUPB
"EUFB": "I have been",
"EUPB/TERB": "intern", // EUPB/TERPB
"EUPBS/HREUB": "insulin", // EUPBS/HREUPB
"EUPL/TKPWHROB": "immunoglobulin",
"HAB": "has been",
"HRAO*EURB": "librarian", // HRAO*EURPB
"HREUB/*ER/TAEURB": "libertarian", // HREUB/*ER/TAEURPB
"KPHAOED/KWRAB": "comedian", // KPHAOED/KWRAPB
"KWRAFB": "I can't have been",
"OBGS/SKWREB": "oxygen", // OBGS/SKWREPB
"P*R/A*B": "Puerto Rican", // P*R/A*PB
"PA*T/SKWREB": "pathogen", // PAT/SKWREPB
"PHAOUPB/TKPWHROB": "immunoglobulin",
"PHARBG/TWAEUPB/TPHR-RB": "Mark Twain", // stroke not in Plover; this could be removed
"PHARPBLG/*EUB": "margin", // PHARPBLG/*EUPB
"PHULT/SRAO*EUB": "multivitamin", // PHULT/SRAO*EUPB; also PHULT/SRAO*EUBS for "multivitamins" could be moved to `bad-habits.json`
"PWA/HRAOB": "balloon", // PWA/HRAOPB
"PWU/SUL/TPAPB/TPHR-RB": "busulfan", // stroke not in Plover; it only has PWU/SUL/TPAPB, which would need to be added to dict.json
"RE/HRERB": "relearn", // Plover also only has RE/HRERB, yet "learn" is HRERPB. Strange...
"REPBLG/PH*EB": "regimen", // REPBLG/PH*EPB
"S*EURT/PHAOEUS/*EUB": "erythromycin", // S*EURT/PHAO*EUPB (?)
"SAOEULG/SPORB": "cyclosporin", // SAOEULG/SPORPB
"SER/TOEPB/*EUB": "serotonin", // SER/TOEPB/*EUPB
"SHEB": "she been",
"SR-B": "have been",
"SRAPBG/PHAOEUS/*EUB": "vancomycin", // SRAPBG/PHAOEUS/*EUPB
"SREURPBLG/*EUB": "virgin", // SREURPBLG/*EUPB
"SRUB": "have you been",
"TAT/HREUB": "Tatlin", // TAT/HREUPB
"TE/RAPB/TPHR-RB": "Tehran", // stroke not in Plover so could be removed; best substitute stroke is TE/RAPB
"TEUBG/SEUPB/TPHR-RB": "Tikosyn", // stroke not in Plover so could be removed; best substitute stroke is TEUBG/SEUPB
"TKOBGS/RAOUB/SEUB": "doxorubicin", // TKOBGS/RAOUB/SEUPB
"TKPHAO*ERB": "deficiency in",
"TKPWAPL/TKPWHROB": "gamma-globulin",
"TKPWHREUS/REUB": "glycerin", // TKPWHREUS/REUPB
"TKPWOE/TPHAD/SKWRO*/TROEP/*EUPB/TPHR-RB": "Gonadotropin", // stroke not in Plover; it only has TKPWOE/TPHAD/SKWRO*/TROEP/*EUPB, which would need to be added to dict.json
"TPEPB/TOEUB": "phenytoin", // TPEPB/TOEUPB
"TPEUPBG/*L/STAOEPB/TPHR-RB": "Finkelstein", // stroke not in Plover; it only has TPEUPBG/*L/STAOEPB, which would need to be added to dict.json
"TPHO*RBG/HART/SOEGS/TPHR-RB": "New York Heart Association", // stroke not in Plover; it only has TPHO*RBG/HART/SOEGS, which would need to be added to dict.json
"TPHR*ERB": "pleasure in",
"TROEF/TPHROBGS/SEUPB/TPHR-RB": "trovafloxacin", // stroke not in Plover; it only has TROEF/TPHROBGS/SEUPB, which would need to be added to dict.json
"WHATS/TKPW-G/OB": "what's going on",

There's a lot to take in here! What are your thoughts on this set of words?
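This second search can be sketched the same way. The minimal Python version below, again assuming the check runs on parsed entries rebuilt as one-line `"OUTLINE": "translation"` strings, applies the `".*[^P]B": ".*n"` pattern. As the list above shows, it flags likely misstrokes alongside briefs like "SR-B": "have been" and entries like "EUPL/TKPWHROB": "immunoglobulin", so the hits still need sorting by hand. `find_missing_p` and the sample entries are illustrative only.

```python
import json
import re

# Outline ends in B without a P right before it; translation ends in "n".
MISSING_P = re.compile(r'".*[^P]B": ".*n"')


def find_missing_p(dictionary: dict) -> list:
    """Return outlines whose one-line form matches MISSING_P."""
    hits = []
    for outline, translation in dictionary.items():
        line = f"{json.dumps(outline)}: {json.dumps(translation)}"
        if MISSING_P.search(line):
            hits.append(outline)
    return hits


sample = {
    "AOUR/PAOEB": "European",           # likely misstroke
    "AOUR/PAOEPB": "European",          # -PB present, not flagged
    "SR-B": "have been",                # brief, flagged anyway
    "EUPL/TKPWHROB": "immunoglobulin",  # flagged anyway
}

print(find_missing_p(sample))
```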

didoesdigital commented 4 years ago

Great stuff, @paulfioravanti!

Let's add to dict.json:

"A/TPER/*EUPB": "aferrin",
"AOEBG/HREUB/RAEUGS": "equilibriation",
"PWU/SUL/TPAPB": "busulfan",
"TKPWOE/TPHAD/SKWRO*/TROEP/*EUPB": "Gonadotropin",
"TROEF/TPHROBGS/SEUPB": "trovafloxacin",
"TPEUPBG/*L/STAOEPB": "Finkelstein",
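Each outline proposed for dict.json here is the flagged outline with its final TPHR-RB stroke dropped. A small Python sketch of that step, where `without_final_stroke` is a hypothetical helper. Not every TPHR-RB entry is kept this way (Tehran, Tikosyn, Mark Twain and New York Heart Association are only being moved to bad-habits.json), so the result is a list to choose from rather than a final answer.

```python
from typing import Optional


def without_final_stroke(outline: str, stroke: str = "TPHR-RB") -> Optional[str]:
    """Return outline minus a trailing `stroke`, or None if it doesn't end with it."""
    strokes = outline.split("/")
    if len(strokes) > 1 and strokes[-1] == stroke:
        return "/".join(strokes[:-1])
    return None


flagged = {
    "A/TPER/*EUPB/TPHR-RB": "aferrin",
    "PWU/SUL/TPAPB/TPHR-RB": "busulfan",
    "AOUR/PAOEB": "European",
}

to_add = {}
for outline, translation in flagged.items():
    shorter = without_final_stroke(outline)
    if shorter is not None:
        to_add[shorter] = translation

print(to_add)
```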

Let's add to condensed-strokes.json:

"RE/HRERPB": "relearn",
"WHATS/TKPW-G/OPB": "what's going on",

Let's move these to bad-habits.json:

"AOUR/PAOEB": "European",
"APL/KAEUS/*EUB": "amikacin",
"EUPB/TERB": "intern",
"EUPBS/HREUB": "insulin",
"HRAO*EURB": "librarian",
"HREUB/*ER/TAEURB": "libertarian",
"KPHAOED/KWRAB": "comedian",
"OBGS/SKWREB": "oxygen",
"P*R/A*B": "Puerto Rican",
"PA*T/SKWREB": "pathogen",
"PHARBG/TWAEUPB/TPHR-RB": "Mark Twain",
"PHARPBLG/*EUB": "margin",
"PHULT/SRAO*EUB": "multivitamin",
"PWA/HRAOB": "balloon",
"RE/HRERB": "relearn",
"REPBLG/PH*EB": "regimen",
"S*EURT/PHAOEUS/*EUB": "erythromycin",
"S*EURT/PHAO*EUPB": "erythromycin", // if anything, probably should be "azithromycin"
"SAOEULG/SPORB": "cyclosporin",
"SER/TOEPB/*EUB": "serotonin",
"SRAPBG/PHAOEUS/*EUB": "vancomycin",
"SREURPBLG/*EUB": "virgin",
"TAT/HREUB": "Tatlin",
"TE/RAPB/TPHR-RB": "Tehran",
"TEUBG/SEUPB/TPHR-RB": "Tikosyn",
"TKOBGS/RAOUB/SEUB": "doxorubicin",
"TKPWHREUS/REUB": "glycerin",
"TKPWOE/TPHAD/SKWRO*/TROEP/*EUPB/TPHR-RB": "Gonadotropin",
"TPEPB/TOEUB": "phenytoin",
"TPEUPBG/*L/STAOEPB/TPHR-RB": "Finkelstein",
"TPHO*RBG/HART/SOEGS/TPHR-RB": "New York Heart Association",
"TROEF/TPHROBGS/SEUPB/TPHR-RB": "trovafloxacin",
"WHATS/TKPW-G/OB": "what's going on",

These seem like briefs we should keep as they are:

"EUFB": "I have been",
"HAB": "has been",
"KWRAFB": "I can't have been",
"SHEB": "she been",
"SR-B": "have been",
"SRUB": "have you been",
"TKPHAO*ERB": "deficiency in",
"TPHR*ERB": "pleasure in",

We should keep these as they are:

"EUPL/TKPWHROB": "immunoglobulin", // the B is for the 'b' sound here, similar to "EUPL/TKPWHROB/HREUPB": "immunoglobulin",
"PHAOUPB/TKPWHROB": "immunoglobulin", // ditto
"TKPWAPL/TKPWHROB": "gamma-globulin", // ditto