Refine Training Grammar

Klebert-Engineering / deep-spell-9

Neural Spellcheck, Autocomplete and Fuzzy-match for SQLite FTS5 🤖

MIT License

Refine Training Grammar #11

Closed. josephbirkner closed this issue 7 years ago

josephbirkner commented 7 years ago

Currently, the Deep Spell training grammar uses a plain random permutation of all available token categories. This means that logically uncommon combinations such as State-Road or Country-Road are generated frequently. The grammar should be refined to exclude such combinations:

us-address := random-permute-min1(
  COUNTRY [p=0.1],
  STATE [p=0.2],
  city-zip-road [p=0.8],
)
city-zip-road := random-permute-min1(
  city-zip [p=1.0],
  ROAD [p=0.4]
)
city-zip := random-permute-min1(
  CITY [p=0.7],
  ZIP [p=0.3]
)
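
To illustrate how such a grammar could be sampled, here is a minimal Python sketch. It assumes that `random-permute-min1` keeps each child independently with its probability `p`, redraws if no child was kept, and then shuffles the kept children. The issue does not spell out these semantics, so they are an assumption, and `GRAMMAR`, `random_permute_min1` and `expand` are hypothetical names rather than code from the repository.

```python
import random

# The grammar from the comment above as (symbol, probability) lists.
# Upper-case symbols are terminal token categories, lower-case ones are rules.
GRAMMAR = {
    "us-address": [("COUNTRY", 0.1), ("STATE", 0.2), ("city-zip-road", 0.8)],
    "city-zip-road": [("city-zip", 1.0), ("ROAD", 0.4)],
    "city-zip": [("CITY", 0.7), ("ZIP", 0.3)],
}

def random_permute_min1(children, rng):
    """Assumed semantics: keep each child with probability p,
    redraw until at least one child is kept, then shuffle the result."""
    while True:
        chosen = [name for name, p in children if rng.random() < p]
        if chosen:
            rng.shuffle(chosen)
            return chosen

def expand(symbol, rng):
    """Recursively expand a rule into a flat list of token categories."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal category
    tokens = []
    for child in random_permute_min1(GRAMMAR[symbol], rng):
        tokens.extend(expand(child, rng))
    return tokens

rng = random.Random(0)
samples = [expand("us-address", rng) for _ in range(1000)]
```

Under these assumptions ROAD can only be produced together with CITY or ZIP, because `city-zip` is always kept (p=1.0) whenever `city-zip-road` is expanded.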

josephbirkner commented 7 years ago

The refined training grammar should also provide options for adding random corruption to the training samples, with the error rate drawn from a normal distribution. ~~Error probabilities should be distributable over error classes~~:

Switches
Deletions
Insertions
Substitutions
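
A rough Python sketch of such a corruption step follows. It assumes that one error rate is drawn per sample from a normal distribution and clamped to [0, 1], that "Switches" means swapping two adjacent characters, and that the error class is picked uniformly, since the per-class probability requirement is struck through. The function name `corrupt`, the `ALPHABET` constant and the default `mean` and `stddev` values are hypothetical and not taken from the repository.

```python
import random
import string

# Hypothetical replacement alphabet for insertions and substitutions.
ALPHABET = string.ascii_lowercase + " "

def corrupt(text, rng, mean=0.1, stddev=0.05):
    """Corrupt text with a per-sample error rate drawn from N(mean, stddev)."""
    rate = min(1.0, max(0.0, rng.gauss(mean, stddev)))  # clamp to [0, 1]
    out = []
    i = 0
    while i < len(text):
        if rng.random() >= rate:
            out.append(text[i])  # character kept unchanged
            i += 1
            continue
        kind = rng.choice(["switch", "deletion", "insertion", "substitution"])
        if kind == "switch" and i + 1 < len(text):
            out.extend([text[i + 1], text[i]])  # swap with the next character
            i += 2
        elif kind == "deletion":
            i += 1  # drop the character
        elif kind == "insertion":
            out.extend([rng.choice(ALPHABET), text[i]])  # extra character first
            i += 1
        else:
            # Substitution; also the fallback for a switch on the last character.
            # The random replacement may coincide with the original character.
            out.append(rng.choice(ALPHABET))
            i += 1
    return "".join(out)

noisy = corrupt("main street", random.Random(42))
```

Passing a seeded `random.Random` instance keeps the corruption reproducible, which helps when comparing training runs.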

josephbirkner commented 7 years ago

Following directions from Fabian, the ZIP category will be removed for now. The new grammar will therefore look as follows:

us-address := random-permute-min1(
  COUNTRY [p=0.1],
  STATE [p=0.2],
  city-road [p=0.8],
)
city-road := random-permute-min1(
  CITY [p=1.0],
  ROAD [p=0.4]
)
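
To check which category combinations this grammar can produce, the following Python sketch enumerates them with exact probabilities. It assumes that `random-permute-min1` redraws until at least one child is kept, so every non-empty subset is renormalised by 1 - P(no child kept). For `us-address` that is 1 - 0.9 * 0.8 * 0.2 = 0.856. These semantics and the names `subset_probs` and `category_sets` are assumptions made for illustration. Token order is ignored.

```python
from itertools import product

def subset_probs(children):
    """Distribution over non-empty child subsets, assuming redraw-until-non-empty."""
    raw = {}
    for mask in product([False, True], repeat=len(children)):
        if not any(mask):
            continue  # the empty draw is rejected and redrawn
        p = 1.0
        for keep, (_, prob) in zip(mask, children):
            p *= prob if keep else 1.0 - prob
        if p > 0:
            names = frozenset(n for keep, (n, _) in zip(mask, children) if keep)
            raw[names] = p
    total = sum(raw.values())
    return {names: p / total for names, p in raw.items()}

top = subset_probs([("COUNTRY", 0.1), ("STATE", 0.2), ("city-road", 0.8)])
inner = subset_probs([("CITY", 1.0), ("ROAD", 0.4)])

# Substitute the city-road rule into the top-level subsets.
category_sets = {}
for outer_set, p_outer in top.items():
    if "city-road" in outer_set:
        rest = outer_set - {"city-road"}
        for inner_set, p_inner in inner.items():
            key = rest | inner_set
            category_sets[key] = category_sets.get(key, 0.0) + p_outer * p_inner
    else:
        category_sets[outer_set] = category_sets.get(outer_set, 0.0) + p_outer

for names, p in sorted(category_sets.items(), key=lambda kv: -kv[1]):
    print(sorted(names), round(p, 4))
```

Because CITY has p=1.0 inside `city-road`, no resulting set contains ROAD without CITY, so State-Road and Country-Road combinations without a city cannot occur under these assumptions.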

josephbirkner commented 7 years ago

96e6845