Klebert-Engineering / deep-spell-9

Neural Spellcheck, Autocomplete and Fuzzy-match for SQLite FTS5 🤖
MIT License

Refine Training Grammar #11

Closed: josephbirkner closed this issue 7 years ago

josephbirkner commented 7 years ago

Currently, the Deep Spell training grammar uses a plain random permutation of all available token categories. This means that rather uncommon logical combinations such as State-Road or Country-Road are generated frequently. The grammar should be refined to exclude such combinations:

us-address := random-permute-min1(
  COUNTRY [p=0.1],
  STATE [p=0.2],
  city-zip-road [p=0.8],
)
city-zip-road := random-permute-min1(
  city-zip [p=1.0],
  ROAD [p=0.4]
)
city-zip := random-permute-min1(
  CITY [p=0.7],
  ZIP [p=0.3]
)
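
For reference, a minimal sketch of how such a grammar could be sampled, assuming random-permute-min1 means: keep each alternative independently with its probability p, force at least one alternative, and emit the kept alternatives in random order. The rule table and function names below are illustrative, not taken from the codebase.

import random

# Hypothetical rule table mirroring the grammar above; UPPERCASE names are terminals.
RULES = {
    "us-address": [("COUNTRY", 0.1), ("STATE", 0.2), ("city-zip-road", 0.8)],
    "city-zip-road": [("city-zip", 1.0), ("ROAD", 0.4)],
    "city-zip": [("CITY", 0.7), ("ZIP", 0.3)],
}

def random_permute_min1(symbol):
    """Expand a rule: keep each alternative with its probability p, force at
    least one alternative, recurse into non-terminals, shuffle the result."""
    if symbol not in RULES:              # terminal token category
        return [symbol]
    alts = RULES[symbol]
    kept = [s for s, p in alts if random.random() < p]
    if not kept:                         # "min1": never produce an empty expansion
        kept = [random.choice(alts)[0]]
    random.shuffle(kept)
    tokens = []
    for s in kept:
        tokens.extend(random_permute_min1(s))
    return tokens

# Example: draw a few training-sample category sequences.
for _ in range(3):
    print(random_permute_min1("us-address"))
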
josephbirkner commented 7 years ago

The refined training grammar should also have options to add random corruption to the training samples (normally distributed error rate). Error probabilities should be distributable over error classes.
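
As a rough sketch of what that corruption option could look like: the per-sample error rate is drawn from a normal distribution and clipped to [0, 1], and each error is assigned to one of a set of assumed error classes (insertion, deletion, substitution, transposition) with configurable weights. The class names, weights, and alphabet below are illustrative assumptions, not from the issue.

import random
import string

# Assumed error classes and illustrative weights; these would be configurable.
ERROR_CLASS_WEIGHTS = {"insert": 0.25, "delete": 0.25, "substitute": 0.35, "transpose": 0.15}
ALPHABET = string.ascii_lowercase + " "

def corrupt(text, mean_rate=0.05, std_rate=0.02):
    """Apply random character-level errors to one training sample."""
    # Per-sample error rate drawn from a normal distribution, clipped to [0, 1].
    rate = min(max(random.gauss(mean_rate, std_rate), 0.0), 1.0)
    out = []
    i = 0
    while i < len(text):
        if random.random() >= rate:      # no error at this position
            out.append(text[i])
            i += 1
            continue
        kind = random.choices(list(ERROR_CLASS_WEIGHTS),
                              weights=list(ERROR_CLASS_WEIGHTS.values()))[0]
        if kind == "insert":             # insert a random character before this one
            out.append(random.choice(ALPHABET))
            out.append(text[i])
            i += 1
        elif kind == "delete":           # drop this character
            i += 1
        elif kind == "substitute":       # replace this character
            out.append(random.choice(ALPHABET))
            i += 1
        else:                            # transpose with the following character
            if i + 1 < len(text):
                out.append(text[i + 1])
                out.append(text[i])
                i += 2
            else:
                out.append(text[i])
                i += 1
    return "".join(out)

print(corrupt("market street san francisco"))
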

josephbirkner commented 7 years ago

As per directions from Fabian, the ZIP category will be removed for now. Therefore, the new grammar will look as follows:

us-address := random-permute-min1(
  COUNTRY [p=0.1],
  STATE [p=0.2],
  city-road [p=0.8],
)
city-road := random-permute-min1(
  CITY [p=1.0],
  ROAD [p=0.4]
)
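
Assuming the sampler sketched earlier in this thread, dropping ZIP would amount to swapping in a smaller rule table (the former city-zip rule is inlined into city-road); again, the names are illustrative:

RULES = {
    "us-address": [("COUNTRY", 0.1), ("STATE", 0.2), ("city-road", 0.8)],
    "city-road": [("CITY", 1.0), ("ROAD", 0.4)],
}
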
josephbirkner commented 7 years ago

96e6845