Closed by josephbirkner 7 years ago
The refined training grammar should also offer options for adding random corruption to the training samples, with the error rate drawn from a normal distribution. It should be possible to distribute the error probabilities over error classes.
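A minimal sketch of what such a corruption option could look like, assuming character-level noise. The function name `corrupt`, its parameters, and the three error classes (`substitute`, `delete`, `insert`) with their weights are hypothetical; the issue does not name the actual classes. The per-sample error rate is drawn from a normal distribution and clipped to [0, 1].

```python
import random
import string

# Hypothetical error classes and their relative probabilities (assumption:
# the issue does not list the real classes).
ERROR_CLASSES = {"substitute": 0.5, "delete": 0.25, "insert": 0.25}


def corrupt(text, mean_rate, stddev, rng, classes=ERROR_CLASSES):
    """Return a corrupted copy of `text`.

    The per-sample error rate is drawn from N(mean_rate, stddev) and
    clipped to [0, 1]. Each character is then corrupted with that
    probability, using an error class picked according to `classes`.
    """
    rate = min(1.0, max(0.0, rng.gauss(mean_rate, stddev)))
    names = list(classes)
    weights = list(classes.values())
    out = []
    for ch in text:
        if rng.random() >= rate:
            out.append(ch)  # character survives unchanged
            continue
        kind = rng.choices(names, weights=weights)[0]
        noise = rng.choice(string.ascii_lowercase)
        if kind == "substitute":
            out.append(noise)
        elif kind == "insert":
            out.extend([ch, noise])
        # "delete": emit nothing for this character
    return "".join(out)
```

For example, `corrupt("springfield", 0.1, 0.05, random.Random(42))` would return a lightly corrupted variant of the city name, and passing a different `classes` mapping shifts the error mass between classes.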
As per directions from Fabian, the ZIP category will be removed for now. Therefore, the new grammar will look as follows:
us-address := random-permute-min1(
    COUNTRY [p=0.1],
    STATE [p=0.2],
    city-road [p=0.8]
)
city-road := random-permute-min1(
    CITY [p=1.0],
    ROAD [p=0.4]
)
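The grammar above can be sketched as a small sampler. This is an illustration under assumptions, not the Deep Spell implementation: it reads `random-permute-min1` as "include each child independently with probability `p`, resample until at least one child is included, then shuffle the included children". The names `GRAMMAR` and `random_permute_min1` are hypothetical.

```python
import random

# Each rule maps to (child, inclusion probability) pairs, as in the grammar above.
GRAMMAR = {
    "us-address": [("COUNTRY", 0.1), ("STATE", 0.2), ("city-road", 0.8)],
    "city-road": [("CITY", 1.0), ("ROAD", 0.4)],
}


def random_permute_min1(rule, rng):
    """Sample a token-category sequence for `rule`.

    Assumed semantics: every child is included independently with its
    probability p; the draw is repeated until at least one child is
    included; the included children are then randomly permuted.
    """
    children = GRAMMAR[rule]
    while True:
        chosen = [name for name, p in children if rng.random() < p]
        if chosen:
            break
    rng.shuffle(chosen)
    sequence = []
    for name in chosen:
        if name in GRAMMAR:
            # Nested rule: expand it in place so its tokens stay adjacent.
            sequence.extend(random_permute_min1(name, rng))
        else:
            sequence.append(name)
    return sequence
```

Calling `random_permute_min1("us-address", random.Random(1))` yields a list of category names. Because `ROAD` only occurs inside `city-road`, where `CITY` has `p=1.0`, a sample never contains `ROAD` without `CITY`.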
96e6845
Currently, the Deep Spell training grammar uses a plain random permutation of all available token categories. This means that logically uncommon combinations such as State-Road or Country-Road are generated frequently. The grammar should be refined to exclude such combinations.