Open ftyers opened 5 years ago
I think the way to do this is:
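- Extract the form:analysis pairs from the tagged corpus as an unweighted transducer, A_u
- Extract the form:analysis pairs from the tagged corpus as a weighted transducer, A_w
- Subtract A_u from the unweighted analyser grn.LR.hfst, giving A_f
- Take the union of A_w and A_f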
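To make the set logic concrete, here is a toy sketch that uses plain text files in place of transducers. It does not call any HFST tool. The function name `combine_pairs`, the file layout (one form:analysis pair per line, with an optional tab and weight) and the temporary file names are all made up for illustration.

```shell
# Toy model of the steps using text files instead of transducers.
#   $1: all pairs accepted by the unweighted analyser, one per line
#   $2: pair<TAB>weight for each pair seen in the tagged corpus (A_w)
combine_pairs() {
    analyser=$1
    corpus=$2
    tmp=$(mktemp -d) || return 1
    # A_u: the corpus pairs with their weights stripped
    cut -f1 "$corpus" | LC_ALL=C sort -u > "$tmp/A_u"
    # A_f: analyser pairs that are not in A_u (set subtraction)
    LC_ALL=C sort -u "$analyser" | LC_ALL=C comm -23 - "$tmp/A_u" > "$tmp/A_f"
    # Result: weighted corpus pairs first, then the unweighted remainder
    cat "$corpus" "$tmp/A_f"
    rm -rf "$tmp"
}
```

Subtracting A_u first means a pair seen in the corpus only ever appears in the result with its corpus weight, never a second time without one.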
Commit 54bea94 adds support for generating the weights from a tagged corpus.
What remains is to make the Makefile nicer. This diff does the work, but we should integrate it properly:
-.deps/$(LANG1).LR.hfst: .deps/$(LANG1).LR.seg.hfst .deps/$(LANG1).mor.twol.hfst
- hfst-compose-intersect -1 .deps/$(LANG1).LR.seg.hfst -2 .deps/$(LANG1).mor.twol.hfst -o $@
+.deps/$(LANG1).LR.hfst: .deps/$(LANG1).LR.seg.hfst .deps/$(LANG1).mor.twol.hfst
+ hfst-compose-intersect -1 .deps/$(LANG1).LR.seg.hfst -2 .deps/$(LANG1).mor.twol.hfst -o .deps/$(LANG1).LR.unweighted.hfst
+ hfst-subtract -1 .deps/$(LANG1).LR.unweighted.hfst -2 .deps/$(LANG1).weights.noweight.hfst -o .deps/$(LANG1).LR.unweighted.subtr.hfst
+ hfst-union -1 .deps/$(LANG1).LR.unweighted.subtr.hfst -2 .deps/$(LANG1).weights.hfst -o $@
TODO:
Anything not in the corpus should have weight 1.0 or higher. Anything in the corpus should have weight between 0 and 1.
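One possible scheme for the corpus side, as a sketch only: give each pair the weight 1 - count/total, which stays between 0 and 1. I am not claiming this is what commit 54bea94 does. The function name `corpus_weights` and the input layout (one form:analysis pair per corpus token, one per line) are assumptions.

```shell
# corpus_weights FILE
#   FILE: one form:analysis pair per line, one line per corpus token.
#   Prints pair<TAB>weight, with weight = 1 - count/total (assumed scheme).
corpus_weights() {
    awk '
        NF { count[$0]++; total++ }          # skip blank lines, tally pairs
        END {
            for (pair in count)
                printf "%s\t%.4f\n", pair, 1 - count[pair] / total
        }' "$1" | LC_ALL=C sort
}
```

With this scheme more frequent pairs get lower weights. Pairs outside the corpus would still need a fixed weight of 1.0 or more assigned separately.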