apertium / apertium-grn

Apertium linguistic data for Guarani
GNU General Public License v3.0

Use tagged corpus to weight analyser #11

Open ftyers opened 5 years ago

ftyers commented 5 years ago

Anything not in the corpus should have weight 1.0 or higher. Anything in the corpus should have weight between 0 and 1.
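
To make the two ranges concrete, here is a rough sketch of one possible mapping from corpus counts to weights. It is not necessarily what the repository does: the function names, the `UNSEEN_WEIGHT` constant, the `1 - count / (total + 1)` formula, and the placeholder analysis strings are all assumptions made for illustration. It assumes that a lower weight means a preferred analysis, as in the tropical semiring that HFST uses by default.

```python
from collections import Counter

# Assumption: analyses absent from the corpus get exactly the 1.0 floor.
UNSEEN_WEIGHT = 1.0


def corpus_weights(tagged_analyses):
    """Map each analysis seen in the corpus to a weight strictly between 0 and 1.

    More frequent analyses get lower weights. The formula is illustrative only;
    the +1 in the denominator keeps every weight strictly above 0.
    """
    counts = Counter(tagged_analyses)
    total = sum(counts.values())
    return {analysis: 1.0 - count / (total + 1) for analysis, count in counts.items()}


def weight_of(analysis, weights):
    """Look up a weight, falling back to the floor for unseen analyses."""
    return weights.get(analysis, UNSEEN_WEIGHT)


# Placeholder analyses, not real Guarani data.
corpus = ["a<n>", "a<n>", "a<v>"]
weights = corpus_weights(corpus)
print(weight_of("a<n>", weights), weight_of("a<v>", weights), weight_of("b<n>", weights))
```

With this mapping every attested analysis outranks every unattested one, and attested analyses are ordered among themselves by frequency.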

ftyers commented 5 years ago

I think the way to do this is:

ftyers commented 5 years ago

Commit 54bea94 adds support for generating the weights from a tagged corpus.

What remains is to make the Makefile nicer. The diff below does the work, but we should find a cleaner way to integrate it.

-.deps/$(LANG1).LR.hfst: .deps/$(LANG1).LR.seg.hfst .deps/$(LANG1).mor.twol.hfst
-       hfst-compose-intersect -1 .deps/$(LANG1).LR.seg.hfst -2 .deps/$(LANG1).mor.twol.hfst -o $@

+.deps/$(LANG1).LR.hfst: .deps/$(LANG1).LR.seg.hfst .deps/$(LANG1).mor.twol.hfst
+       hfst-compose-intersect -1 .deps/$(LANG1).LR.seg.hfst -2 .deps/$(LANG1).mor.twol.hfst -o .deps/$(LANG1).LR.unweighted.hfst
+       hfst-subtract -1 .deps/$(LANG1).LR.unweighted.hfst -2 .deps/$(LANG1).weights.noweight.hfst -o .deps/$(LANG1).LR.unweighted.subtr.hfst
+       hfst-union -1 .deps/$(LANG1).LR.unweighted.subtr.hfst -2 .deps/$(LANG1).weights.hfst  -o $@
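
The subtract-then-union recipe in the diff can be pictured at the level of sets of analysis strings. The sketch below is only a model of that idea and does not call HFST. The name `reweight`, the placeholder analyses, and the `default_weight` parameter are assumptions: the diff does not show where paths outside the corpus get their weight, so the sketch takes it as an argument. It also assumes that `weights.noweight.hfst` holds the same paths as `weights.hfst` with the weights stripped.

```python
def reweight(unweighted_paths, corpus_weighted, default_weight):
    """Model of the recipe: subtract the corpus paths, then union the weighted ones back.

    unweighted_paths: set of analysis strings accepted by the unweighted analyser
    corpus_weighted:  dict mapping analysis string -> weight from the tagged corpus
    default_weight:   weight given to paths that are not in the corpus (assumption)
    """
    # hfst-subtract step: drop every path that also appears in the corpus.
    remainder = unweighted_paths - set(corpus_weighted)
    merged = {path: default_weight for path in remainder}
    # hfst-union step: add the corpus paths back, now carrying their weights.
    merged.update(corpus_weighted)
    return merged


# Placeholder analyses, not real Guarani data.
analyser = {"a<n>", "a<v>", "b<n>"}
from_corpus = {"a<n>": 0.5, "a<v>": 0.75}
result = reweight(analyser, from_corpus, default_weight=1.0)
print(sorted(result.items()))
```

Subtracting first matters because a plain union would keep both the unweighted and the weighted copy of each corpus path.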

TODO: