giellalt / lang-sms

Finite state and Constraint Grammar based analysers and proofing tools, and language resources for the Skolt Sami language
https://giellalt.uit.no
GNU Lesser General Public License v3.0

transcriptor-date-digit2text.lexc produces a working digit2text but not a text2digit analysis in hfst and xfst ( #12

Closed albbas closed 9 years ago

albbas commented 9 years ago

This issue was created automatically with bugzilla2github

Bugzilla Bug 1966

Date: 2015-01-31T20:37:43+01:00 From: Jack Rueter <> To: Sjur Nørstebø Moshagen <>

Last updated: 2015-03-16T13:21:34+01:00

albbas commented 9 years ago

Comment 10046

Date: 2015-01-31 20:37:43 +0100 From: Jack Rueter <>

On my work machine (OS X Mavericks 10.9.5), src/transcriptions/transcriptor-date-digit2text.lexc has been written and gives a working digit2text analyser in both hfst:

hfst-lookup src/transcriptions/transcriptor-date-digit2text.filtered.lookup.hfst
hfst-lookup: warning: It is not possible to perform fast lookups with OpenFST, std arc, tropical semiring format automata. Using HFST basic transducer format and performing slow lookups

1.2.
1.2.    vuõssmõs peeiʹv täʹlvvmannust    0.000000

and xfst:

lookup src/transcriptions/transcriptor-date-digit2text.filtered.lookup.xfst

LEXICON LOOK-UP

1.2.
1.2.    vuõssmõs peeiʹv täʹlvvmannust

However, when the reverse analysis (text2digit) is attempted, neither works:

hfst-lookup src/transcriptions/transcriptor-date-text2digit.filtered.lookup.hfst
hfst-lookup: warning: It is not possible to perform fast lookups with OpenFST, std arc, tropical semiring format automata. Using HFST basic transducer format and performing slow lookups

vuõssmõs peeiʹv täʹlvvmannust
vuõssmõs peeiʹv täʹlvvmannust    vuõssmõs peeiʹv täʹlvvmannust+?    inf

vuõssmõs peeiʹv täʹlvvmannust
vuõssmõs peeiʹv täʹlvvmannust    vuõssmõs peeiʹv täʹlvvmannust+?    inf

lookup src/transcriptions/transcriptor-date-text2digit.filtered.lookup.xfst

LEXICON LOOK-UP

vuõssmõs peeiʹv täʹlvvmannust
vuõssmõs peeiʹv täʹlvvmannust    vuõssmõs peeiʹv täʹlvvmannust    +?

vuõssmõs peeiʹv täʹlvvmannust
vuõssmõs peeiʹv täʹlvvmannust    vuõssmõs peeiʹv täʹlvvmannust    +?

Here the first input was copied and pasted directly from the previous output; the second was typed in by hand.

albbas commented 9 years ago

Comment 10055

Date: 2015-02-03 07:04:08 +0100 From: Jack Rueter <>

This seems to be an issue with the placement of +Use/NG. This tag does not seem to function.

albbas commented 9 years ago

Comment 10129

Date: 2015-02-11 01:37:47 +0100 From: Jack Rueter <>

The present expected input and results:

~/main/langs/sms$ hfst-lookup src/transcriptions/transcriptor-date-digit2text.filtered.lookup.hfst
hfst-lookup: warning: It is not possible to perform fast lookups with OpenFST, std arc, tropical semiring format automata. Using HFST basic transducer format and performing slow lookups

3.2.
3.2.    täʹlvvmannu kuälmad peeiʹv    0,000000

BUT

~/main/langs/sms$ hfst-lookup src/transcriptions/transcriptor-date-text2digit.filtered.lookup.hfst
hfst-lookup: warning: It is not possible to perform fast lookups with OpenFST, std arc, tropical semiring format automata. Using HFST basic transducer format and performing slow lookups

täʹlvvmannu kuälmad peeiʹv
täʹlvvmannu kuälmad peeiʹv    3.2.    0,000000

kuälmad peeiʹv täʹlvvmannust
kuälmad peeiʹv täʹlvvmannust    kuälmad peeiʹv täʹlvvmannust+?    inf

The continuation lexica for the failing string order end with:

LEXICON X_SECOND
:+Use/NG # ;
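When many such runs need comparing, the result lines above can be checked mechanically. The following is a minimal Python sketch and not part of the GiellaLT tooling: parse_lookup_line is a hypothetical helper. It assumes that the raw hfst-lookup output separates input, analysis and weight with tabs, that a failed lookup is marked by a trailing +? and the weight inf (as in the output above), and that the weight may be printed with a decimal comma depending on the locale.

```python
def parse_lookup_line(line):
    """Split one hfst-lookup result line into (input, analysis, weight, found).

    Assumption: the three fields are tab-separated in the raw output.
    """
    surface, analysis, weight = line.rstrip("\n").split("\t")
    # A trailing "+?" on the analysis marks a string the transducer rejected.
    found = not analysis.endswith("+?")
    # Accept both "0.000000" and "0,000000"; float("inf") handles failures.
    value = float(weight.replace(",", "."))
    return surface, analysis, value, found


ok = parse_lookup_line("täʹlvvmannu kuälmad peeiʹv\t3.2.\t0,000000")
bad = parse_lookup_line(
    "kuälmad peeiʹv täʹlvvmannust\tkuälmad peeiʹv täʹlvvmannust+?\tinf"
)
print(ok[3], bad[3])
```

With the two lines from this comment, the first is reported as found and the second as not found, which matches the behaviour described above.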

albbas commented 9 years ago

Comment 10149

Date: 2015-02-12 19:21:23 +0100 From: Sjur Nørstebø Moshagen <>

The problem was the entries in the following lexicon:

LEXICON MONTH-SECOND
1@U.MONTH.1@:ođđeeʹjjmannust% @U.MONTH.1@ X_SECOND ;

Every entry ends with an obligatory space (the escaped '% '), but at the same time continues directly to #. That confused the lookup algorithm.

I removed the escaped space ('% ') just before the flag diacritics, and now I get:

kuälmad peeiʹv täʹlvvmannust
kuälmad peeiʹv täʹlvvmannust    3.2

vuõssmõs peeiʹv täʹlvvmannust
vuõssmõs peeiʹv täʹlvvmannust    1.2

It seems the error was in the lexc code, and not associated with the +Use/NG tag :)

The fix is committed in rev. 107 331.
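The effect of the stray trailing space can be illustrated with a toy model. This is only a sketch under loud assumptions: plain Python dictionaries stand in for the lexc lexicons, lookup is modelled as exact string matching (which is not how hfst-lookup or xfst work internally), and build_pairs and text2digit are hypothetical names. The word forms and digit strings are taken from the examples in this thread.

```python
# Toy stand-ins for the day and month lexicons (forms copied from this thread).
DAYS = {"1": "vuõssmõs peeiʹv", "3": "kuälmad peeiʹv"}
MONTHS = {"1": "ođđeeʹjjmannust", "2": "täʹlvvmannust"}


def build_pairs(trailing_space):
    """Return {text: digits} for every day/month combination.

    trailing_space=True mimics entries that end in an obligatory space.
    """
    suffix = " " if trailing_space else ""
    pairs = {}
    for day, day_text in DAYS.items():
        for month, month_text in MONTHS.items():
            pairs[f"{day_text} {month_text}{suffix}"] = f"{day}.{month}"
    return pairs


def text2digit(pairs, text):
    """Exact-match lookup; unknown strings come back marked with +?."""
    return pairs.get(text, text + "+?")


before_fix = build_pairs(trailing_space=True)
after_fix = build_pairs(trailing_space=False)

query = "kuälmad peeiʹv täʹlvvmannust"
print(text2digit(before_fix, query))  # no match: the stored key ends in a space
print(text2digit(after_fix, query))   # match once the space is removed
```

In this model the query only fails because the stored string carries a space that the typed input lacks. Whether that is exactly what happened inside the real lookup tools is not established by this thread.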