fst_alignment for ITN

Hi,

nemo_text_processing/fst_alignment/alignment.py works fine in the TN (text normalization) case, where each input word is aligned to its output words:

inp string: |1994|
out string: |tokens { date { year: "nineteen ninety four" } }|
inp indices: [0:4] out indices: [23:43]
in: |1994| out: |nineteen ninety four|

But in the ITN (inverse text normalization) case, the alignment seems to be broken: each input word that is inverse normalized is mapped to an empty string:

inp string: |nineteen ninety four|
out string: |tokens { date { year: "1994" preserve_order: true } }|
inp indices: [0:8] out indices: [23:23]
in: |nineteen| out: ||
inp indices: [9:15] out indices: [25:25]
in: |ninety| out: ||
inp indices: [16:20] out indices: [26:26]
in: |four| out: ||

Is there a way to get the form below in the ITN case?

in: |nineteen ninety four| out: |1994|
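
To make the request concrete, here is a rough post-processing sketch of the grouping I have in mind. It is not part of alignment.py and does not call any NeMo API. It assumes the per-word spans are available as (inp_start, inp_end, out_start, out_end) tuples copied from the log above, and that the output value is the text between a pair of double quotes with no escaped quotes inside. The function name group_by_quoted_field is hypothetical.

```python
import re


def group_by_quoted_field(inp, out, word_alignments):
    """Merge per-word alignments whose output index lands in the same quoted value.

    word_alignments: list of (inp_start, inp_end, out_start, out_end) tuples,
    copied by hand from the log above. This is an assumed format, not the
    return type of alignment.py.
    """
    # Character spans of the text between each pair of double quotes in `out`.
    fields = [(m.start(1), m.end(1)) for m in re.finditer(r'"([^"]*)"', out)]

    groups = {}  # (field_start, field_end) -> [min_inp_start, max_inp_end]
    for inp_start, inp_end, out_start, _ in word_alignments:
        for f_start, f_end in fields:
            if f_start <= out_start <= f_end:
                span = groups.setdefault((f_start, f_end), [inp_start, inp_end])
                span[0] = min(span[0], inp_start)
                span[1] = max(span[1], inp_end)
                break

    # One (input text, output text) pair per quoted field, in output order.
    return [(inp[i:j], out[s:e]) for (s, e), (i, j) in sorted(groups.items())]


inp = "nineteen ninety four"
out = 'tokens { date { year: "1994" preserve_order: true } }'
# Spans taken from the ITN log above.
word_alignments = [(0, 8, 23, 23), (9, 15, 25, 25), (16, 20, 26, 26)]

for src, tgt in group_by_quoted_field(inp, out, word_alignments):
    print(f"in: |{src}| out: |{tgt}|")
# in: |nineteen ninety four| out: |1994|
```

This only works because all three empty output spans (23, 25, 26) fall inside the quoted value "1994" at [23:27]. It would be nicer if the alignment itself could return this grouping.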

Thank you very much.

NVIDIA / NeMo-text-processing

fst_alignment for ITN #45