NVIDIA / NeMo-text-processing

NeMo text processing for ASR and TTS
https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/text_normalization/wfst/wfst_text_normalization.html
Apache License 2.0

IT_TN_Fixes #179

Closed: zoobereq closed this 1 month ago

zoobereq commented 1 month ago

What does this PR do?

The % sign is now accepted by Italian TN (see this issue). The implementation passes all pytest tests but fails a handful of Sparrowhawk tests. Since the current fix only adds a single mapping to nemo_text_processing/text_normalization/it/data/measure/measurements.tsv, it is unlikely to be related to the Sparrowhawk failures.
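
For illustration only, the kind of entry involved can be sketched in plain Python. The real grammars compile these tab-separated data files into WFSTs, which this sketch skips in favor of a dictionary lookup. The sample row, the spoken form "per cento", the column order, and the helper names `load_mappings` and `verbalize_unit` are all assumptions for the example, not the contents of the actual measurements.tsv.

```python
import csv
import io
from typing import Dict

# Hypothetical excerpt in a two-column, tab-separated layout
# (written form, spoken form). This row is an assumed example,
# not copied from measurements.tsv.
SAMPLE_TSV = "%\tper cento\n"


def load_mappings(tsv_text: str) -> Dict[str, str]:
    """Read written-form/spoken-form pairs from tab-separated text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) == 2}


def verbalize_unit(token: str, mappings: Dict[str, str]) -> str:
    """Replace a trailing unit symbol with its spoken form, if one is mapped."""
    for written, spoken in mappings.items():
        if token.endswith(written):
            number = token[: -len(written)].strip()
            return f"{number} {spoken}".strip()
    return token  # unmapped tokens pass through unchanged


mappings = load_mappings(SAMPLE_TSV)
print(verbalize_unit("50%", mappings))
```

The sketch leaves the number itself untouched; in the real pipeline the cardinal would be verbalized by a separate grammar.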

bonham79 commented 1 month ago

The failed Jenkins tests are related to the PR:


(1359 durations < 0.005s hidden.  Use -vv to show these durations.)
=========================== short test summary info ============================
FAILED tests/nemo_text_processing/de/test_decimal.py::TestDecimal::test_norm_7_eins_zwei_komma_drei - AssertionError: assert '1,2,3' == 'eins , zwei komma drei'
  - eins , zwei komma drei
  + 1,2,3
FAILED tests/nemo_text_processing/de/test_decimal.py::TestDecimal::test_norm_8_eins_komma_zwei_drei_komma_vier - AssertionError: assert '1,2,3,4' == 'eins komma z...ei komma vier'
  - eins komma zwei , drei komma vier
  + 1,2,3,4
FAILED tests/nemo_text_processing/de/test_decimal.py::TestDecimal::test_norm_9_eins_zwei_komma_drei_vier_komma_f_nf - AssertionError: assert '1,2,3,4,5' == 'eins , zwei ...er komma fünf'
  - eins , zwei komma drei , vier komma fünf
  + 1,2,3,4,5
FAILED tests/nemo_text_processing/de/test_electronic.py::TestElectronic::test_norm_08_b_r_e_t_t_s_p_i_e_l_v_e_r_s_a_n_d_punkt_de_ - AssertionError: assert 'b r e t t s ...unkt de punkt' == 'b r e t t s ... d punkt de .'
  - b r e t t s p i e l v e r s a n d punkt de .
  ?                                            ^
  + b r e t t s p i e l v e r s a n d punkt de punkt
  ?                                            ^^^^^
FAILED tests/nemo_text_processing/de/test_electronic.py::TestElectronic::test_norm_09_w_w_w_punkt_z_u_b_e_r_e_k_punkt_net_ - AssertionError: assert 'w w w punkt ...dot net punkt' == 'w w w punkt ...k punkt net .'
  - w w w punkt z u b e r e k punkt net .
  ?                                ------
  + w w w punkt z u b e r e k dot net punkt
  ?                          ++++++++
FAILED tests/nemo_text_processing/de/test_electronic.py::TestElectronic::test_norm_12_at_z_u_c_k - AssertionError: assert '@zuck' == 'at z u c k'
  - at z u c k
  + @zuck
FAILED tests/nemo_text_processing/de/test_electronic.py::TestElectronic::test_norm_13_at_z_o_o_b_e_r_e_q - AssertionError: assert '@zoobereq' == 'at z o o b e r e q'
  - at z o o b e r e q
  + @zoobereq
FAILED tests/nemo_text_processing/de/test_electronic.py::TestElectronic::test_norm_14_at_z_u_b_e_r_e_k_punkt_n_e_t - AssertionError: assert '@zuberek.net' == 'at z u b e r e k punkt n e t'
  - at z u b e r e k punkt n e t
  + @zuberek.net
FAILED tests/nemo_text_processing/de/test_electronic.py::TestElectronic::test_norm_15_at_w_e_z_y_r_eins_neun_acht_sechs - AssertionError: assert '@wezyr1986' == 'at w e z y r...un acht sechs'
  - at w e z y r eins neun acht sechs
  + @wezyr1986
FAILED tests/nemo_text_processing/de/test_time.py::TestTime::test_norm_26_zwei_uhr_drei_ig - AssertionError: assert '2.30 Uhr' == 'zwei uhr dreißig'
  - zwei uhr dreißig
  + 2.30 Uhr
FAILED tests/nemo_text_processing/de/test_time.py::TestTime::test_norm_27_zwei_uhr_drei_ig - AssertionError: assert '02.30 Uhr' == 'zwei uhr dreißig'
  - zwei uhr dreißig
  + 02.30 Uhr
================== 11 failed, 590 passed in 88.72s (0:01:28) ===================

script returned exit code 1

tbartley94 commented 1 month ago

@zoobereq Since all your PRs are based on the same source branch, their codebases overlap and the changes can't be told apart well. I need to merge them one at a time, so let's get this one in first.