Open eyaler opened 3 years ago
We would be happy to open the PR if @avian2 is interested.
Hi, it would be great if you could prepare a patch with the proposed changes, or at least a clear char-to-char table so it will be easier to review. Please make sure that the "special" uncommon symbols are clearly marked as such, so people will not be confused. I am using this transliteration for many texts on non-Hebrew displays and so far it is working quite well. Regards, Alon
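One way to produce the char-to-char table requested above is to list only the codepoints whose replacement differs between the current and the proposed mapping. The sketch below is illustrative: `diff_rows` is a hypothetical helper, and the two small dictionaries are placeholder values, not the actual contents of the unidecode tables.

```python
import unicodedata

def diff_rows(current, proposed):
    """Return (codepoint, Unicode name, old, new) for every entry that changes."""
    rows = []
    for cp in sorted(set(current) | set(proposed)):
        old, new = current.get(cp), proposed.get(cp)
        if old != new:
            # Unassigned codepoints have no name, so fall back to a marker.
            name = unicodedata.name(chr(cp), "<unassigned>")
            rows.append((f"U+{cp:04X}", name, old, new))
    return rows

# Placeholder mappings for illustration only (NOT the real unidecode data).
current = {0x05D0: "A", 0x05C3: ":"}
proposed = {0x05D0: "A", 0x05C3: "."}

rows = diff_rows(current, proposed)
for row in rows:
    print("\t".join(str(field) for field in row))
```

Unchanged entries are skipped, so a reviewer sees only the rows that need a decision.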
Hi
@alonbl, if you can review @eyaler's pull request I would be happy to accept it (since some of the proposed changes touch your changes in https://github.com/avian2/unidecode/commit/81f938d9419f4b651a089a0d809bd1a0566b1329). I don't know Hebrew and can't comment on the suggested changes.
I am not sure what U+05F5, U+05F6, and U+05F7 are; as far as I can tell they are not assigned in Unicode.
If the codepoints are undefined in Unicode, please set them to None in the transliteration tables.
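Whether a codepoint is assigned can be checked with Python's standard `unicodedata` module, which reports the general category `Cn` for unassigned codepoints. The sketch below is a minimal illustration: `is_unassigned` and `mask_unassigned` are hypothetical helpers, the replacement strings are placeholders, and the result depends on the Unicode version bundled with the Python build.

```python
import unicodedata

def is_unassigned(codepoint):
    """True if the bundled Unicode database assigns no character here (category Cn)."""
    return unicodedata.category(chr(codepoint)) == "Cn"

def mask_unassigned(block_start, entries):
    """Replace the entry of every unassigned codepoint with None."""
    return [
        None if is_unassigned(block_start + offset) else entry
        for offset, entry in enumerate(entries)
    ]

# Placeholder replacements for U+05F3..U+05F7 (illustration only).
fragment = mask_unassigned(0x05F3, ["'", '"', "x", "x", "x"])
print(fragment)
```

U+05F3 and U+05F4 (geresh and gershayim) are assigned and keep their entries, while the three following codepoints become `None`.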
Graphically, sof pasuq (U+05C3) looks like ":", but for NLP tasks it would be more useful to map it to "." or even ". ", as this is the meaning of the punctuation.
I trust your judgment in choosing the best compromise here.
Thanks!
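The trade-off for sof pasuq discussed above can be seen by applying both candidate replacements to the same text. This is a minimal sketch using `str.translate`; `replace_sof_pasuq` is a hypothetical helper and does not reflect how unidecode itself applies its tables.

```python
SOF_PASUQ = "\u05c3"  # HEBREW PUNCTUATION SOF PASUQ

def replace_sof_pasuq(text, replacement=". "):
    """Replace every sof pasuq in text with the chosen ASCII string."""
    return text.translate({ord(SOF_PASUQ): replacement})

# Two Hebrew letters, a sof pasuq, then one more letter.
sample = "\u05d0\u05d1" + SOF_PASUQ + "\u05d2"

graphical = replace_sof_pasuq(sample, ":")   # keeps the visual shape
semantic = replace_sof_pasuq(sample, ". ")   # keeps the sentence boundary
```

The graphical choice preserves the look of the mark, while the semantic one gives sentence splitters a conventional full stop followed by a space.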
On Mon, 2 Aug 2021 at 20:32 Tomaž Šolc wrote:
> Hi
> @alonbl, if you can review @eyaler's pull request I would be happy to accept it (since some of the proposed changes touch your changes in 81f938d https://github.com/avian2/unidecode/commit/81f938d9419f4b651a089a0d809bd1a0566b1329). I don't know Hebrew and can't comment on the suggested changes.
I will be glad to, once there is a pull request :) Maybe I am missing something in the GitHub interface?
@alonbl I haven't opened the PR yet. I hope to get to it soon and will tag you. Some points are a matter of view, use case, or agenda, and there really is no clear right choice. If you are interested, Alon, we can discuss. Thanks, guys!
@alonbl please find the table you asked for: https://docs.google.com/spreadsheets/d/1fvQtyDxiVbz4Yp2FY1fSvZ9qVugo2KKC_yX8LofAUGU
Thanks!
I created a patch with everything I could understand. As you did not provide edit permission, we will sync on the code. Let's narrow it down; see #68.
I would like to ask for @alonbl's feedback/green light before preparing my PR. I am interested in addressing several issues I see in the current Hebrew transliteration: