SBECK-github / Date-Manip

Updates for language files #42

Open davidh2075 opened 1 year ago

davidh2075 commented 1 year ago

This is a follow-on from #41. I initially posted the comments below to that closed issue, which I couldn't re-open. They really warrant a separate issue, as #41 is related only to the Polish language. I'll delete the original comments on #41 after posting in this issue.

I've attached 8 files named <language>_new.pm.txt. They merge the changes required to parse dates on macOS Monterey 12.5.1 and Oracle Linux Server 7.9. I appended .txt to permit attachment. There are some parse failures, which I note below.

I've also attached two crude scripts I used to test all the language configuration files, date-manip-test-macos.sh.txt and date-manip-test-linux.sh.txt. They aren't ideal tests, as they will fail to parse a date if, for example, either a day abbreviation or a month abbreviation isn't recognised. When all date time elements are recognised, they parse, which is how I used them. I ran both scripts with the final merged language files.

I'm not a speaker of any of these languages, so I simply updated the files to achieve successful parsing, using the strings output by date(1).

Do you know yet which version number will incorporate these changes? I'll look for it downstream.

Explaining the Romanian Tuesday day_name update

I added an entry to the Romanian Tuesday array, the last element below, which both Linux and macOS output as the full day name:

['marți', 'marti', 'marþi', 'marţi'],

The ț in the first element is U+021B, Latin small letter t with comma below. The ţ in the last element is U+0163, Latin small letter t with cedilla. Both are accepted as Tuesday in Romanian, e.g. see the Reverso translation of both.
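The two letters look almost identical on screen, so it can help to compare their bytes. The following is a minimal sketch, assuming a UTF-8 terminal and script file; `utf8_hex` is a hypothetical helper name, not part of Date::Manip.

```shell
# Print the UTF-8 bytes of a string as lowercase hex.
# od is POSIX; tr removes the spaces and newline that od emits.
utf8_hex() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

utf8_hex 'ț'; echo   # U+021B, t with comma below: c89b
utf8_hex 'ţ'; echo   # U+0163, t with cedilla:     c5a3
```

Because the byte sequences differ, a language file that lists only one spelling will not match the other, which is why both entries are needed in the array.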

Parse failures

I'm including this section just FYI. None of this is causing me a problem. Just stuff I noticed.

Russian on macOS

The standard output of date on macOS fails to parse. It's due to the presence of "г. " after the year and the parentheses around the TZ name in the output below. The blank line after the second command is the failed parse. Using a format that eliminates the "г. " and the parentheses results in a successful parse.

% LANG=ru_RU date
четверг,  6 октября 2022 г. 18:34:30 (AEDT)
% LANG=ru_RU date | perl -I/Users/USERNAME/Downloads/Date-Manip-SBECK-github/lib/ -MDate::Manip -lpe 'Date_Init("Language=Russian", "DateFormat=non-US"); $_=UnixDate(ParseDate($_), "%Y%m%d %T")'

% LANG=ru_RU date +"%A, %e %B %Y %T %Z"
четверг,  6 октября 2022 18:35:53 AEDT
% LANG=ru_RU date +"%A, %e %B %Y %T %Z" | perl -I/Users/USERNAME/Downloads/Date-Manip-SBECK-github/lib/ -MDate::Manip -lpe 'Date_Init("Language=Russian", "DateFormat=non-US"); $_=UnixDate(ParseDate($_), "%Y%m%d %T")'
20221006 18:36:10
%
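As an alternative to changing the date format, the default output could be filtered before it reaches ParseDate. This is only a sketch of that idea using sed: `strip_ru_date` is a hypothetical function name, and the patterns assume the year marker "г." and the parenthesised TZ name appear exactly as in the output above.

```shell
# Remove the " г. " year marker and any parentheses from a date line.
strip_ru_date() {
    sed -e 's/ г\. / /' -e 's/[()]//g'
}

printf '%s\n' 'четверг,  6 октября 2022 г. 18:34:30 (AEDT)' | strip_ru_date
# четверг,  6 октября 2022 18:34:30 AEDT
```

The filtered line has the same shape as the output of the explicit format string above, which is the form that parsed successfully.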

German, Italian and Norwegian on Linux

German

The default format doesn’t parse, with or without .UTF-8:

$ LANG=de_DE.UTF-8 date --date="2022-01-03 11:00:00"
Mo 3. Jan 11:00:00 AEST 2022

It appears to be due to the period after the day of the month. This does parse, with or without .UTF-8:

$ LANG=de_DE.UTF-8 date --date="2022-01-03 11:00:00" +"%a %e %b %Y %T %Z"
Mo  3 Jan 2022 11:00:00 AEST

Italian

The default format doesn’t parse, with or without .UTF-8:

$ LANG=it_IT.UTF-8 date --date="2022-01-03 11:00:00"
lun  3 gen 2022, 11.00.00, AEST

It appears to be due to the commas; I didn't check whether the periods in the time also contribute. This does parse, with or without .UTF-8:

$ LANG=it_IT.UTF-8 date --date="2022-01-03 11:00:00" +"%a %e %b %Y %T %Z"
lun  3 gen 2022 11:00:00 AEST
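The default Italian output can also be rewritten into this parseable shape after the fact. The sketch below assumes the commas and the dotted time are the only differences, as the two outputs above suggest; `normalize_it_date` is a hypothetical name.

```shell
# Drop the commas and rewrite a dotted HH.MM.SS time with colons.
normalize_it_date() {
    sed -E -e 's/,//g' \
           -e 's/([0-9]{2})\.([0-9]{2})\.([0-9]{2})/\1:\2:\3/'
}

printf '%s\n' 'lun  3 gen 2022, 11.00.00, AEST' | normalize_it_date
# lun  3 gen 2022 11:00:00 AEST
```

Since the time is rewritten along with the commas, this does not settle whether the periods in the time would block parsing on their own.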

Norwegian

The default format doesn’t parse:

$ LANG=no_NO date --date="2022-01-03 11:00:00"
ma. 03. jan. 11:00:00 +1000 2022

It seems to be due to the period after the day of the month.

With .UTF-8, date seems to fall back to the default LANG (en_AU):

$ LANG=no_NO.UTF-8 date --date="2022-01-03 11:00:00"
Mon Jan  3 11:00:00 AEST 2022

This does parse without .UTF-8:

$ LANG=no_NO date --date="2022-01-03 11:00:00" +"%a %e %b %Y %T %Z"
ma.  3 jan. 2022 11:00:00 AEST

This doesn’t parse with .UTF-8, unless you use English, i.e. Date_Init("Language=English", "DateFormat=non-US"):

$ LANG=no_NO.UTF-8 date --date="2022-01-03 11:00:00" +"%a %e %b %Y %T %Z"
Mon  3 Jan 2022 11:00:00 AEST

Files

finnish_new.pm.txt
french_new.pm.txt
norwegian_new.pm.txt
polish_new.pm.txt
portugue_new.pm.txt
romanian_new.pm.txt
russian_new.pm.txt
turkish_new.pm.txt
date-manip-test-macos.sh.txt
date-manip-test-linux.sh.txt

davidh2075 commented 1 year ago

Update: a Russian-speaking colleague gave me the update below to the fields structure in russian.pm, to try to fix the parsing of the standard date(1) output that fails on the "г. " element. It didn't work. I guess it may be because Date::Manip isn't expecting a "years" delta expression in a date-time. In any case, it still required removal of both the "г. " and the parentheses around the TZ name to parse.

% diff russian.pm russian_macos_linux.pm
50,51c50,51
<     ['г', 'г.', 'гд', 'год', 'лет', 'лет', 'года'],
<     ['мес', 'мес.', 'месяц', 'месяцев'],
---
>     ['г', 'гд', 'год', 'лет', 'лет', 'года'],
>     ['мес', 'месяц', 'месяцев'],
53c53
<     ['д', 'д.', 'день', 'дней', 'дня'],
---
>     ['д', 'день', 'дней', 'дня'],
55,56c55,56
<     ['мн', 'мин', 'мин.', 'минута', 'минут'],
<     ['с', 'с.', 'сек', 'сек.', 'секунда', 'секунд'],
---
>     ['мн', 'мин', 'минута', 'минут'],
>     ['с', 'сек', 'секунда', 'секунд'],
%

SBECK-github commented 1 year ago

Okay, I have looked over the 8 language files and approve all of them, so they're committed. They will be included in the next release (6.91 expected 12/01/2022).

I am still investigating the necessity of the Russian changes (removing "г.").

davidh2075 commented 1 year ago

Thanks, Sullivan

regards - David

(Sent by email in reply to Sullivan Beck's comment above, dated 2 Nov 2022, 7:21 pm.)

SBECK-github commented 1 year ago

I have added support for some simple language-specific parsing rules, which seem to resolve all of the issues you described when parsing the default output from the date command. Feel free to confirm. This will be in the next release (in just a couple of weeks).

davidh2075 commented 1 year ago

Sullivan,

I tested the changes on both macOS and Linux. The Russian default date format now works OK on macOS, which means everything I tested works on macOS.

On Linux:

Everything else I tested still works on Linux.

regards - David