adzap / timeliness

Fast date/time parsing for the control freak.
http://github.com/adzap/timeliness
MIT License
224 stars 28 forks source link

Fix the month detection for lowercase and long abbreviated month names. Fix ddd and mmm formats to make them fit more locales. #10

Open Exoth opened 11 years ago

Exoth commented 11 years ago

Let's try the usual case:

>> I18n.locale = :en
:en

>> "18 October 2013".to_date
Fri, 18 Oct 2013

>> "18 october 2013".to_date
Fri, 18 Oct 2013

>> "18 oct 2013".to_date
Fri, 18 Oct 2013

>> "18 Oct 2013".to_date
Fri, 18 Oct 2013

So far everything is ok. But in fact there are a lot of locales, which have lowercase and long abbreviated month names, like 'octobre' and 'oct.' in French.

>> I18n.locale = :fr
:fr

>> "18 Octobre 2013".to_date
ArgumentError: invalid date
  from /home/exoth/.rvm/gems/ruby-1.9.3-p448@jetradar/bundler/gems/timeliness-9c8161f4ab08/lib/timeliness/core_ext/string.rb:12:in `new'
  from /home/exoth/.rvm/gems/ruby-1.9.3-p448@jetradar/bundler/gems/timeliness-9c8161f4ab08/lib/timeliness/core_ext/string.rb:12:in `to_date'
  from (irb):39
  from /home/exoth/.rvm/gems/ruby-1.9.3-p448@jetradar/gems/railties-3.2.14/lib/rails/commands/console.rb:47:in `start'
  from /home/exoth/.rvm/gems/ruby-1.9.3-p448@jetradar/gems/railties-3.2.14/lib/rails/commands/console.rb:8:in `start'
  from /home/exoth/.rvm/gems/ruby-1.9.3-p448@jetradar/gems/railties-3.2.14/lib/rails/commands.rb:41:in `<top (required)>'
  from /home/exoth/projects/jetradar/script/rails:6:in `require'
  from /home/exoth/projects/jetradar/script/rails:6:in `<top (required)>'
  from -e:1:in `load'
  from -e:1:in `<main>'

>> "18 octobre 2013".to_date
ArgumentError: invalid date
  from /home/exoth/.rvm/gems/ruby-1.9.3-p448@jetradar/bundler/gems/timeliness-9c8161f4ab08/lib/timeliness/core_ext/string.rb:12:in `new'
  from /home/exoth/.rvm/gems/ruby-1.9.3-p448@jetradar/bundler/gems/timeliness-9c8161f4ab08/lib/timeliness/core_ext/string.rb:12:in `to_date'
  from (irb):41
  from /home/exoth/.rvm/gems/ruby-1.9.3-p448@jetradar/gems/railties-3.2.14/lib/rails/commands/console.rb:47:in `start'
  from /home/exoth/.rvm/gems/ruby-1.9.3-p448@jetradar/gems/railties-3.2.14/lib/rails/commands/console.rb:8:in `start'
  from /home/exoth/.rvm/gems/ruby-1.9.3-p448@jetradar/gems/railties-3.2.14/lib/rails/commands.rb:41:in `<top (required)>'
  from /home/exoth/projects/jetradar/script/rails:6:in `require'
  from /home/exoth/projects/jetradar/script/rails:6:in `<top (required)>'
  from -e:1:in `load'
  from -e:1:in `<main>'

>> "18 oct. 2013".to_date
NoMethodError: undefined method `map' for nil:NilClass
  from /home/exoth/.rvm/gems/ruby-1.9.3-p448@jetradar/bundler/gems/timeliness-9c8161f4ab08/lib/timeliness/core_ext/string.rb:11:in `to_date'
  from (irb):43
  from /home/exoth/.rvm/gems/ruby-1.9.3-p448@jetradar/gems/railties-3.2.14/lib/rails/commands/console.rb:47:in `start'
  from /home/exoth/.rvm/gems/ruby-1.9.3-p448@jetradar/gems/railties-3.2.14/lib/rails/commands/console.rb:8:in `start'
  from /home/exoth/.rvm/gems/ruby-1.9.3-p448@jetradar/gems/railties-3.2.14/lib/rails/commands.rb:41:in `<top (required)>'
  from /home/exoth/projects/jetradar/script/rails:6:in `require'
  from /home/exoth/projects/jetradar/script/rails:6:in `<top (required)>'
  from -e:1:in `load'
  from -e:1:in `<main>'

>> "18 Oct. 2013".to_date
NoMethodError: undefined method `map' for nil:NilClass
  from /home/exoth/.rvm/gems/ruby-1.9.3-p448@jetradar/bundler/gems/timeliness-9c8161f4ab08/lib/timeliness/core_ext/string.rb:11:in `to_date'
  from (irb):45
  from /home/exoth/.rvm/gems/ruby-1.9.3-p448@jetradar/gems/railties-3.2.14/lib/rails/commands/console.rb:47:in `start'
  from /home/exoth/.rvm/gems/ruby-1.9.3-p448@jetradar/gems/railties-3.2.14/lib/rails/commands/console.rb:8:in `start'
  from /home/exoth/.rvm/gems/ruby-1.9.3-p448@jetradar/gems/railties-3.2.14/lib/rails/commands.rb:41:in `<top (required)>'
  from /home/exoth/projects/jetradar/script/rails:6:in `require'
  from /home/exoth/projects/jetradar/script/rails:6:in `<top (required)>'
  from -e:1:in `load'
  from -e:1:in `<main>'

So that pull request fixes the month detection behavior to properly handle such situations.

And the second part is about regexp used for ddd and mmm formats.

Here's a simple code, checking how well the regexps perform against translations from all the locales found in the rails-i18n gem:

def check key, regexp
  bad = []
  locales = Dir['/home/exoth/.rvm/gems/ruby-1.9.3-p448@jetradar/gems/rails-i18n-0.7.4/rails/locale/*']
  locales.each do |file| 
    q = YAML.load(File.read(file))
    bad << q.keys.first if q.values.first["date"][key].compact.any?{|i| i !~ regexp} 
  end
  puts "#{bad.count}/#{locales.count} miss the regexp: #{bad.join(', ')}."
end

>> check 'month_names', /^\w{3,9}$/
check 'month_names', /^\w{3,9}$/
59/80 miss the regexp: es-AR, th, es-VE, el, is, az, de, tr, he, cy, zh-CN, fr-CH, lv, ca, hi-IN, pl, et, zh-HK, fr, mk, hi, hr, sk, uk, lt, es-PE, ro, pt, fa, pt-BR, lo, ne, es-419, kn, gl, ar, hu, fi, cs, es, sw, zh-TW, ja, es-MX, es-CL, eo, es-CO, bg, sr, or, de-AT, ru, wo, fr-CA, ko, vi, bn, de-CH, mn.

>> check 'abbr_month_names', /^\w{3,9}$/
43/80 miss the regexp: th, el, is, az, de, tr, he, zh-CN, fr-CH, lv, ca, uz, pl, et, zh-HK, fr, mk, hi, hr, sk, uk, fa, lo, ne, gl, ar, hu, fi, cs, zh-TW, ja, eo, bg, sr, or, de-AT, ru, fr-CA, ko, vi, bn, de-CH, mn.

>> check 'day_names', /^\w{3,9}$/
66/80 miss the regexp: es-AR, th, es-VE, el, sl, is, az, de, tr, he, cy, sv, zh-CN, id, lv, nb, uz, hi-IN, pl, et, zh-HK, mk, hi, hr, sk, uk, lt, es-PE, it, ro, nn, bs, pt, it-CH, fa, pt-BR, da, lo, ne, tl, es-419, kn, gl, ar, hu, fi, eu, cs, es, zh-TW, ja, es-MX, es-CL, eo, es-CO, bg, sr, or, de-AT, ru, wo, ko, vi, bn, de-CH, mn.

>> check 'abbr_day_names', /^\w{3,9}$/
check 'abbr_day_names', /^\w{3,9}$/
61/80 miss the regexp: es-AR, th, es-VE, el, sl, is, az, de, tr, he, sv, zh-CN, rm, lv, ca, nb, uz, hi-IN, pl, et, zh-HK, mk, hi, hr, sk, uk, lt, es-PE, ro, nn, bs, pt, fa, pt-BR, da, lo, ne, es-419, kn, ar, hu, fi, cs, es, sw, zh-TW, ja, es-MX, es-CL, eo, es-CO, bg, sr, or, de-AT, ru, ko, vi, bn, de-CH, mn.

So there are quite a lot of misses. So far these are the best results I've managed to get by making minor changes to the regexps:

>> check 'month_names', /^[[[:alnum:]]\.]{2,11}$/
11/80 miss the regexp: th, hi-IN, hi, ne, kn, sw, or, wo, vi, bn, mn.

>> check 'abbr_month_names', /^[[[:alnum:]]\.]{2,11}$/
8/80 miss the regexp: ca, hi, lo, ne, or, vi, bn, mn.

>> check 'day_names', /^[[[:alnum:]]\.]{1,14}$/
13/80 miss the regexp: th, az, cy, id, hi-IN, hi, uk, fa, ne, kn, or, vi, bn.

>> check 'abbr_day_names', /^[[[:alnum:]]\.]{1,14}$/
8/80 miss the regexp: hi-IN, hi, lo, ne, kn, or, vi, bn.

Much better already, though there are still some locales left, but they mostly involve multiword names, and this probably can't be fixed just by changing the format regexps. The way I see is to use the list of month/day names of the current locale at the stage of format processing.

Anyway, the pull request includes the changes in formats as specified above.

Exoth commented 10 years ago

Now when this solution couldn't match the Thai February month name translation I decided to go even further and defined mmm like this:

months = I18n.available_locales.flat_map do |locale| 
  I18n.t('date.month_names', locale: locale) + I18n.t('date.abbr_month_names', locale: locale)
end 
months.compact.uniq.map { |month| Regexp.escape(month) }.join('|')

and enabled case insensitive search. This way I'm guaranteed to match any possible month translation. But regexp is much bigger now and I don't know how it affects the performance. Anyway tell me if you find this solution suitable for the timeliness gem and I'll make a pull request.