bhollis / maruku

A pure-Ruby Markdown-superset interpreter (Official Repo).
MIT License

The regex to match OL doesn't work for UTF-8 #98

Closed rhapsodyn closed 11 years ago

rhapsodyn commented 11 years ago

The regex /^[ ]{0,1}\d+\..*\w+/ that I found at type_detection.rb:57 does not work well with UTF-8 text.

Here's the test:

# encoding: utf-8
puts "1. 测试" =~ /^[ ]{0,1}\d+\..*\w+/
puts "1. test" =~ /^[ ]{0,1}\d+\..*\w+/

This outputs nil and 0.

milesto commented 11 years ago

This issue was in an older version of the gem. In Ruby 1.9, \w matches only ASCII characters, whereas in Ruby 1.8 it matched all Unicode characters. To match non-ASCII word characters now, use the pattern [\w\P{ASCII}].
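As a rough illustration of the difference described above, the sketch below compares the regex quoted in the issue with a variant that swaps \w for [\w\P{ASCII}]. The constant names and the FIXED_PATTERN variant are assumptions made for this example; they are not taken from the maruku source.

```ruby
# encoding: utf-8

ASCII_LINE = "1. test"
CJK_LINE   = "1. 测试"

# Regex quoted in the issue (type_detection.rb in the older gem).
OLD_PATTERN = /^[ ]{0,1}\d+\..*\w+/

# Hypothetical variant: same regex with \w replaced by the suggested
# character class, which also accepts any non-ASCII character.
FIXED_PATTERN = /^[ ]{0,1}\d+\..*[\w\P{ASCII}]+/

p ASCII_LINE =~ OLD_PATTERN    # 0
p CJK_LINE   =~ OLD_PATTERN    # nil, because \w is ASCII-only in Ruby 1.9+
p CJK_LINE   =~ FIXED_PATTERN  # 0
```

Note that [\w\P{ASCII}] accepts every non-ASCII character, including non-ASCII punctuation, so it is broader than "word characters" in the strict sense.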

In the newer version of the gem, mdline.rb has:

return :olist          if self =~ /^([ ]{0,3}|\t)\d+\.\s+.*/
puts "1. 测试" =~ /^([ ]{0,3}|\t)\d+\.\s+.*/
puts "1. test" =~ /^([ ]{0,3}|\t)\d+\.\s+.*/

These tests work.
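The newer regex avoids \w entirely, so it no longer depends on how Ruby classifies non-ASCII characters. A minimal sketch of its behaviour, wrapping the quoted regex in a helper named olist_line? (a hypothetical name for this example, not a maruku method):

```ruby
# encoding: utf-8

# Regex quoted from mdline.rb in the newer gem version.
OLIST_PATTERN = /^([ ]{0,3}|\t)\d+\.\s+.*/

# Hypothetical helper: true if the line looks like an ordered-list item.
def olist_line?(line)
  !(line =~ OLIST_PATTERN).nil?
end

p olist_line?("1. 测试")      # true
p olist_line?("1. test")      # true
p olist_line?("\t2. item")    # true, a single leading tab is allowed
p olist_line?("    3. item")  # false, four leading spaces exceed {0,3}
p olist_line?("4.item")       # false, whitespace is required after the period
```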

rhapsodyn commented 11 years ago

Updated to the latest version, and it works like a charm.