UTF-8 Title to Slug Conversion

cloudhead / toto

the 10 second blog-engine for hackers

MIT License


UTF-8 Title to Slug Conversion #29

Open ahmozkya opened 14 years ago

ahmozkya commented 14 years ago

This code does not handle UTF-8 titles correctly: the last gsub removes every character outside a-z, 0-9 and "-".

#...

def slugize
    self.downcase.gsub(/&/, 'and').gsub(/\s+/, '-').gsub(/[^a-z0-9-]/, '')
end

#...

The following link will probably help. http://code.djangoproject.com/browser/django/tags/releases/1.1.1/django/contrib/admin/media/js/urlify.js
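
A minimal sketch of what an urlify-style fix could look like in Ruby: transliterate known characters to ASCII before the existing substitutions run. The table TRANSLIT and the standalone method slugize_utf8 are hypothetical names used for illustration only, the table covers just a handful of characters, and the sketch assumes Ruby 2.4 or later, where String#downcase also lowercases non-ASCII letters.

```ruby
# Hypothetical, deliberately tiny transliteration table.
# A real table (like the one in urlify.js) would cover many more characters.
TRANSLIT = {
  'п' => 'p', 'р' => 'r', 'о' => 'o', 'в' => 'v',
  'е' => 'e', 'к' => 'k', 'а' => 'a',
  'ç' => 'c', 'ğ' => 'g', 'ı' => 'i', 'ö' => 'o', 'ş' => 's', 'ü' => 'u'
}.freeze

# Standalone variant of slugize (hypothetical name, takes the title as an argument).
def slugize_utf8(title)
  title.downcase
       .each_char.map { |ch| TRANSLIT.fetch(ch, ch) }.join # map known characters to ASCII
       .gsub(/&/, 'and')
       .gsub(/\s+/, '-')
       .gsub(/[^a-z0-9-]/, '')                             # drop whatever is still not ASCII
end

puts slugize_utf8('Проверка')
puts slugize_utf8('Çay & Şeker')
```

Characters missing from the table are still dropped by the last gsub, so the result stays a plain ASCII slug.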

ixti commented 13 years ago

As this feature will involve expensive RegExp work, I believe it must be implemented as a "pluggable" option, e.g.:

set :utf8, true

This would enable the extra work only for those who write non-ASCII blogs :))

The second thing we need to agree on is how to handle such titles. The easiest way is a one-way conversion: files would have ASCII names, like 2011-08-13-proverka.txt, while titles could contain non-ASCII characters without any problems, e.g. проверка. This is easy, but looks more like a workaround, so the best behavior (I believe) is a two-way conversion, where the URL /2011/08/13/proverka first tries the file 2011-08-13-проверка.txt and then 2011-08-13-proverka.txt.
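
A rough sketch of the two-way lookup, under the assumption that the route arrives as an array of path segments. The helper find_article and its slugizer argument are hypothetical, not part of toto. Since an ASCII slug cannot be turned back into the original title, the sketch slugizes every file name and compares the result with the requested slug.

```ruby
# Hypothetical helper. `slugizer` is any callable that turns a file's base name
# into its ASCII slug; `files` is a list of article paths.
def find_article(route, files, ext, slugizer)
  wanted = route.join('-')
  matches = files.select do |file|
    slugizer.call(File.basename(file, ".#{ext}")) == wanted
  end
  # Prefer a file that keeps the original non-ASCII title, then the ASCII one.
  matches.find { |file| File.basename(file, ".#{ext}") != wanted } || matches.first
end
```

With a slugizer that transliterates Cyrillic, a request for /2011/08/13/proverka would return 2011-08-13-проверка.txt if it exists, 2011-08-13-proverka.txt otherwise, and nil when neither is present.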

sorin-ionescu commented 12 years ago

I also have this problem. It returns a 404 because it cannot find articles with non-ASCII characters in the title: the slugize method removes them, and def path at line 274 returns the slug. We have to deslug the path in def go.

sorin-ionescu commented 12 years ago

Something like the code below should work.

def article route
  path = self.articles.select do |article|
    File.basename(article, ".#{self[:ext]}").slugize.eql? route.join('-')
  end.last || File.join(Paths[:articles], "#{route.join('-')}.#{self[:ext]}")
  Article.new(path, @config).load
end

Unfortunately, there is an encoding issue. File.basename(article, ".#{self[:ext]}").slugize fails to equal route.join('-') in certain cases. Even though the strings can look identical to the eye, slugize can generate different slugs for them.
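
The thread does not pin down the cause. One possible explanation, offered here only as an assumption, is Unicode normalization: the same visible character can be stored composed (one code point) or decomposed (base letter plus combining mark), and some filesystems return file names in decomposed form. The sketch below reproduces the mismatch with a standalone copy of slugize and shows that normalizing both sides first makes the slugs agree. same_slug? is a hypothetical helper, and String#unicode_normalize needs Ruby 2.2 or later.

```ruby
# Standalone copy of the slugize logic from the top of this thread.
def slugize(str)
  str.downcase.gsub(/&/, 'and').gsub(/\s+/, '-').gsub(/[^a-z0-9-]/, '')
end

composed   = "caf\u00e9"   # "é" as a single code point (NFC)
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent (NFD)

puts composed == decomposed  # the strings render alike but are not equal
puts slugize(composed)       # "é" is removed entirely
puts slugize(decomposed)     # only the combining mark is removed, "e" survives

# Hypothetical helper: normalize both sides to the same form before comparing.
def same_slug?(a, b)
  slugize(a.unicode_normalize(:nfc)) == slugize(b.unicode_normalize(:nfc))
end

puts same_slug?(composed, decomposed)
```

If the mismatch comes from somewhere else, for example strings tagged with different encodings, normalization alone would not fix it.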