getpelican / pelican

Static site generator that supports Markdown and reST syntax. Powered by Python.
https://getpelican.com
GNU Affero General Public License v3.0

Dotclear import: encoding issues if accents in the slug #164

Closed: nsteinmetz closed this issue 13 years ago

nsteinmetz commented 13 years ago

Hi,

I just ran into an encoding issue with the Dotclear import whenever the slug used for the filename contains accented characters. As soon as I remove the accents, there are no problems.

Platform: OS X Lion 10.7.1, using the default Terminal.

Example:

output/Django-:-limit_choices_to-pour-présenter-un-sous-ensemble-des-données-d-un-modèle.rst
Traceback (most recent call last):
  File "/usr/local/bin/pelican-import", line 248, in <module>
    main(input_type, args.input, args.output, dircat=args.dircat)
  File "/usr/local/bin/pelican-import", line 217, in main
    fields2pelican(fields, output_path, dircat=dircat)
  File "/usr/local/bin/pelican-import", line 198, in fields2pelican
    html_filename))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 102: ordinal not in range(128)
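The last line of the traceback is the real clue: u'\xe9' is "é", and the ascii codec only covers code points 0 to 127. The exact call on line 198 is cut off in the traceback, so the following is only a minimal Python 3 reproduction (the traceback itself comes from Python 2), assuming the accented filename ends up being encoded as ASCII somewhere along the way. `try_encode` is a hypothetical helper, not part of pelican-import.

```python
def try_encode(text, encoding):
    """Encode text, or describe the first character the codec rejects."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.object is the original string; exc.start indexes the bad character.
        return "cannot encode %r at position %d with %s" % (
            exc.object[exc.start], exc.start, encoding)


slug = "présenter"
print(try_encode(slug, "ascii"))  # cannot encode 'é' at position 2 with ascii
print(try_encode(slug, "utf-8"))  # b'pr\xc3\xa9senter'
```

The same string encodes cleanly once an explicit UTF-8 encoding is used, which is the direction of the fix discussed below.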
almet commented 13 years ago

That's probably an easy one to fix. Go to the importer code (tools/pelican-import) and:

a) check that the file is encoded in the right format (UTF-8);
b) replace the calls to the builtin "open" with calls to "codecs.open", passing the right encoding for the file;
c) replace plain strings with unicode strings where needed.
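Step b) can be sketched as follows. This is an illustration under assumptions, not the actual pelican-import code: `write_post` and its parameters are hypothetical, and the sketch targets Python 3, where the builtin open() also accepts an encoding argument and is the more idiomatic choice. codecs.open is kept here because it is what the comment suggests.

```python
import codecs
import os


def write_post(output_dir, slug, content, ext=".rst"):
    """Write content to <output_dir>/<slug><ext>, always encoded as UTF-8.

    Hypothetical helper: the point is that the encoding is explicit instead
    of depending on the platform or terminal default.
    """
    filename = os.path.join(output_dir, slug + ext)
    # codecs.open(filename, mode, encoding) returns a file object that
    # encodes unicode text to UTF-8 on write.
    with codecs.open(filename, "w", encoding="utf-8") as fp:
        fp.write(content)
    return filename
```

For example, write_post("output", "des-données-d-un-modèle", text) writes the accented filename and its content as UTF-8, regardless of the terminal's locale.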

Thanks for the report! Give it a try; it should be really simple to fix. If not, I can take a deeper look later.

nsteinmetz commented 13 years ago

No problem, I'll try to fix it in the coming hours or days :)

Another issue occurs when the slug contains double quotes. Even when they are escaped, there are a few problems (one case in my import):

"2008/11/04/Modélisation-d-une-relation-\"Je-suis-le-contact/ami-de-...\""
almet commented 13 years ago

Awesome!

About the second issue, I don't understand what the problem is from your description.

What do you expect with this input, and what do you get instead?

nsteinmetz commented 13 years ago

I need to test further to see whether the issue is about the quotes or the "/". I don't remember the bug I saw last night very well. I'll check this one later too :-)
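Both characters are plausible suspects. Joined onto an output directory as-is, a slug containing "/" describes a nested path rather than a single file, and an unescaped '"' can break any command line built by string formatting. The sketch below is a hypothetical sanitizer (`slug_to_filename` and the `UNSAFE_CHARS` rule are assumptions, not part of pelican-import, and not necessarily what the eventual fix does) that flattens such a slug into one filename component.

```python
import os
import re

# Hypothetical rule: treat "/", "\" and '"' as unsafe inside a single filename.
UNSAFE_CHARS = re.compile(r'[/\\"]')


def slug_to_filename(slug, ext=".rst"):
    """Flatten a slug into a single filename component (illustrative only)."""
    safe = UNSAFE_CHARS.sub("-", slug)
    # Collapse dash runs the substitution can create (e.g. 'relation-"Je'),
    # then drop leading/trailing dashes.
    safe = re.sub(r"-{2,}", "-", safe).strip("-")
    return safe + ext


slug = '2008/11/04/Modélisation-d-une-relation-"Je-suis-le-contact/ami-de-..."'
name = slug_to_filename(slug)
print(name)
print(os.path.dirname(name) == "")  # True: no directory part left
```

Replacing with "-" keeps the slug readable; dropping the characters entirely would also work, at the cost of merging words like "contact/ami".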

nsteinmetz commented 13 years ago

Fixed by the small patch I provided; see pull request https://github.com/ametaireau/pelican/pull/165