amyreese / markdown-pp

Preprocessor for Markdown files to generate a table of contents and other documentation needs
MIT License

File Encoding should be given in command line #16

Open ati-ozgur opened 8 years ago

ati-ozgur commented 8 years ago

If I use !TOC with non-ASCII characters, I get the following error:

  File "/home/atilla/anaconda/bin/markdown-pp", line 42, in <module>
    MarkdownPP.MarkdownPP(input=mdpp, output=md, modules=modules)
  File "/home/atilla/anaconda/lib/python2.7/site-packages/MarkdownPP/MarkdownPP.py", line 30, in __init__
    pp.process()
  File "/home/atilla/anaconda/lib/python2.7/site-packages/MarkdownPP/Processor.py", line 49, in process
    transforms = module.transform(self.data)
  File "/home/atilla/anaconda/lib/python2.7/site-packages/MarkdownPP/Modules/TableOfContents.py", line 119, in transform
    TableOfContents.clean_title(title)).lower()
  File "/home/atilla/anaconda/lib/python2.7/re.py", line 155, in sub
    return _compile(pattern, flags).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 3: ordinal not in range(128)

ebenhoch commented 6 years ago

I just ran into the same issue. Any suggestions on how to resolve it? Kind regards, Peter

amyreese commented 6 years ago

This is a bug due to the (poor) way that Python 2 deals with Unicode strings. If you can run this with Python 3.2 or newer, then it should work as expected. If someone wants to improve the Unicode handling to work with Python 2, I'd be happy to review a diff.
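
For anyone who has to stay on Python 2, here is a minimal sketch of the general approach: decode the input with an explicit encoding as early as possible, and only ever hand Unicode text to `re`. This is not markdown-pp's actual code. `read_text` and `slugify` are hypothetical helper names, the slug rules are made up for illustration, and UTF-8 is only an assumed default. The sample title is chosen so that its UTF-8 bytes have 0xc4 at position 3, like the byte in the traceback.

```python
import io
import re


def read_text(path, encoding="utf-8"):
    """Read a file as Unicode text using an explicit encoding.

    io.open behaves the same on Python 2.7 and Python 3, so the result
    is always Unicode text, never a raw byte string.
    """
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


def slugify(title):
    """Turn a Unicode title into a lowercase anchor (hypothetical rules)."""
    # re.UNICODE makes \w and \s cover non-ASCII characters on Python 2 too.
    cleaned = re.sub(r"[^\w\s-]", "", title, flags=re.UNICODE)
    return re.sub(r"\s+", "-", cleaned.strip(), flags=re.UNICODE).lower()


# UTF-8 bytes of a title containing a non-ASCII character.
raw = u"Sat\u0131r 1".encode("utf-8")

try:
    # Decoding as ASCII is what Python 2 does implicitly when byte strings
    # and Unicode strings are mixed; done explicitly, it fails the same way.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print("decoding as ASCII failed: %s" % exc)

# Decoding with the real encoding first avoids the error.
print(slugify(raw.decode("utf-8")))
```

If the encoding were exposed on the command line as the issue title suggests, the chosen value could simply be passed through as the `encoding` argument of a helper like `read_text`.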