gettalong / kramdown

kramdown is a fast, pure Ruby Markdown superset converter, using a strict syntax definition and supporting several common extensions.
http://kramdown.gettalong.org

Incorrect output to a file in PowerShell #725

Closed docsimpo closed 3 years ago

docsimpo commented 3 years ago

I use the command below to write the converted HTML to a file:

kramdown tang.md > tang-kramdown.html

The file "tang.md" is written in Chinese. If I run the command above in Command Prompt, the output file is correct:

<h1 id="section">唐诗三百首</h1>

<ol>
  <li>李白</li>
  <li>杜甫</li>
  <li>白居易</li>
</ol>

But if I run the same command in PowerShell, the output file looks like this:

<h1 id="section">鍞愯瘲涓夌櫨棣?/h1>

<ol>
  <li>鏉庣櫧</li>
  <li>鏉滅敨</li>
  <li>鐧藉眳鏄?/li>
</ol>

I also tried the commands below, but neither of them produces a correct output file in PowerShell:

kramdown tang.md | Out-File tang-kramdown.html
kramdown tang.md | Out-File tang-kramdown.html -encoding utf8

By the way, these commands print the correct content to the console in PowerShell. The content only becomes incorrect when I use > or Out-File to send it to a file.

What is wrong with it?

gettalong commented 3 years ago

I don't use PowerShell so I can't really help you here. If it works in cmd and not in PowerShell, the two shells may be using different encodings. Try running ruby -e 'p Encoding.default_external' in both shells. If there is a difference, then you found the problem.

You can set a custom encoding with the -E option, e.g. ruby -E ISO-8859-1:

$ ruby -e 'p Encoding.default_external'
#<Encoding:UTF-8>
$ ruby -E ISO-8859-1 -e 'p Encoding.default_external'
#<Encoding:ISO-8859-1>

Combine the -E option with the -S option to run kramdown like this: ruby -E ISO-8859-1 -S kramdown tang.md (naturally substitute "ISO-8859-1" with the correct encoding).

docsimpo commented 3 years ago

I ran ruby -e 'p Encoding.default_external' in both CMD and PowerShell. The output shows that the encoding is the same in both: UTF-8.


gettalong commented 3 years ago

I'm sorry but if the encoding is the same, I don't think I can help you because I do not have Windows.

Maybe PowerShell does some encoding conversion when outputting to a file? Or maybe you can find out more by determining how the output is incorrect.
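
One way to determine how the output is incorrect is to look at the raw bytes of the file instead of opening it in an editor. The following is only a sketch: `sniff_bom` and `hex_preview` are hypothetical helper names, not part of kramdown or Ruby. As far as I know, Windows PowerShell 5.1 writes UTF-16LE with a byte-order mark by default for > and Out-File, so a BOM at the start of the file would be a strong hint that the shell re-encoded the text.

```ruby
# Returns the name of the byte-order mark at the start of `bytes`, or nil.
# Hypothetical helper for illustration only.
def sniff_bom(bytes)
  boms = {
    "UTF-8"    => "\xEF\xBB\xBF".b,
    "UTF-16LE" => "\xFF\xFE".b,
    "UTF-16BE" => "\xFE\xFF".b
  }
  raw = bytes.b # compare as raw bytes, whatever encoding the string is tagged with
  boms.each { |name, bom| return name if raw.start_with?(bom) }
  nil
end

# Formats the first `count` bytes as space-separated hex for inspection.
def hex_preview(bytes, count = 16)
  bytes.b.byteslice(0, count).unpack("C*").map { |b| format("%02X", b) }.join(" ")
end

# Usage (file name taken from the report above):
#   data = File.binread("tang-kramdown.html")
#   p sniff_bom(data)
#   puts hex_preview(data)
```

Comparing the hex preview of the file written from cmd with the one written from PowerShell shows whether the bytes themselves differ or only the way they are displayed.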

docsimpo commented 3 years ago

@gettalong Thank you very much for all of your replies!

As you guessed, PowerShell always decodes the output of external programs into text before processing it further.
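
The garbled list items in the report are consistent with kramdown's UTF-8 bytes being decoded as GBK (code page 936). That code page is an assumption on my part, based on it being the usual default on a Chinese-language Windows console. A minimal Ruby sketch of the effect, with hypothetical helper names:

```ruby
# Hypothetical helper: take the UTF-8 bytes of `text` and decode them as GBK.
# GBK (code page 936) is an ASSUMPTION about the console encoding in use.
def misdecode_utf8_as_gbk(text)
  text.encode("UTF-8")          # start from UTF-8 bytes (returns a new string)
      .force_encoding("GBK")    # relabel the same bytes as GBK
      .encode("UTF-8", invalid: :replace, undef: :replace, replace: "?")
end

# Hypothetical helper: reverse the damage. This only works when every byte
# survived the wrong decoding, i.e. nothing was replaced by "?".
def undo_misdecode(garbled)
  garbled.encode("GBK").force_encoding("UTF-8")
end

puts misdecode_utf8_as_gbk("李白")  # compare with the first garbled <li> in the report
puts undo_misdecode("鏉滅敨")       # compare with the second <li> of the correct output
```

Items whose UTF-8 byte count is odd, such as the heading and the last list item, lose a byte to the wrong decoding, which would explain the "?" that swallows the "<" of the closing tag in the report.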

For reference, I asked the same question in the PowerShell repository on GitHub:
Correct output in CMD, but incorrect output to a file in PowerShell

Related issue: Don’t parse the pipeline as text when it is directed from an EXE to another EXE or file. Keep the bytes as-is.

For now, writing kramdown's output to a file from PowerShell is too complicated for me. I hope they provide a better solution in the future.
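
One possible workaround is to keep PowerShell out of the data path and let Ruby write the file itself. This is only a sketch that I have not tested on Windows: `capture_to_file` and `kramdown_command` are hypothetical helpers, and the `ruby -S kramdown` invocation is the one suggested earlier in this thread.

```ruby
require "open3"
require "rbconfig"

# Runs `command` (an argv array) and writes its stdout to `dest` byte for byte,
# so no shell gets a chance to decode and re-encode the text.
def capture_to_file(command, dest)
  output, status = Open3.capture2(*command, binmode: true)
  raise "command failed: #{command.join(' ')}" unless status.success?
  File.binwrite(dest, output)
end

# Builds the `ruby -S kramdown FILE` invocation for the running Ruby.
def kramdown_command(source)
  [RbConfig.ruby, "-S", "kramdown", source]
end

# Usage (file names taken from the report above):
#   capture_to_file(kramdown_command("tang.md"), "tang-kramdown.html")
```

Saved as a script and started from PowerShell, this never sends the converted HTML through the PowerShell pipeline, because the only thing the shell sees is the script's own (empty) output.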