Python-Markdown / markdown

A Python implementation of John Gruber’s Markdown with Extension support.
https://python-markdown.github.io/
BSD 3-Clause "New" or "Revised" License

My tabs are converted into spaces #36

Closed: flying-sheep closed this issue 13 years ago

flying-sheep commented 13 years ago

the code →→code (where → is a tab) is expanded into this:

    code

why aren’t my tabs retained?

waylan commented 13 years ago

All whitespace is normalized to spaces prior to parsing a document based on the value assigned to tab_length (default is 4). The tabs are not retained. If you want the tabs to be represented differently, you will need to assign the appropriate value to tab_length. If you would like your tabs restored, I'd suggest running your own postprocessor which replaces the appropriate number of spaces with a tab.

I should note that if you are using the current code in the repo, tab_length is a keyword argument on the Markdown class and the wrapper function (e.g. markdown.markdown(some_text, tab_length=8)). In previous versions of markdown, however, it was a global variable, markdown.TAB_LENGTH, which you would have to override.
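A minimal sketch of the normalization and the suggested postprocessor, in plain Python. It assumes the normalization behaves like `str.expandtabs` and restores tabs only in leading indentation; `normalize` and `restore_leading_tabs` are hypothetical names used for illustration, not part of the library.

```python
import re


def normalize(text: str, tab_length: int = 4) -> str:
    """Assumed stand-in for the parser's whitespace normalization."""
    return text.expandtabs(tab_length)


def restore_leading_tabs(text: str, tab_length: int = 4) -> str:
    """Hypothetical postprocessor: turn leading runs of spaces back into tabs."""

    def to_tabs(match):
        tabs, leftover = divmod(len(match.group(0)), tab_length)
        # Keep any remainder as spaces so partial indentation is not lost.
        return "\t" * tabs + " " * leftover

    # MULTILINE makes ^ match at the start of every line.
    return re.sub(r"^ +", to_tabs, text, flags=re.MULTILINE)


source = "if x:\n\tprint(x)\n"
normalized = normalize(source)               # the tab becomes four spaces
restored = restore_leading_tabs(normalized)  # the four spaces become a tab again
```

This round-trips only when the original indentation used tabs exclusively; a line that was indented with four literal spaces would also come back as a tab.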

flying-sheep commented 13 years ago

i don’t quite understand why they are converted at all. To normalize it? then tabs would be more sensible (1 tab = 1 level of indentation, ¾ fewer bytes used)

Either way, it should support tab_length=None to use tabs instead of spaces. In the future, we can use CSS3’s tab-size, too.

waylan commented 13 years ago

First, this is a Python implementation of the original Perl implementation by John Gruber. It is noteworthy that John's implementation replaces tabs with spaces as well. We are copying that behavior and are not likely to change unless he does (very unlikely).

That said, here are my responses to your specific comments:

If you could guarantee that every document author consistently used either spaces or tabs, there would be no need to normalize whitespace. However, that is an unrealistic expectation, especially for documents edited by multiple people (wiki pages?). By normalizing all whitespace to spaces, we eliminate a lot of potential edge-case bugs in those inconsistent documents.

True, we could normalize to tabs, but a number of peculiarities of the Markdown syntax make spaces easier to work with (we often find tab_length - 1 in the code, for example). Which brings up another problem: if you set tab_length=None and the parser finds a string of spaces, how many tabs are represented there?
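To illustrate why all-space input is easier to match, here is a rough sketch of the kind of check involved. It assumes the usual Markdown rule that a block marker may be indented by up to `tab_length - 1` spaces, while `tab_length` or more spaces start a code block; `is_blockquote` and `is_code_block` are hypothetical helpers, not the library's actual functions.

```python
import re


def is_blockquote(line: str, tab_length: int = 4) -> bool:
    """True if '>' appears after at most tab_length - 1 spaces."""
    return re.match(r" {0,%d}>" % (tab_length - 1), line) is not None


def is_code_block(line: str, tab_length: int = 4) -> bool:
    """True if the line is indented by at least tab_length spaces."""
    return line.startswith(" " * tab_length)


# After normalization, one pattern per rule covers every indentation style.
print(is_blockquote("   > quoted"))               # three spaces: still a quote
print(is_code_block("    > quoted"))              # four spaces: a code block
print(is_code_block("\t> quoted".expandtabs(4)))  # a tab normalizes to the same case
```

Without normalization, each of these checks would also have to account for tabs and for mixed tab-and-space prefixes.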

flying-sheep commented 13 years ago

tab_length = None should only mean “Tabs aren’t to be normalized”, but now i understand that the code doesn’t distinguish between syntax and output. Instead of normalizing all the tabs to x spaces and then interpreting x-space-indented blocks as code blocks, we should adhere to the specification and interpret blocks which are indented by either x spaces or 1 tab as code blocks. individual blocks indented with both spaces and tabs are an abomination ;)

we could do this easily by converting the first tab of each line into x spaces (while retaining the following ones) and then using the current code-block-finding code. this would even work for aforementioned abominations. afaik there aren’t nested code blocks in markdown.
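One possible reading of this proposal, sketched in Python. It assumes "the first tab" means the first tab within a line's leading whitespace and that it expands to the next tab stop; `expand_first_tab` and `expand_document` are hypothetical helpers and do not exist in the library.

```python
def expand_first_tab(line: str, tab_length: int = 4) -> str:
    """Expand only the first tab in the leading whitespace; keep later tabs."""
    stripped = line.lstrip(" \t")
    indent = line[: len(line) - len(stripped)]
    pos = indent.find("\t")
    if pos == -1:
        return line  # no tab in the indentation, nothing to do
    # Pad with spaces up to the next tab stop so mixed indentation lines up.
    pad = tab_length - (pos % tab_length)
    return indent[:pos] + " " * pad + indent[pos + 1:] + stripped


def expand_document(text: str, tab_length: int = 4) -> str:
    """Apply expand_first_tab to every line of a document."""
    return "\n".join(expand_first_tab(l, tab_length) for l in text.split("\n"))


print(repr(expand_first_tab("\t\tcode")))  # second tab is retained
print(repr(expand_first_tab("  \tcode")))  # mixed indentation pads to four spaces
```

The existing code-block detection could then run unchanged on the result, since every tab-indented line now starts with `tab_length` spaces.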

PS: i only found tab_length being used in blockprocessors.py, where else is it used?

flying-sheep commented 11 years ago

so, what’s going on?

i really want my code to be retained as it is and not changed. e.g. when coding in Genie, converting tabs to spaces introduces syntax errors (in this language you have to specify how many spaces one indentation level should have; without this specification, it’s 1 tab per indentation level)

everything inside a code block shouldn’t be touched.

flying-sheep commented 11 years ago

if you look at the diff of that pull request ignoring whitespace (?w=1), you can see that the tests were not altered in any other way than whitespace, and still run flawlessly.

joewreschnig commented 10 years ago

I am trying to put code from a Makefile into a Markdown preformatted block and the normalization makes the resulting code invalid, since tabs are semantic in Makefiles.

It is impossible to fix this in a post-processing step because I also often include code snippets - in Make and other languages - that have eight or more spaces in a row.
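The ambiguity can be shown with a small sketch. It assumes normalization behaves like `str.expandtabs(4)` and uses a naive, hypothetical `restore_tabs` postprocessor that replaces every run of four spaces with a tab.

```python
TAB_LENGTH = 4


def restore_tabs(text: str, tab_length: int = TAB_LENGTH) -> str:
    """Naive postprocessor: every run of tab_length spaces becomes a tab."""
    return text.replace(" " * tab_length, "\t")


# A Makefile recipe line must start with a real tab.
recipe = "all:\n\tgcc -o app main.c\n"
# This line contains eight literal spaces that the author typed on purpose.
aligned = "x = 1" + " " * 8 + "# comment\n"

recipe_out = restore_tabs(recipe.expandtabs(TAB_LENGTH))
aligned_out = restore_tabs(aligned.expandtabs(TAB_LENGTH))

print(recipe_out == recipe)    # the tab survives the round trip
print(aligned_out == aligned)  # the intentional spaces do not
```

Once both inputs have been normalized, nothing in the text distinguishes spaces that came from a tab from spaces that were typed as spaces.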

In fact the Markdown "standard" says:

Regular Markdown syntax is not processed within code blocks.

I'd count whitespace folding/normalization as "regular Markdown syntax". It shouldn't be happening: code blocks should be de-indented and HTML-entity-escaped, and nothing else.

bdrewery commented 9 years ago

This is simply wrong. It generates invalid output for Makefiles. Tabs should only be expanded for Markdown syntax, not inside code blocks.