gamburg / margin

Lightweight markup designed for an open mind
https://margin.love
MIT License

Mixing spaces and tabs—allowed? #5

Open vlmutolo opened 4 years ago

vlmutolo commented 4 years ago

I'm opening this to better track thinking about whether (and if so, how) to allow mixing spaces and tabs.

The simplest case I can see is the following:

\tA
   B

The first is a tab indentation, and the second is three spaces. Is B a child of A or are they on the same indentation?

vlmutolo commented 4 years ago

Idea: maybe the best way to solve this is to let the spaces/tabs ratio be a user-configurable parameter with a default of 4:1. Also, count any decorative characters as one space.

So, for most people, the default will work as they expect.

The snippet

\tA
    B

would produce two same-level items, while the snippet

\tA
     B

would produce an item and a child item.

And for those who configure their editors to tabs=3 spaces, they’ll have a setting to change the parser.
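A minimal sketch of the ratio idea above, assuming a hypothetical `indent_width` helper with a `tab_width` parameter (neither name comes from the Margin spec). It only handles leading tabs and spaces; the "decorative characters count as one space" rule is left out.

```python
def indent_width(line: str, tab_width: int = 4) -> int:
    """Return the indentation width of `line`, counting each tab as `tab_width` spaces."""
    width = 0
    for ch in line:
        if ch == "\t":
            width += tab_width
        elif ch == " ":
            width += 1
        else:
            break  # first non-indentation character ends the gutter
    return width


def is_child(parent: str, candidate: str, tab_width: int = 4) -> bool:
    """True if `candidate` is indented more deeply than `parent`."""
    return indent_width(candidate, tab_width) > indent_width(parent, tab_width)
```

With the default of 4, a tab-indented `A` followed by a four-space `B` compares as same-level, and a five-space `B` as a child. Passing `tab_width=3` makes the four-space `B` a child instead.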

The only major downside I can think of is possibly encountering a system that doesn’t support user options in parsers (does SublimeText?). I don’t know if these exist. And writing grammars that have parameters is a little strange and definitely more complex.

xkortex commented 4 years ago

IMHO, don't let it be user-configurable. I'm not sure how you'd enable that, other than in-file #pragma style parser-directives, which is almost always the wrong choice.

Also purely opinion, do what YAML does and ban \t tabs.

Why does YAML forbid tabs?

Tabs have been outlawed since they are treated differently by different editors and tools. And since indentation is so critical to proper interpretation of YAML, this issue is just too tricky to even attempt. Indeed Guido van Rossum of Python has acknowledged that allowing TABs in Python source is a headache for many people and that were he to design Python again, he would forbid them.

If you do decide to allow tabs, I would just throw an exception if they are mixed. That's the price one pays for such a habit :p.

vlmutolo commented 4 years ago

Yeah, the more I think about it, the more I think it would be best if mixing weren’t allowed at all. The whole document should probably just be spaces.

gamburg commented 4 years ago

How about tabs and spaces can't be mixed under a single direct parent?

In the case of @vlmutolo's example above:

\tA
   B

A and B would both be children of the implied Document Parent, since spaces and tabs can't be mixed between siblings.

A more illustrative example of this idea:

A [child of the Document Parent]
\tB [child of A] 
\t\tC [child of B]
\t\t   Child of C
\t\t   Child of C
\t\tD [child of B]

\tE [child of A] 
   F [child of A]
   \tChild of F
         Child of A

This looks insanely complicated, but I think (at least in plain text) it would be fairly intuitive. Sure, it'd be up to the thinker to mind their tabs & spaces, but only if they wanted their Margin parsed.

xkortex commented 4 years ago

Yeah, that could work, albeit at a bit of extra complexity. Each TLI (top-level item, i.e. a direct child of the document, regex ^\S.*$) would look for its first child and then set a flag for indentation type. To check for alternate-whitespace siblings, you could split the document up and process each TLI as a separate sub-document, and assert that you don't mix them within each subdoc. This could potentially give you speedups if you have a multithreaded implementation (e.g. golang/rust), even outside this issue.
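A rough sketch of that sub-document split, assuming hypothetical helpers `split_subdocs` and `indent_kind` (illustrative names, not part of any Margin parser). It reports mixing anywhere inside one TLI's subtree, which is stricter than the per-parent rule proposed earlier.

```python
import re

# Top-level item: a line that starts with a non-whitespace character.
TLI = re.compile(r"^\S.*$")


def split_subdocs(text: str) -> list[list[str]]:
    """Group lines into sub-documents, each starting at a top-level item."""
    subdocs: list[list[str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no indentation information
        if TLI.match(line) or not subdocs:
            subdocs.append([line])
        else:
            subdocs[-1].append(line)
    return subdocs


def indent_kind(subdoc: list[str]) -> str:
    """Return 'tabs', 'spaces', 'none', or 'mixed' for one sub-document."""
    kinds = set()
    for line in subdoc:
        leading = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in leading:
            kinds.add("tabs")
        if " " in leading:
            kinds.add("spaces")
    if len(kinds) > 1:
        return "mixed"
    return kinds.pop() if kinds else "none"
```

Since each sub-document is checked independently, the `indent_kind` calls could be farmed out to worker threads if that ever mattered.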

OTOH, checking for mixed tabs and spaces is just a matter of two regexes.
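For illustration, the two-regex check might look like the sketch below. The names `TAB_INDENT`, `SPACE_INDENT`, `MixedIndentationError` and `check_indentation` are made up for this example, and whitespace-only lines are assumed not to count as indentation.

```python
import re

# A line whose leading whitespace contains a tab, followed by real content.
TAB_INDENT = re.compile(r"^[ \t]*\t[ \t]*\S", re.MULTILINE)
# A line whose leading whitespace contains a space, followed by real content.
SPACE_INDENT = re.compile(r"^[ \t]* [ \t]*\S", re.MULTILINE)


class MixedIndentationError(ValueError):
    """Raised when a document indents with both tabs and spaces."""


def check_indentation(text: str) -> None:
    """Raise MixedIndentationError if both tab and space indentation occur in `text`."""
    if TAB_INDENT.search(text) and SPACE_INDENT.search(text):
        raise MixedIndentationError("document mixes tabs and spaces for indentation")
```

Spaces inside item text (after the first non-whitespace character) are not flagged, only those in the gutter.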

You could possibly "cheat" by converting \t to 2 or 4 spaces iff there are no tabs and spaces in the same gutter, eg \t\t⎵⎵ ⎵\t etc would be forbidden. Though that gives unreliable behavior on siblings that alternate tabs/spaces.
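Roughly, that "cheat" could be sketched as follows, assuming a hypothetical `expand_tabs_in_gutter` function and a `tab_width` of 4 (both are assumptions for the example). It rejects any line whose gutter contains both characters and otherwise rewrites leading tabs as spaces.

```python
import re

LEADING = re.compile(r"^[ \t]*")  # the gutter: leading tabs and spaces of a line


def expand_tabs_in_gutter(text: str, tab_width: int = 4) -> str:
    """Replace leading tabs with spaces, refusing gutters that mix both characters."""
    out = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        gutter = LEADING.match(line).group()
        if "\t" in gutter and " " in gutter:
            raise ValueError(f"line {lineno}: tabs and spaces mixed in the same gutter")
        out.append(gutter.replace("\t", " " * tab_width) + line[len(gutter):])
    return "\n".join(out)
```

As noted, siblings that alternate between tabs and spaces pass this check, so whether they end up on the same level still depends on the chosen `tab_width`.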

My $0.02 is allow only one per document until you get more of the spec hammered out, then return to tackle it, if there's demand for it. But since most editors out there make the Tab key either insert a literal \t OR insert N spaces, I really don't see this use case showing up that frequently.

vlmutolo commented 4 years ago

@xkortex makes a very good point. We’re probably overengineering the spec in this case, since few people would be mixing tabs and spaces. Parsers should probably just allow only one or the other until it seems like this is actually a problem.

In practice, this looks like the following high-level algorithm for a parser: