g-plane / raffia

CSS, Sass, SCSS and Less parser, all in one.
http://raffia-play.vercel.app/
MIT License

SCSS/SASS syntax trees differ for interchangeable code #7

Open yiffyrusdev opened 2 weeks ago

yiffyrusdev commented 2 weeks ago

Hello! Thanks for the amazing crate!

I'm exploring the Raffia Playground and looking forward to using raffia for stylesheet parsing.

I'm not really into the Sass indented syntax; however, as I understand it from here, the following Sass code should be parsed as three top-level QualifiedRules:

.rule1
  color: red
  &-smth
    color: blue
.rule2
  color: blue
.rule3
  color: red

Meanwhile, it actually seems to be parsed as a single one:

Stylesheet
> statements : [ 1 element ]
  > QualifiedRule
    > selector : SelectorList
      > selectors : [ 1 element ]
    > block : SimpleBlock
      > statements : [ 4 elements ]

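Here the block contains the declaration and the rest of the QualifiedRules: its four statements are color: red, the &-smth rule, and the .rule2 and .rule3 rules that should have been top-level.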

The equivalent code in SCSS syntax is parsed as expected:

.rule1{
  color: red;
  &-smth {
    color: blue;
  }
}
.rule2{
  color: blue;
}
.rule3{
  color: red;
}

Stylesheet
> statements : [ 3 elements ]
  > QualifiedRule
  > QualifiedRule
  > QualifiedRule

As far as I can tell, these code samples should produce structurally identical syntax trees (except for spans), shouldn't they?

g-plane commented 2 weeks ago

I haven't looked deeply, but I think this may be a bug in the parser. (Handling indentation is not easy.)

yiffyrusdev commented 2 weeks ago

Thanks for the reply! Yeah, indents could be tricky. Agreed.

I'd like to take a look as well; maybe I'll have some luck.

Where should I start? ::parser::sass?

g-plane commented 2 weeks ago

Tokenizer.

yiffyrusdev commented 1 week ago

Greetings!

> Tokenizer.

Yup, that's correct. I'm going to submit a PR, but first:

The tokenizer generates a single Dedent token no matter how many levels we "go up", and the block currently being parsed is terminated by that token. So one can have:

a
  b
    c
      d
        e
    f <-- 2 blocks up

And f ends up inside c's block, next to d, instead of next to c: as far as I can tell, that's how parsing works, so the single Dedent closes only d's block.
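To make the failure mode concrete, here is a minimal Rust model of it. It is a hypothetical simplification, not raffia's tokenizer or parser: `Tok`, `tokenize_single_dedent`, `parse_block`, and `parent_of` are made-up names, and every line is treated as a bare word. The tokenizer emits exactly one Dedent whenever indentation decreases, and the parser ends a block at the first Dedent it sees.

```rust
// Hypothetical, simplified model of the bug; not raffia's actual code.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(String),
    Indent,
    Dedent,
}

// Emits Indent when indentation grows and exactly ONE Dedent when it shrinks,
// no matter how many levels were left.
fn tokenize_single_dedent(src: &str) -> Vec<Tok> {
    let mut toks = Vec::new();
    let mut prev = 0usize;
    for line in src.lines().filter(|l| !l.trim().is_empty()) {
        let indent = line.len() - line.trim_start().len();
        if indent > prev {
            toks.push(Tok::Indent);
        } else if indent < prev {
            toks.push(Tok::Dedent);
        }
        prev = indent;
        toks.push(Tok::Word(line.trim().to_string()));
    }
    toks
}

#[derive(Debug)]
struct Node {
    name: String,
    children: Vec<Node>,
}

// Parses statements until a Dedent (or end of input) terminates the block.
fn parse_block(toks: &[Tok], pos: &mut usize) -> Vec<Node> {
    let mut nodes: Vec<Node> = Vec::new();
    while *pos < toks.len() {
        match &toks[*pos] {
            Tok::Dedent => {
                *pos += 1; // one Dedent closes exactly one block
                break;
            }
            Tok::Indent => {
                *pos += 1;
                let children = parse_block(toks, pos);
                // The indented block belongs to the preceding statement.
                if let Some(last) = nodes.last_mut() {
                    last.children = children;
                }
            }
            Tok::Word(w) => {
                nodes.push(Node { name: w.clone(), children: Vec::new() });
                *pos += 1;
            }
        }
    }
    nodes
}

// Returns the name of the node whose block contains `target`.
fn parent_of<'a>(nodes: &'a [Node], target: &str, parent: Option<&'a str>) -> Option<&'a str> {
    for n in nodes {
        if n.name == target {
            return parent;
        }
        if let Some(p) = parent_of(&n.children, target, Some(n.name.as_str())) {
            return Some(p);
        }
    }
    None
}

fn main() {
    let src = "a\n  b\n    c\n      d\n        e\n    f\n";
    let toks = tokenize_single_dedent(src);
    let mut pos = 0;
    let tree = parse_block(&toks, &mut pos);
    // The intended parent of f is b.
    println!("parent of f: {:?}", parent_of(&tree, "f", None));
}
```

Running it prints `parent of f: Some("c")`: the one Dedent between e and f closes d's block only, so f lands next to d.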

There are two approaches I've tried so far:

  1. Generate as many Dedent tokens as needed to close every block that should be closed
    • Minimal impact on internal APIs
    • Requires another Vec<TokenWithSpan> allocation: the extra Dedent tokens have nothing to consume from the source, so I have to keep them around for bump
  2. Add a u16 payload to the Dedent token struct that holds the number of blocks to close
    • Noticeable impact on internal APIs, especially the parser
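A rough sketch of how the two approaches differ, assuming the number of blocks to close has already been computed. `Token`, `PendingDedents`, `close_blocks_payload`, and `consume_one_level` are hypothetical names, not raffia's API; spans are omitted, and a plain `Vec<Token>` stands in for the Vec<TokenWithSpan> buffer.

```rust
// Hypothetical token kinds; raffia's real token types (and spans) differ.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    // Approach 1: a plain Dedent, emitted once per closed block.
    Dedent,
    // Approach 2: a single Dedent carrying how many blocks it closes.
    DedentN(u16),
}

// Approach 1: the extra Dedents consume no source text, so they are buffered
// and handed out by later tokenizer calls before more input is read.
struct PendingDedents {
    buffer: Vec<Token>,
}

impl PendingDedents {
    fn new() -> Self {
        Self { buffer: Vec::new() }
    }

    fn close_blocks(&mut self, levels: u16) {
        for _ in 0..levels {
            self.buffer.push(Token::Dedent);
        }
    }

    // All buffered tokens are identical, so popping from the end is fine.
    fn next_pending(&mut self) -> Option<Token> {
        self.buffer.pop()
    }
}

// Approach 2: a single token regardless of how deep the dedent goes.
fn close_blocks_payload(levels: u16) -> Token {
    Token::DedentN(levels)
}

// Approach 2, parser side: the innermost block consumes one level and leaves
// the remainder for its enclosing block (None once nothing is left).
fn consume_one_level(tok: &Token) -> Option<Token> {
    match tok {
        Token::DedentN(n) if *n > 1 => Some(Token::DedentN(*n - 1)),
        _ => None,
    }
}

fn main() {
    // Moving from e (indent 8) back to f (indent 4) closes two blocks.
    let mut pending = PendingDedents::new();
    pending.close_blocks(2);
    let mut emitted = Vec::new();
    while let Some(tok) = pending.next_pending() {
        emitted.push(tok);
    }
    println!("approach 1: {:?}", emitted);

    let tok = close_blocks_payload(2);
    println!("approach 2: {:?}, left for outer block: {:?}", tok, consume_one_level(&tok));
}
```

This mirrors the trade-off in the list: approach 1 leaves the parser's "stop at Dedent" logic untouched at the cost of a buffer, while approach 2 needs no buffer but every place that stops at a Dedent has to handle the payload.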

Both would also need to allocate a Vec<u16> for the indentation levels and introduce ErrorKind::InconsistendIndentation. I couldn't find the latter in the Sass spec; however, the Sass playground (which I believe runs dart-sass) enforces it: "if a block's indentation is 7, it is 7".
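One way the Vec<u16> of indentation levels could work, similar to the indent stack in Python's tokenizer: push on deeper indentation, pop while the new line is shallower, and reject a line that lands between two open levels. This is a minimal sketch under those assumptions; `IndentStack`, `on_line`, and `IndentError::Inconsistent` (standing in for the proposed ErrorKind::InconsistendIndentation) are hypothetical, and tabs are not handled. The returned count is what either approach would turn into Dedent tokens.

```rust
// Hypothetical error type standing in for the proposed ErrorKind variant.
#[derive(Debug, PartialEq)]
enum IndentError {
    Inconsistent { found: u16 },
}

// Stack of open indentation levels; the bottom entry (0) is the top level.
struct IndentStack {
    levels: Vec<u16>,
}

impl IndentStack {
    fn new() -> Self {
        Self { levels: vec![0] }
    }

    /// Returns how many blocks a line at `indent` closes (0 if it stays at
    /// the current level or opens a new block).
    fn on_line(&mut self, indent: u16) -> Result<u16, IndentError> {
        let current = *self.levels.last().unwrap();
        if indent > current {
            self.levels.push(indent); // opens a block (an Indent token)
            return Ok(0);
        }
        let mut closed = 0;
        // The bottom 0 is never popped, because indent >= 0.
        while indent < *self.levels.last().unwrap() {
            self.levels.pop();
            closed += 1;
        }
        if indent != *self.levels.last().unwrap() {
            // Dedented to a column that no enclosing block uses.
            return Err(IndentError::Inconsistent { found: indent });
        }
        Ok(closed)
    }
}

fn main() {
    let mut stack = IndentStack::new();
    // Indentation of a, b, c, d, e, f from the example above.
    for indent in [0u16, 2, 4, 6, 8, 4] {
        println!("indent {indent}: closes {:?}", stack.on_line(indent));
    }
    // Column 3 matches no open block, so it is rejected.
    println!("indent 3: {:?}", stack.on_line(3));
}
```

With the original .rule1/.rule2 sample (indents 0, 2, 2, 4, 0), the same stack reports that .rule2 closes two blocks, which is exactly what the single Dedent fails to express.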

If [1] looks good so far, I'll submit the PR.

Hey, wait a moment. I could do [1] with a Vec<u32> allocation.

g-plane commented 1 week ago

I think both solutions are OK.