cssinjs / istf-spec

Interoperable Style Transfer Format [DRAFT]
247 stars 8 forks source link

First draft #17

Closed kof closed 7 years ago

kof commented 7 years ago

This is mainly to discuss the details of it and raise all concerns. cc @geelen

kof commented 7 years ago

Updated the spec.

kof commented 7 years ago

Updated the spec

kof commented 7 years ago

@tabatkins btw. this format could be also fixing the interoperability between less, sass and cssinjs!

kof commented 7 years ago

@tabatkins @geelen do you have more thoughts about this PR?

geelen commented 7 years ago

This is starting to shape up! I noticed one thing missing—dynamic values. For styled components, we need the ability for parts of the CSS to be injected based on props or theme variables, and if the intermediate format doesn't support that then we'd not be able to publish cross-compatible components—we'd have to publish full Styled Components.

So, I think it's worth considering areas where dynamic injections might occur, and a way to invert control so that injections can be evaluated as a snippet is executed. It could be as simple as something like:

.foo-123456 {
  color: ${ props => props.red ? 'red' : 'blue' };
  margin: 0.25em;

  ${ props => props.big && css`
    font-size: 1.25em;
    margin: 0.5em;
  }
}
[
  [RULE_START, 1],
    [RULE_NAME, 'foo'],
    [PROPERTY, 'color'],
    [VALUE, [ DYNAMIC, 0 ]],
    [DYNAMIC, 1],
  [RULE_END]
]

Then the specification also defines evaluation contexts something like:

function toCSS( rulesToSelectorsFn, ...dynamicFns ) {}

By default rulesToSelectorsFn could be rule => `._${rule}_${hash(rule)}` but that could be overridden... somehow. Dynamic interpolations would then just be referenced by their index into the dynamicFns array. When something like our babel transform generates this intermediate format, we wire up this array to make sure things are in the right order.

geelen commented 7 years ago

Might be good to get @philpl's thoughts here now too

kof commented 7 years ago

Definitely, I wanted just to cover basics first.

geelen commented 7 years ago

Then my only feedback is whether any rules should have any nested values (i.e. FUNCTION) at all. Would there be benefits to keeping everything flat? Could we collapse [SELECTOR, '.bla', [FUNCTION, ':matches', ':hover', ':active']] into simply [SELECTOR, '.bla:matches(:hover, :active)']? Alternatively:

[
  [RULE_START, 1],
    [COMPOUND_SELECTOR_START],
      [SELECTOR, '.foo'],
      [FUNCTION_SELECTOR, ':matches'],
        [SELECTOR, ':hover'],
        [SELECTOR, ':focus'],
      [FUNCTION_SELECTOR_END],
    [COMPOUND_SELECTOR_END],
    [PROPERTY, 'color'],
    [VALUE, 'red']
  [RULE_END]  
]

That way any SELECTOR could be a RULE and namespacing would apply.

geelen commented 7 years ago

In general I think it would be good if we could get the definition down to a flat list of 2-tuples. That could give us some optimisations down the line, like a binary format:

* 1 byte for marker
* 3 bytes for either:
  - first bit 0 means data fits in 23 bytes (i.e. RULE_START 1)
  - first bit 1 means next 23 bytes declare n bytes of argument (i.e. SELECTOR 'body')
* n bytes of argument
[
  [RULE_START, 1],         // 00 01
    [SELECTOR, 'body'],    // 03 84 62 6F 64 79 (has 4 bytes of data, 'body' in UTF8)
    [PROPERTY, 'color'],   // 0A 85 63 6F 6C 6F 72 (has 5 bytes of data, 'color' in UTF8)
    [VALUE, 'red']         // 0B 83 72 65 64 (has 3 bytes of data, 'red' in UTF8)
  [RULE_END]               // 01 00
]

I mean, it's just a thought at this point, I don't know how valuable this would become down the track, but it feels like an avenue we wouldn't want to close off prematurely

kof commented 7 years ago

I like your COMPOUND_SELECTOR and all flat idea, will update the spec!

kof commented 7 years ago

I mean, it's just a thought at this point, I don't know how valuable this would become down the track, but it feels like an avenue we wouldn't want to close off prematurely

So binary CSS huh) I have no idea where it could be useful, but lets just keep it in mind for now.

kof commented 7 years ago

Btw. if we are able to express everything in 2-tuples, we can express everything in one array, right?

[
  RULE_START, 1,
    SELECTOR, 'body',
    PROPERTY, 'color',
    VALUE, 'red',
  RULE_END
]
kitten commented 7 years ago

@kof @geelen Hiya :wave:

I'd love to make this a reality, especially since for styled-components v3, I'll attempt to create a core library for preprocessing CSS and extracting static CSS. A spec like this would certainly help to build a stable core, that can do even more than that.

We can certainly express everything as a flat array, except for nested selectors. I'd like to keep flattening of nested selectors as an operation that takes nested rules and outputs a flat array, since that'll separate parsing from flattening. If that makes sense...

Edit: Btw, so not to mistunderstand this. Are we creating an AST format for all css-in-js solutions here?

kof commented 7 years ago

We can certainly express everything as a flat array, except for nested selectors. I'd like to keep flattening of nested selectors as an operation that takes nested rules and outputs a flat array, since that'll separate parsing from flattening. If that makes sense...

Can you make an example? It seemed to me that we can flatten everything.

kitten commented 7 years ago

@kof Let's say we have the following in SC:

border: 1px solid red;

&:hover {
  border: 1px solid green;
}

Then we can either express this as:

[
  RULE_START, 1,
    SELECTOR, '&',
    PROPERTY, 'border',
    VALUE, '1px solid red',
    RULE_START, 1,
      SELECTOR, '&:hover',
      PROPERTY, 'border',
      VALUE, '1px solid green',
    RULE_END
  RULE_END
]

(Tuples or not)

Or we actually nest it:

[
  RULE_START, 1,
    [
    SELECTOR, '&',
    PROPERTY, 'border',
    VALUE, '1px solid red',
    [
      RULE_START, 1,
        SELECTOR, '&:hover',
        PROPERTY, 'border',
        VALUE, '1px solid green',
      RULE_END
    ]
  RULE_END
]

I've now realised that both would work equally well, I guess. Sorry 😆

kof commented 7 years ago

Lets see if we can express everything when using tuples of max-length 2 and everything flat, if yes, this constraint will give us perf benefits and simpler parsing evtl.

Actually if we use one array for everything, tuples of max-length 2 will become virtual.

kof commented 7 years ago

At the end, a completely flat format will allow parsers to avoid isArray check to identify the nesting and avoid recursion. We will be able to parse the whole thing in one loop.

kitten commented 7 years ago

@kof theoretically, if we're trying to maximise performance, turning the type identifiers into strings instead of numbers could lead to the array being a homogeneously typed array, which should help V8 optimise the performance of traversing and pushing to it :)

kof commented 7 years ago

@kof theoretically, if we're trying to maximise performance, turning the type identifiers into strings instead of numbers could lead to the array being a homogeneously typed array, which should help V8 optimise the performance of traversing and pushing to it :)

Thats easy to achieve, lets see if we can test that!

kof commented 7 years ago

Ah noo, we want function values support, so it won't be homogeneous anyways.

kitten commented 7 years ago

@kof oh yea, that's true. I totally forgot about that 😆

geelen commented 7 years ago

Hmm, in my earlier suggestion the function values are just indexes into an array of functions later. Those could be strings too, technically, though I dunno if it'd help that much. Personally, I think the RULE_START keys should all by Symbols as well (it'll be nicer when debugging).

IMO this is already looking like a format that we could use internally in SC for our no-parser optimisation. I can only really see static CSS extraction being totally reliable if there are no dynamic values, so for me that's a bit further off. But using this for no-parser could prove out whether it can cover all our use cases.

What do you think, @philpl?

kof commented 7 years ago

Hmm, in my earlier suggestion the function values are just indexes into an array of functions later.

True, that can be done as an optimization if required.

technically, though I dunno if it'd help that much.

Lets find this out, my feeling it will be relatively not much if at all. At the end using strings where we could use numbers will be more expensive, but thats just my noobs gut feeling.

Personally, I think the RULE_START keys should all by Symbols as well

You mean like that?


const ruleStart = Symbol('ruleStart')
const ruleEnd = Symbol('ruleEnd')
[
  [ruleStart],
  [ruleEnd]
]
kof commented 7 years ago

Woo,just added a bench for using one single array and it turns out it is almost twice slower than using lots of smaller arrays. WTF!!! https://esbench.com/bench/592d599e99634800a03483d8

kof commented 7 years ago

Ok I was using statically defined arrays during the setup (outside of test). Now as I started passing them during the test itself, results look more logical. I guess some magical optimization happened before. I think also its more correct to have the array creation + parsing inside of the test timeframe.

kof commented 7 years ago

So on my machine one homogenous array of strings is the fastest. In Chrome its almost 50% faster.

kof commented 7 years ago

Now if we want to optimize the storage, we could simply kick the array and use something as and use something as a separator.

kitten commented 7 years ago

@kof told ya :P it would make a big difference in a larger application.

@geelen I can spike an entire CSS parser based on this spec for the v3 core library that I'll discuss tomorrow with @threepointone. We were already considering moving away from using stylis for it. IMHO it'd be better to just move that effort to that library directly, and stat building on v3. I feel like that'd yield good results more quickly.

I'm unsure whether we should use this structure for the preprocessed CSS. I'll have to test what the most efficient structure in the runtime is.

kof commented 7 years ago

@kof told ya :P it would make a big difference in a larger application.

not sure about the big difference in an application yet, those are micro benchmarks.

kitten commented 7 years ago

@kof I'd expect this to not be the biggest bottleneck in a real word runtime and parser, but the perf benefit would add up as more components are added to an app

kof commented 7 years ago

I can spike an entire CSS parser based on this spec for the v3 core library

Does it make sense to have a very low level generic parser for the spec, which can be used to produce an opinionated structure for a specific library or even regular css?

kof commented 7 years ago

I'm unsure whether we should use this structure for the preprocessed CSS. I'll have to test what the most efficient structure in the runtime is.

I think this structure is all about efficiency of parsing, no?

kitten commented 7 years ago

@kof yea, I'd love to eventually end up with a reference implementation that can be used to build the css flattener, preprocessing, and static css extraction, on top of it. Either way, I'll fledge out some details of where we'll end up with regarding the preprocessor lib tomorrow :smile:

Edit:

I think this structure is all about efficiency of parsing, no?

It's very efficient already indeed, but if the preprocessor runs at build-time, there might be some size benefits of using a more compact structure.*

Edit 2: * at runtime

kof commented 7 years ago

@kof yea, I'd love to eventually end up with a reference implementation that can be used to build the css flattener, preprocessing, and static css extraction, on top of it. Either way, I'll fledge out some details of where we'll end up with regarding the preprocessor lib tomorrow 😄

By "css flattener" you mean taking the user css and compiling it to this format? Not sure we can create any common lib for this, because the source formats are slightly different.

"static css extraction" - we should totally end up with one babel plugin for all the libraries, which can be configured for 2 things: 1. which token to use (e.g. glamorous.h1, styled.h1) 2. What compiler/flattener/lets find a name to use. So that at the end we can rely on one babel plugin to get always the same flat format === unification accomplished.

kitten commented 7 years ago

By css flattener, I mean sth that potentially takes this css last and replaces nested rules with flattened ones. If nested rules will actually be parsed by a reference implementation? If not, it'll probably come down to a separate implementation that outputs this format/ast.

As to the second point: this is exactly what I'll start working on. One library which can preprocess and extract css bits in libraries like SC, glamorous and Glam.

kof commented 7 years ago

By css flattener, I mean sth that potentially takes this css last and replaces nested rules with flattened ones.

We got nesting into this spec already, meaning RULE_START can happen inside of another rule and to reference the parent selector we got PARENT_SELECTOR. Or maybe I don't quite understand what you mean and need an example.

As to the second point: this is exactly what I'll start working on. One library which can preprocessing and extract css bits in libraries like SC, glamorous and Glam.

Sounds great!

kitten commented 7 years ago

@kof that sounds like what I mean :) maybe I'll start with writing a minimal parser and we can think about a reference implementation later ;)

kof commented 7 years ago

Spec updated: removed nesting, array tuples length are now max 2. So now we could actually apply further optimizations, please vote with thumbs up and down to the next comments.

kof commented 7 years ago

Should we go for single array? As consequence we will only be able to express this:

  1. [MARKER]
  2. [MARKER, value]

Right now it seems to work nicely.

[
  [RULE_START, 1],
    [SELECTOR, 'body'],
    [PROPERTY, 'color'],
    [VALUE, 'red'],
  [RULE_END]
]

will become

[
  RULE_START, 1,
    SELECTOR, 'body',
    PROPERTY, 'color',
    VALUE, 'red',
  RULE_END
]
kof commented 7 years ago

Should we go for all string values? Benchmark shows a considerable perf benefit in v8, if it is not flowed: https://esbench.com/bench/592d599e99634800a03483d8

[
  1, 1,
    2, 'body',
    3, 'color',
    4, 'red',
  5
]

will become

[
  '1', '1',
    '2', 'body',
    '3', 'color',
    '4', 'red',
  '5'
]
  1. I am not sure if its worth it from the performance perspective (because only v8 and on micro level)
  2. Tested storage savings, using 100kb probe it shrinks down to 80kb => 20%, not sure if its worth. Probe was [1,1,2,"body",3,"color",4,"red",5] => 1\n1\n\n2\n\nbody\n\n3\n\ncolor\n4\nred\n5
  3. We will need a separate array for fn refs or any other refs
kof commented 7 years ago

Btw. we potentially have not only fn refs, but any other refs, objects, imported primitives, whatever. If static analyzer is not able to completely follow the import which might be also a result of a complex function invocation, we will need to inline the ref in place or push all refs into a separate array. Right now the only reason to maintain separate array is the perf benefit in v8 and 20% storage savings.

kitten commented 7 years ago

@kof the perf will likely not be worse in any other engine due to a homogeneous array. So I'd strongly advise it.

For the refs we can just create a bucket. Probably a WeakMap, where it's available? Since we know the type from the AST, we don't need to separate the map for separate types

kof commented 7 years ago

will likely not be worse

not worse, just no benefit

Probably a WeakMap, where it's available

I don't see how a WeakMap can be useful here.

kitten commented 7 years ago

@kof Yea, actually a normal Map would suffice, and a WeakMap might actually not be working correctly for our needs ^^" It was more of an example. The point is, we can create a bucket and hold all of our referenced values in there

kof commented 7 years ago

The point is, we can create a bucket

Don't know whats a bucket in js. I guess a refs array would do + position as a pointer in the main array.

But still not sure if its worth it vs. just referencing in place.

geelen commented 7 years ago

Yeah I wouldn't chase down perf ideas until we're sure this format can express everything it needs to. I think the idea that everything can be 2-tuples is useful because it's a design constraint that potentially speeds things up, but I'd just use nested, non-homogenous arrays to represent it. That way, if we find something that can't be done easily, we can go to 3-tuples or real references etc etc.

The performance criteria we should be judging this on is what's fastest to download + parse + convert into real CSS. 2-tuples might save us some branching in a tight loop, but it might require more structure as we're processing it. Potentially, there may be no need to deconstruct things into to [[PROPERTY, 'color'], [VALUE, 'red']], we could just store dumb strings like "color: red;". Maybe not, but it'll be much easier to understand the tradeoffs once we have one working implementation.

kof commented 7 years ago

we could just store dumb strings like "color: red;".

I think we need to deconstruct things heavily, otherwise you want be able to for e.g. vendor-prefix props/values upon consumption efficiently.

I was even thinking about deconstructing numeric values and its units.

kof commented 7 years ago

Btw. even in a flatten array we still can have longer than 2 value pairs, we basically can't have dynamic amount of members, but rather a statically predefined for a certain marker.

kof commented 7 years ago

Ok, lets flatten the format and try other optimizations like homogeneous array after we have one reference implementation and see what else is missing, I will create for them separate issues.

kof commented 7 years ago

I think we can merge this PR and create new one for the next things. Thumbs up?