csstree / csstree

A tool set for CSS including fast detailed parser, walker, generator and lexer based on W3C specs and browser implementations
https://csstree.github.io/docs/
MIT License

Customize tokenizer via the fork API #264

Open scripthunter7 opened 8 months ago

scripthunter7 commented 8 months ago

Closes https://github.com/csstree/csstree/issues/253

This is a relatively simple PR that allows advanced library users to supply a custom tokenizer via the fork API. It doesn't change how the base library works; it only affects forks, making them more flexible.

The custom tokenizer function can be a completely new tokenizer or a simple wrapper around CSSTree's tokenizer. Either way, it should meet the following requirements:

  1. It should be compatible with the following signature:
    /**
     * CSSTree's tokenizer signature
     *
     * @param source CSS source code to tokenize
     * @param onToken Callback which will be invoked when a token is found
     */
    function tokenize(source: string, onToken: (tokenType: number, startOffset: number, endOffset: number) => void): void;
  2. It should use token IDs similar to CSSTree's: https://github.com/csstree/csstree/blob/master/lib/tokenizer/types.js (this comes quite naturally, since these are the tokens defined by the official specs).

Example usage:

import * as cssTree from 'css-tree';

const customTokenize = function(source, onToken) {
    // ...
};

const cssTreeFork = cssTree.fork({
    // Use the customized tokenizer
    tokenize: customTokenize,
});
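To illustrate the "simple wrapper" case, here is a minimal sketch of a wrapper that keeps the `(source, onToken)` signature and forwards every token unchanged. It deliberately does not import css-tree: `baseTokenize` is a toy stand-in for the real tokenizer, `wrapTokenizer` is a hypothetical helper name, and the two token ID constants are placeholders that a real fork would take from `lib/tokenizer/types.js`.

```javascript
// Placeholder token IDs for this sketch only (assumption); a real fork
// should take them from CSSTree's lib/tokenizer/types.js.
const TOKEN_IDENT = 1;
const TOKEN_WHITESPACE = 13;

// Toy stand-in for the base tokenizer: reports runs of whitespace and runs
// of non-whitespace. It is NOT a CSS tokenizer; it only has the same shape.
function baseTokenize(source, onToken) {
    const pattern = /\s+|\S+/g;
    let match;

    while ((match = pattern.exec(source)) !== null) {
        const type = /^\s/.test(match[0]) ? TOKEN_WHITESPACE : TOKEN_IDENT;
        onToken(type, match.index, match.index + match[0].length);
    }
}

// Hypothetical helper: wraps any tokenizer and records each token before
// passing it on, so the wrapped function keeps the same signature.
function wrapTokenizer(tokenize, log) {
    return function wrappedTokenize(source, onToken) {
        tokenize(source, (tokenType, startOffset, endOffset) => {
            log.push({ tokenType, startOffset, endOffset });
            onToken(tokenType, startOffset, endOffset);
        });
    };
}

// Usage: the wrapper can be called exactly like the tokenizer it wraps.
const seenTokens = [];
const loggingTokenize = wrapTokenizer(baseTokenize, seenTokens);
const forwardedTokens = [];

loggingTokenize('a  b', (tokenType, startOffset, endOffset) => {
    forwardedTokens.push([tokenType, startOffset, endOffset]);
});
```

A function shaped like `loggingTokenize` is what would be passed as the `tokenize` option in the fork config above.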

@lahmatiy I think it would be worth writing documentation for the fork API. If you agree, I will be happy to help put together a basic version in a separate PR when I have some free time. These requirements for the custom tokenizer should also be described there.

coveralls commented 8 months ago

Coverage Status

coverage: 98.869% (+0.003%) from 98.866% when pulling 465988016ac8554bdd4d04e5568c7404943b2e98 on scripthunter7:feature/253 into ba6dfd8bb0e33055c05f13803d04825d98dd2d8d on csstree:master.

lahmatiy commented 8 months ago

@scripthunter7 Thank you for the PR! I'm supportive of this extension. However, there are a few points to address:

  1. If we're introducing a custom tokenizer, it should be used consistently throughout the entire library. Currently, it seems to be missing in the following modules (and possibly some others): https://github.com/csstree/csstree/blob/ba6dfd8bb0e33055c05f13803d04825d98dd2d8d/lib/lexer/prepare-tokens.js#L33 https://github.com/csstree/csstree/blob/ba6dfd8bb0e33055c05f13803d04825d98dd2d8d/lib/generator/create.js#L27
  2. We would need unit tests to ensure the correct functioning of this feature.

Regarding documentation for the fork API, I absolutely agree. I would appreciate it if you could propose a PR to lay the groundwork. This would also be a good place to detail the requirements for the custom tokenizer, as you've outlined.

scripthunter7 commented 8 months ago

@lahmatiy Thank you for your feedback! I think I have now made the tokenizer switchable everywhere (I hope). In the end, I introduced a separate utility that returns the tokenizer function from the config object if one is present. I also added some simple unit tests to make sure that the custom tokenizer is used in the fork and that it does not affect the operation of the default library.
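
For readers following along, the utility described above might look roughly like the sketch below. This is not the code from the PR: `getTokenizer`, `defaultTokenize`, and the marker-based checks are hypothetical names and a simplified setup, assuming only that the fork config may carry a `tokenize` function.

```javascript
// Stand-in for the library's built-in tokenizer (assumption for this sketch).
function defaultTokenize(source, onToken) {
    onToken('default', 0, source.length);
}

// Hypothetical utility: return the tokenizer from the config object if one
// is present and callable, otherwise fall back to the default tokenizer.
function getTokenizer(config) {
    if (config && typeof config.tokenize === 'function') {
        return config.tokenize;
    }

    return defaultTokenize;
}

// A custom tokenizer that marks its tokens so the two are distinguishable.
function markedTokenize(source, onToken) {
    onToken('custom', 0, source.length);
}

// Collects the token types produced by a tokenizer for a given source.
function collectTypes(tokenize, source) {
    const types = [];
    tokenize(source, (tokenType) => types.push(tokenType));
    return types;
}

// The "fork" picks up the custom tokenizer; the default path is untouched.
const forkTypes = collectTypes(getTokenizer({ tokenize: markedTokenize }), 'a{}');
const defaultTypes = collectTypes(getTokenizer({}), 'a{}');
const noConfigTypes = collectTypes(getTokenizer(undefined), 'a{}');
```

The last three lines mirror the two properties the unit tests are meant to cover: a fork with a `tokenize` option uses it, and everything else keeps the default behaviour.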