Kozea / tinycss

tinycss is a complete yet simple CSS parser for Python.
https://tinycss.readthedocs.io/
Other
88 stars 20 forks source link

tinycss should implement media query error handling #1

Open kennyluck opened 12 years ago

kennyluck commented 12 years ago

css3-mediaqueries says:

Malformed media query. User agents are to handle unexpected tokens encountered while parsing a media query by reading until the end of the media query, while observing the rules for matching pairs of (), [], {}, "", and '', and correctly handling escapes. Media queries with unexpected tokens are represented as "not all".

where the end of the media query is a comma.

From code inspection, tinycss doesn't seem to do this and throws when a non-IDENT is ever encountered.

SimonSapin commented 12 years ago

Actually, tinycss should implement media queries.

tinycss 0.2 only supports CSS 2.1 media types. Media queries are definitely on the TODO list (since they are in the snapshot), although they are low-priority because I don’t use them. (WeasyPrint’s page size is always fixed exactly by the 'size' property, so the use case for media queries is weaker.)

The ParseError exception is an internal flow control mechanism. It means “Ignore this declaration or rule.” It is caught by the higher-level parser and should never be propagated to the user. The exception objects are only visible in the parse_errors list, attribute of Stylesheet objects, but at this point they are only a container for a line, column and message. These “errors” are never fatal, they should probably be called “warnings” instead.

By the time an exception is raised for an invalid media type (or an unsupported media query), the () [] and {} tokens have already been paired in nested groups. (Quoted strings are handled even earlier, in the tokenizer.) Assuming @media, the whole at-rule has already been parsed into an AtRule object. For the stylesheet-level parser, catching a ParseError means ignore this AtRule, log the error/warning, and carry on.

kennyluck commented 12 years ago

The ParseError exception is an internal flow control mechanism. It means “Ignore this declaration or rule.”

Yeah, I sort of learn that. What I am trying to say is that, @import url("abc") +, all;, say, is a CSS 2.1 level input with undefined error handling in CSS 2.1. The whole rule should not be dropped even if this is a CSS 2.1 parser, so an internal ParseError here seems wrong.

Actually, tinycss should implement media queries.

You are of course free to follow your agenda :)

SimonSapin commented 12 years ago

When media queries are not supported, @import is defined in 6.3 of CSS 2.1. The normative prose there matches the informative grammar in Appendix G:

import
  : IMPORT_SYM S*
    [STRING|URI] S* media_list? ';' S*
  ;
media
  : MEDIA_SYM S* media_list '{' S* ruleset* '}' S*
  ;
media_list
  : medium [ COMMA S* medium]*
  ;
medium
  : IDENT S*
  ;

An at-rule that does not match this grammar can not be parsed as an ImportRule by tinycss. (Of course media queries change the grammar, but the principle still stands.)

tinycss just drops ignored tokens because I never need them in WeasyPrint or CairoSVG. This means that it is currently not sufficient to implement a minifier. To fix that it should somehow preserve invalid stuff. Maybe have an object to represent a blob of unparsed tokens, and allow it anywhere. The problem is that eg. WeasyPrint will need to ignore such objects in the middle of a Declaration list, and not try to access their name/value attributes. Or maybe unparsed stuff could be preserved only in an opt-in mode.

Many things are possible. If they are not done it only means that I haven’t needed them yet and nobody yet has told me that they need it.

kennyluck commented 12 years ago

When media queries are not supported, @import is defined in 6.3 of CSS 2.1. The normative prose there matches the informative grammar in Appendix G:

These as far as I can tell don't mention what to do if the input isn't matching the grammar (or can you point out which sentence in 6.3 asks for the behavior tinycss implements?). Ignoring the whole thing is of course a straightforward way to handle it, but using the css3-mediaqueries way will be more browser-matching.

An at-rule that does not match this grammar can not be parsed as an ImportRule by tinycss. (Of course media queries change the grammar, but the principle still stands.)

Does this mean that tinycss has no intention to break this principle and follow the error handling rules as described in css3-mediaqueries? Close this issue as WONTFIXED if you feel like so. For what's worth,

  1. I think tinycss as it is doesn't quite follow Appendix G anyway. It follows the core grammar. They are different.
  2. An input like &test, screen isn't matching the formal grammar in css3-mediaqueriers, yet it is an example in the spec. I've learned to always ignore any formal grammar in a CSS spec, but you are of course feel to believe in any.

Many things are possible. If they are not done it only means that I haven’t needed them yet and nobody yet has told me that they need it.

Just to clarify, I am just spotting this from code inspection. I don't need it (or otherwise I'll just submit a patch). You don't really need to explain why this isn't happening. Feel free to ignore this, and only if you seriously believe that error handling rules as described css3-mediaqueries isn't compatible with a CSS 2.1 parser, we can debate more. (Or if you think implementing this is harmful for a tinycss UA, close this as WONTFIXED)

And also, I believe how malformed Declaration list is handled is more of an orthogonal issue so I think it's quite unhelpful to conflate the two.

For what's worth again, I think fixing is is pretty easy (seems like less than 5 lines in parse_media, just turn a unrecognized thing into "not all" according to css3-mediaqueries). I might submit a patch later but not now.

kennyluck commented 12 years ago

And also, I believe how malformed Declaration list is handled is more of an orthogonal issue so I think it's quite unhelpful to conflate the two.

Oh, I just realized that you did this because you are pointing to it from www-style. shrug

SimonSapin commented 12 years ago

About minifier: yes, I was replying here and on www-style, I mixed them up a little. You’re right the larger issue of whether to drop ignored tokens is not relevant to your original issue.

I just pushed 2793e725ea to make this clearer. @import is parsed as ATKEYWORD S* [ URL | STRING] S* <media> S* ";" where <media> is any list of tokens (and may be empty), with the usual nesting rules to find the end of an at-rule. Only then, this list of tokens is passed to the parse_media method. In the "media types" version of this method, anything other than comma-separated list of IDENTs is a parse error and the rule is ignored. There is nothing comparable to 'not all' with just media types. The output is a list of strings, and default to ['all'] for an empty input.

When media queries are added, they will be as a parser subclass that overrides the parse_media method. I’ll make sure to have the specified error handling then and there. However, having an @media rule with 'not all' is equivalent to dropping the rule for an UA like WeasyPrint, since rules or stylesheets are never serialized.

kennyluck commented 12 years ago

where <media> is any list of tokens (and may be empty)

That does seem to make parse_media more self-contained!

However, having an @media rule with 'not all' is equivalent to dropping the rule for an UA like WeasyPrint, since rules or stylesheets are never serialized.

css3-mediaquries asks a UA to parse, say, &test, print as not all, print (or equivalently you can let the CSS 2.1 parse_mdeia return ["print"] here) and WeasyPrint should still honor such an input, whether it supports Media Queries like (max-width: 1000px) or not, to be more browser-matching.

When media queries are added, they will be as a parser subclass that overrides the parse_media method. I’ll make sure to have the specified error handling then and there.

Feel free to follow your plan. Just that I don't think CSS 2.1 requires tinycss's current behavior nor does it contradict css3-mediaquries. It's just one of the many undefined things in CSS 2.1. If you spot a contradiction here, it should probably get reported to www-style.

kennyluck commented 12 years ago

Oh, I just realized that the best way for a CSS 2.1 UA seems to be following the HTML4 rules as quoted in css3-mediaquries: extract the first token if it's an IDENT.

SimonSapin commented 12 years ago

I can’t find any reference to HTML4’s media descriptors in CSS 2.1. The closest normative text I could find is in 7.3:

@media and @import rules with unknown media types (that are nonetheless valid identifiers) are treated as if the unknown media types are not present. If an @media/@import rule contains a malformed media type (not an identifier) then the statement is invalid.

So on this particular issue I’d say that tinycss should eventually implement media queries, but in the meantime the current error handling is fine.

kennyluck commented 12 years ago

I can’t find any reference to HTML4’s media descriptors in CSS 2.1.

Hmm... yeah, it's a bit weird, but anyway. Gecko implements the HTML4 rules before Media Quries are implemented. From just testing IE9 quriks mode (equivalent to IE5.5 I think), it implements

1.The value is a comma-separated list of entries. For example,

but not

2.Each entry is truncated just before the first character that isn't a US ASCII letter [a-zA-Z](Unicode decimal 65-90, 97-122), digit [0-9](Unicode hex 30-39), or hyphen (45). In the example, this gives:

. In particular, @media &test, all works fine but not @media all and (width)

The closest normative text I could find is in 7.3:

@media and @import rules with unknown media types (that are nonetheless valid identifiers) are treated as if the unknown media types are not present. If an @media/@import rule contains a malformed media type (not an identifier) then the statement is invalid.

Ah, so there is a contradictory statement. In this case, I am going to raise this as an CSS 2.1 issue because I don't think this behavior is the best for a CSS 2.1 UA, nor is it compatible with css3-mediaquries or old implementations. Implementing both 1. and 2. above seems like a better option to me.

SimonSapin commented 12 years ago

I don’t think this is very important. (At least, I don’t see an use case where this matters.) I’ll keep the current behavior for now, and change it if/when CSS 2.1 changes.

kennyluck commented 12 years ago

The use case for 1. is admittedly very weak, unless people do @media print and (width: xxx), print for backwards compatibility, which I believe to be quite unlikely. The use case for 2. is for @media print and (width: xxx) but I am not clear if it's better to treat it as @media print or @media not all. But yeah, they are both quite minor.