Sigil-Ebook / Sigil

Sigil is a multi-platform EPUB ebook editor
GNU General Public License v3.0
5.9k stars 574 forks source link

CSS reformat problem #328

Closed BeckyDTP closed 5 years ago

BeckyDTP commented 6 years ago

Hello,

I know you are preparing version 0.9.9, but maybe you can take a look at it.

One line CSS file with media query. Reformat CSS and a strange result.

Here you can clearly see it.

kovidgoyal commented 5 years ago

I'm done working on it, all the previously discussed items have been implemented. Feel free to add new fixes as you see fit and when you are done, we can discuss releasing it on pypi.

kevinhendricks commented 5 years ago

Wow, that's fast! So are we going with css-parser or css_parser and if the latter should we change the project name too?

As for gumbo, I think we will stay separate as you may well want to have gumbo build lxml style nodes directly, whereas much of our code uses gumbo nodes directly in our C++ and only build lxml style trees for use in plugins.

That said, I will sync in most of your gumbo changes to date into our sigil-gumbo and add in a nesting level limit (which will require an addition to the options structure).

kovidgoyal commented 5 years ago

I prefer keeping the project name as css-parser and the python import name as css_parser (since you cannot use hyphens in import names). This is how html5-parser works as well (import name of html5_parser).

Yeah I might someday have gumbo build lxml nodes directly for an even greater performance improvement.

wrCisco commented 5 years ago

@kevinhendricks Sure, I can work with some bug fixes, and I'll try to keep up with the pace.

kevinhendricks commented 5 years ago

@wrCisco The invite should be out there now. Thanks for helping out!

kevinhendricks commented 5 years ago

Once we add in the other fixes to our css_parser, we will add it to the builtin plugin/pythonlib library in Sigil and as a first step implement the reformat function in CSSInfo to use the new css_parser to actually fix this issue so we can close it.

We can also reimplement the unused class selector removal code as well.

wrCisco commented 5 years ago

I can add the fixes to the problems I stumbled upon these years (the color function case sensitivity matching, some issues with the serializer). About the lack of support for the amazon media types in media queries, in my plugin for css cleaning I just added the appropriate types to the MediaQuery.MEDIA_TYPES list, but I think that the best solution for the parser would be to accept unknown types (since it is clearly foreseen by the spec: https://www.w3.org/TR/css3-mediaqueries/#error-handling). Hoping that the solution won't be too complex, I can work on it tonight and tomorrow.

kevinhendricks commented 5 years ago

One approach might be to add a new Sequence to the Choice where you check for NOT in MEDIA_TYPES but still meets the type of IDENT. That said, I would still add the Amazon media types to MEDIA_TYPES just to be safe.

wrCisco commented 5 years ago

I just pushed all the updates in a new branch called "serializer". Changes in summary are:

If you want, we can add 'amzn-mobi' and 'amzn-kf8' to MEDIA_TYPES, but that can be as easily accomplished at runtime in Sigil or Calibre (at least, I think).

All tests pass both with py2 and py3 and the parser seems to work fine with some real-world ebooks's stylesheets.

kevinhendricks commented 5 years ago

Wonderful! Please go ahead and merge your branch into the main tree. Also please add in those amazon media types since most users will not understand why or how to add those media-types easily. Given we are an ebooks-utils group, support for ebook media-types should be already in place.

FYI, Sigil is going to embed css-parser to replace the need for cssutils completely and then used our embedded python3 interface to call routines to reformat the css direct from the Misc/CSSInfo.cpp class.

Thanks

On Dec 31, 2018, at 11:18 AM, Francesco Martini notifications@github.com wrote:

I just pushed all the updates in a new branch called "serializer". Changes in summary are:

two more prefs for the serializer: linesAfterRules (adding blank lines between rules in css) and formatUnknownAtRules (defaults to True for backward compatibility, but letting the parser to format something it doesn't understand is always a risk). The serializer should now also comply with the preference indentClosingBrace even with at-rules other than at-media. Functions rgb/a and hsl/a now have a case insensitive match. MediaQuery now accepts unknown media types and just logs a warning when (and if) sets the mediaType attribute (I follow Kevin's suggestion to add a new Sequence to the productions used for parsing). In order to test the warning for the media queries I moved/added some logging facilities from test_errorhandler.py to basetest.py (_setHandler and _captureLog) If you want, we can add 'amzn-mobi' and 'amzn-kf8' to MEDIA_TYPES, but that can be as easily accomplished at runtime in Sigil or Calibre (at least, I think).

All tests pass both with py2 and py3 and the parser seems to work fine with some real-world ebooks's stylesheets.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

wrCisco commented 5 years ago

Done, and done.

dougmassay commented 5 years ago

Once we add in the other fixes to our css_parser, we will add it to the builtin plugin/pythonlib library in Sigil and as a first step implement the reformat function in CSSInfo to use the new css_parser to actually fix this issue so we can close it.

The python3lib directory isn't easily accessible to plugins. I propose we make it so for now by adding it to sys.path via launcher.py (or wrapper.py). I can make sure that any plugin that uses cssutils will automatically import this new module (provide the api is still compatible) via launcher.py as well. This will ensure that anyone using an external Python interpreter for Sigil plugins will still be able to easily take advantage of this new, integral module (think Linux, mostly).

In the future, if we publish the module on PyPI (and I think we probably should), we could drop it from Sigil's source package entirely and make it a third-party requirement (like lxml currently is). Doing so would make it available to our Embedded Python routines as well as Sigil plugins (both in the bundled Python and external Python versions).

kevinhendricks commented 5 years ago

I was thinking of treating it like sigil_bs4 for and putting css_parser in the plugin directory where it can be found by EmbeddedPython code if need be. We can do the hooks if need be to make it look like cssutils in the launcher or simply allow both to co-exist. And I agree, once css_parser finally reaches PyPI, we can remove it from Sigil source and replace cssutils as a required package. If we do that should be good to go right? Or am I missing something?

dougmassay commented 5 years ago

You're not missing anything. I am! I'd forgotten all about how we handled the sigil_bs4 module. That should work just fine. We can intercept cssutils and have it import css_parser (or let it coexist as you mentioned) and it will still be available to external plugin interpreters as well. That covers all bases in my opinion.

kovidgoyal commented 5 years ago

@wrCisco looks good to me as well. I have opened an uissue on css-parsers issue tracker about publishing on PyPI https://github.com/ebook-utils/css-parser/issues/1

Can I suggest that we move any future discussions about css-parser to its own issue tracker.

kevinhendricks commented 5 years ago

We have embeded css_parser inside Sigil and have developed a very preliminary css reformatter in python that is run in Sigil's embedded Python environment.

Initial testing indicates it works. Much more work will need to be done to replace most of what Misc/CSSInfo.cpp does with equivalents from css_parser

All of this was pushed to master. So I am finally closing this issue. Thanks to everyone for all their work, feedback, and help. Sorry it took so long to get this closed.