Open mettke opened 6 years ago
For HTML/XML, I was thinking about using html5ever, but unfortunately not many optimisations can be performed on those formats. For JSON, it'd require a whole new parser (nothing impossible, but it still needs to be done).
I'm currently writing the JSON minifier with a Read implementation. Should be done tomorrow.
I'm not sure how html5ever works, but I know how my minifier works. It is possible to create an iterator-like structure which yields 5 or more bytes at a time, making it possible to check whether the current byte should be filtered out or not. That way you can iterate over the whole string/reader, handling each byte only once and removing those that are not supposed to be there.
I will give you an example tomorrow with the JSON parser. It is a bit more difficult, but it offers a huge performance bonus, as it does not iterate over the whole data multiple times.
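To make the single-pass idea above concrete, here is a minimal sketch. It is not the code from either crate: the names `minify_json` and `minify_json_str` are hypothetical, and instead of the multi-byte window described above it uses a simpler two-flag state machine (inside a string or not, previous byte was a backslash or not). It assumes the input is already valid JSON and does no validation.

```rust
use std::io::{self, BufReader, Read, Write};

/// Copies JSON from `input` to `output`, dropping whitespace that sits
/// outside string literals. Each byte is inspected exactly once.
/// Hypothetical helper for illustration; it does not validate the JSON.
pub fn minify_json<R: Read, W: Write>(input: R, mut output: W) -> io::Result<()> {
    let mut in_string = false; // currently inside a "..." literal
    let mut escaped = false; // previous byte inside the literal was a backslash

    for byte in BufReader::new(input).bytes() {
        let b = byte?;
        if in_string {
            // Everything inside a string literal is kept verbatim.
            output.write_all(&[b])?;
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
        } else if b == b'"' {
            in_string = true;
            output.write_all(&[b])?;
        } else if !matches!(b, b' ' | b'\t' | b'\n' | b'\r') {
            // Outside strings, only non-whitespace bytes are kept.
            output.write_all(&[b])?;
        }
    }
    Ok(())
}

/// Convenience wrapper for in-memory input (also hypothetical).
pub fn minify_json_str(source: &str) -> String {
    let mut out = Vec::with_capacity(source.len());
    minify_json(source.as_bytes(), &mut out).expect("in-memory I/O cannot fail");
    // Only ASCII whitespace bytes were removed, so the result is still UTF-8.
    String::from_utf8(out).expect("removing ASCII whitespace keeps UTF-8 valid")
}
```

Because the function is generic over `Read` and `Write`, the same loop would work for a file, a socket, or a string slice without buffering the whole document first.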
> I'm currently writing the JSON minifier with a Read implementation. Should be done tomorrow.
Oh cool!
> I'm not sure how html5ever works, but I know how my minifier works. It is possible to create an iterator-like structure which yields 5 or more bytes at a time, making it possible to check whether the current byte should be filtered out or not. That way you can iterate over the whole string/reader, handling each byte only once and removing those that are not supposed to be there.
Not sure yet, but since it's used by servo, I'd assume it's quite efficient.
> I will give you an example tomorrow with the JSON parser. It is a bit more difficult, but it offers a huge performance bonus, as it does not iterate over the whole data multiple times.
In my current implementation, I'm not iterating multiple times over the whole data. Or did I miss your point? :)
> In my current implementation, I'm not iterating multiple times over the whole data. Or did I miss your point? :)
re.replace_all(&source, " ").into_owned()
re.replace_all(source, |caps: &Captures| { type_reg.replace_all(&caps[0], "").into_owned() }).into_owned()
for useless_tag in &useless_tags { res = res.replace(useless_tag, ""); }
What would you call this, then? :D (no offence)
That's the HTML part. This part is clearly not ready and shouldn't be used. That's why I want to rewrite it using html5ever. I should have been clearer on that point, my bad. :)
Ah, I see. Well, in that case it might make sense. Remember, however, that HTML and XML are not the same. I guess that XML cannot be minified by html5ever. There are, for example, data entries (or whatever those are called) which don't exist in HTML.
XML is simpler, indeed. I don't know if it's really worth writing an XML minifier though.
@GuillaumeGomez Any updates on this? It would be nice to know whether html5ever turned out to be handy for XML and HTML minification.
HTML minification is tricky, because you never really know where spaces are important or not. For XML, I haven't looked at all. Also, I still need to add the from_read for CSS and JS...
Does your "html" checkpoint include javascript (in a script tag or in its own file)? I think that's the most minifiable of them all, and probably with the most existing libs to do it.
I'm adding this issue to track progress on the integration into #684.
My suggestion is:
What do you think? I'm of course happy to help if you point me in the right direction. Maybe some of the stuff from my crate might be helpful as well.