GuillaumeGomez / minifier-rs

Minifier tool/lib for JS/CSS/JSON files
MIT License

Modifications for Rocket integration #20

Open: mettke opened this issue 6 years ago

mettke commented 6 years ago

I'm opening this issue to track progress on the integration into #684.

My suggestion is:

What do you think? I'm of course happy to help if you point me in the right direction. Maybe some of the stuff from my crate might be helpful as well.

GuillaumeGomez commented 6 years ago

For HTML/XML, I was thinking about using html5ever, but unfortunately not many optimisations can be performed on those formats. For JSON, it'd require a whole new parser (nothing impossible, but it still needs to be done).

mettke commented 6 years ago

I'm currently writing the JSON minifier with a `Read` implementation. It should be done tomorrow.

I'm not sure how html5ever works, but I know how my minifier works. It's possible to create an iterator-like structure that yields 5 or more bytes at a time, which lets you check whether the current byte should be filtered out. That way you can iterate over the whole string or `Read` source, handling each byte exactly once and dropping the ones that aren't supposed to be there.

I will give you an example tomorrow with the JSON parser. It's a bit more difficult, but it offers a huge performance gain because it doesn't iterate over the whole data multiple times.
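A minimal sketch of that single-pass idea for JSON, assuming the only bytes to drop are whitespace outside string literals. `JsonMinify` and `minify_json_from_read` are hypothetical names, not minifier-rs's API. Instead of a window of several bytes, the state here is just an in-string flag and an escape flag, which is enough for this particular filter. It does no validation; malformed input simply passes through.

```rust
use std::io::{self, Read};

/// Hypothetical iterator adapter (not the minifier-rs API): drops JSON
/// whitespace outside string literals, inspecting each input byte once.
struct JsonMinify<I: Iterator<Item = u8>> {
    inner: I,
    in_string: bool, // currently inside a "..." literal
    escaped: bool,   // previous byte was a backslash inside a literal
}

impl<I: Iterator<Item = u8>> JsonMinify<I> {
    fn new(inner: I) -> Self {
        JsonMinify { inner, in_string: false, escaped: false }
    }
}

impl<I: Iterator<Item = u8>> Iterator for JsonMinify<I> {
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        // Pull bytes until one is worth keeping; no byte is revisited.
        while let Some(b) = self.inner.next() {
            if self.in_string {
                // String contents are kept verbatim; only track where it ends.
                if self.escaped {
                    self.escaped = false;
                } else if b == b'\\' {
                    self.escaped = true;
                } else if b == b'"' {
                    self.in_string = false;
                }
                return Some(b);
            }
            match b {
                // Insignificant JSON whitespace outside strings is dropped.
                b' ' | b'\t' | b'\n' | b'\r' => continue,
                b'"' => {
                    self.in_string = true;
                    return Some(b);
                }
                _ => return Some(b),
            }
        }
        None
    }
}

/// Streams any `Read` source (wrap files in a `BufReader`) through `JsonMinify`.
fn minify_json_from_read<R: Read>(reader: R) -> io::Result<String> {
    let mut io_err: Option<io::Error> = None;
    // `Read::bytes` yields `io::Result<u8>`; stop at the first I/O error.
    let bytes = reader.bytes().map_while(|r| match r {
        Ok(b) => Some(b),
        Err(e) => {
            io_err = Some(e);
            None
        }
    });
    let out: Vec<u8> = JsonMinify::new(bytes).collect();
    if let Some(e) = io_err {
        return Err(e);
    }
    // Only ASCII bytes were removed, so valid UTF-8 input stays valid UTF-8.
    String::from_utf8(out).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

For example, `minify_json_from_read(r#"{ "a" : [1, 2] }"#.as_bytes())` yields `Ok` containing `{"a":[1,2]}`. Because the adapter accepts any `Iterator<Item = u8>`, the same code covers both the in-memory string case and the `Read` case.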

GuillaumeGomez commented 6 years ago

> I'm currently writing the JSON minifier with a `Read` implementation. It should be done tomorrow.

Oh cool!

> I'm not sure how html5ever works, but I know how my minifier works. It's possible to create an iterator-like structure that yields 5 or more bytes at a time, which lets you check whether the current byte should be filtered out. That way you can iterate over the whole string or `Read` source, handling each byte exactly once and dropping the ones that aren't supposed to be there.

Not sure yet, but since it's used by Servo, I'd assume it's quite efficient.

> I will give you an example tomorrow with the JSON parser. It's a bit more difficult, but it offers a huge performance gain because it doesn't iterate over the whole data multiple times.

In my current implementation, I'm not iterating over the whole data multiple times. Or did I miss your point? :)

mettke commented 6 years ago

> In my current implementation, I'm not iterating over the whole data multiple times. Or did I miss your point? :)

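```rust
// Excerpts from the current HTML minifier: each top-level `replace_all`
// call scans and copies the whole string again.
re.replace_all(&source, " ").into_owned()

re.replace_all(source, |caps: &Captures| {
    type_reg.replace_all(&caps[0], "").into_owned()
}).into_owned()

// One full `replace` pass (and a new String) per tag:
for useless_tag in &useless_tags {
    res = res.replace(useless_tag, "");
}
```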

What do you call this, then? :D (no offence)
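For comparison, the loop in the last excerpt can be restructured to scan the input once, checking every tag at each position. This is a hypothetical sketch (`remove_all_single_pass` does not exist in minifier-rs). It does not reduce the worst-case cost of the prefix checks; what it saves is building a fresh copy of the whole string for every tag.

```rust
/// Hypothetical helper (not in minifier-rs): removes every literal in `tags`
/// during one left-to-right scan, instead of one `str::replace` per tag.
fn remove_all_single_pass(input: &str, tags: &[&str]) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    'scan: while !rest.is_empty() {
        // If a tag starts at the current position, skip it (first match wins).
        for &tag in tags {
            if !tag.is_empty() && rest.starts_with(tag) {
                rest = &rest[tag.len()..];
                continue 'scan;
            }
        }
        // Otherwise copy one character (UTF-8 aware) and move on.
        let ch = rest.chars().next().unwrap();
        out.push(ch);
        rest = &rest[ch.len_utf8()..];
    }
    out
}
```

One behavioural difference from chained `replace` calls: a tag that only appears after another tag has been removed (for example `<b>` in `<<x>b>` with tags `["<x>", "<b>"]`) survives the single pass.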

GuillaumeGomez commented 6 years ago

That's the HTML part. This part is clearly not ready and shouldn't be used, which is why I want to rewrite it using html5ever. I should have been clearer on that point, my bad. :)

mettke commented 6 years ago

Ah, I see. Well, in that case it might make sense. Remember, however, that HTML and XML are not the same. I'd guess that XML can't be minified with html5ever. For example, XML has data entries (or whatever they're called) that don't exist in HTML.

GuillaumeGomez commented 6 years ago

XML is simpler, indeed. I don't know if it's really worth writing an XML minifier, though.

mettke commented 6 years ago

@GuillaumeGomez Any updates on this? It would be nice to know whether html5ever turned out to be handy for HTML and XML minification.

GuillaumeGomez commented 6 years ago

HTML minification is tricky, because you never really know where spaces matter and where they don't. I haven't looked at XML at all. Also, I still need to add `from_read` for CSS and JS...
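To make the difficulty concrete, here is a deliberately conservative sketch (`collapse_html_whitespace` is a hypothetical name, not part of minifier-rs or html5ever). It never deletes whitespace outright, because in `<b>a</b> <i>b</i>` the space between the elements is rendered; it only collapses each run to one space, which under the default `white-space: normal` generally renders the same, and copies `<pre>` blocks verbatim. Even this is not always safe: CSS `white-space: pre` can make whitespace significant in any element, and that cannot be known from the HTML alone.

```rust
/// Hypothetical, deliberately conservative whitespace pass for HTML text:
/// collapses each run of ASCII whitespace to one space (never deletes it)
/// and copies `<pre>...</pre>` regions verbatim. It does not parse HTML:
/// uppercase tags, `<textarea>`, `<script>` and CSS rules are not handled.
fn collapse_html_whitespace(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while !rest.is_empty() {
        // Whitespace inside <pre> is rendered, so copy the block unchanged.
        if rest.starts_with("<pre") {
            let end = rest.find("</pre>").map_or(rest.len(), |i| i + "</pre>".len());
            out.push_str(&rest[..end]);
            rest = &rest[end..];
            continue;
        }
        let ch = rest.chars().next().unwrap();
        if ch.is_ascii_whitespace() {
            // A space between inline elements is visible, so keep exactly one.
            out.push(' ');
            rest = rest.trim_start_matches(|c: char| c.is_ascii_whitespace());
        } else {
            out.push(ch);
            rest = &rest[ch.len_utf8()..];
        }
    }
    out
}
```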

SuperCuber commented 4 years ago

Does your "html" checkpoint include JavaScript (in a script tag or in its own file)? I think that's the most minifiable of them all, and probably the one with the most existing libraries for it.