101companies / 101repo

101companies contributions
http://101companies.org
MIT License
43 stars, 31 forks

technologies/W3CValidator super-slow #17

Closed (rlaemmel closed 12 years ago)

rlaemmel commented 12 years ago

This appears to be a performance bug. The validator runs for 10-60 seconds on some files. This can be easily reproduced by running "make" in "101worker/modules/validate101meta" after "make reset" in the same directory. The validation of some .css, .html, and .xml files slows down the module A LOT.

rlaemmel commented 12 years ago

Two ideas from today's discussion:

In addition, study the usage "policy" of those online services, or contact the site owners, to see whether our usage scenario is actually sustainable or eventually counts as a denial-of-service (DoS) attack.

rlaemmel commented 12 years ago

Just a data point: a full validation pass on the fast black42 machines takes about half an hour and produces a considerable amount of output. We should try to bring this down.

martinleinberger commented 12 years ago

I defined a timeout on the curl calls in W3CValidator. The timeout is currently set to 10 seconds, which is sufficient for most files to succeed. However, it may not be sufficient for some files (for example contributions/pyjamas/output/101Companies.mozilla.cache.html). But those files probably shouldn't be validated anyway (we talked about this briefly). Also, please bear in mind that validation now depends on the internet connection: on a connection slower than mine, validations might fail because of the 10-second timeout; on a faster connection, the timeout could be even stricter.
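
For readers following along, here is a minimal sketch of the timeout idea described above. It is not the actual W3CValidator code. The function names, the `uploaded_file` form field, and the use of Python are assumptions made for illustration. `--max-time` is a real curl option that aborts the whole transfer after the given number of seconds; the `subprocess` timeout acts as a second guard on the Python side.

```python
import subprocess
import sys

DEFAULT_TIMEOUT_SECS = 10  # the value mentioned in the comment above


def build_curl_command(validator_url, file_path, timeout_secs=DEFAULT_TIMEOUT_SECS):
    """Build a curl invocation that gives up after timeout_secs seconds.

    The form field name "uploaded_file" is an assumption for illustration.
    """
    return [
        "curl",
        "--silent",
        "--max-time", str(timeout_secs),  # curl aborts the transfer after this long
        "--form", f"uploaded_file=@{file_path}",
        validator_url,
    ]


def run_with_timeout(command, timeout_secs=DEFAULT_TIMEOUT_SECS):
    """Run a command; return (ok, stdout).

    ok is False if the command times out or exits with a non-zero status.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_secs
        )
    except subprocess.TimeoutExpired:
        # The child process is killed by subprocess.run before this is raised.
        return False, ""
    return result.returncode == 0, result.stdout


if __name__ == "__main__":
    # Offline demonstration: a child process that sleeps longer than the limit.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, timeout_secs=0.5))
```

A file that exceeds the limit is simply reported as failed, which matches the trade-off described above: a slower connection makes more validations fail, while a faster one would tolerate a stricter limit.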

martinleinberger commented 12 years ago

I have a question. I'm currently trying to get the offline versions of these validators to work (CSS already works), but I'm wondering whether that's really the best thing to do, because in the paper we actually mention remote / online validators, and yet I'm currently trying to get rid of them. So, should I keep going and try to replace the remaining online validator with an offline version (if possible), or should (X)HTML still be validated online?

martinleinberger commented 12 years ago

All validation is now done offline. There shouldn't be any performance issues anymore.