commonsearch / cosr-back

Backend of Common Search. Analyses webpages and sends them to the index.
https://about.commonsearch.org
Apache License 2.0

Avoid indexing data URIs for images #21 #22

Closed adipasquale closed 8 years ago

adipasquale commented 8 years ago

I'm not sure how careful we should be about performance here. I avoided regexes but still used startswith, which is apparently not the fastest way to check a prefix, though the difference doesn't look too big.

We could be more precise by validating the whole format: a comma should be present after the data: prefix, and there should be some alphanumeric characters.
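
For illustration, here is a minimal sketch of the two checks discussed in this comment. It is not the code from this pull request: the function names (`is_data_uri`, `is_well_formed_data_uri`, `filter_image_urls`) are hypothetical, and the stricter check only implements the rough rule stated here (a comma after the `data:` prefix, followed by at least one alphanumeric character), not full validation of the data URI format.

```python
DATA_PREFIX = "data:"


def is_data_uri(url):
    """Cheap prefix check, no regex involved."""
    return url.startswith(DATA_PREFIX)


def is_well_formed_data_uri(url):
    """Stricter check (assumed rule): 'data:', then a comma, then a
    payload containing at least one alphanumeric character."""
    if not url.startswith(DATA_PREFIX):
        return False
    # Split "<mediatype>[;base64],<payload>" at the first comma.
    _header, comma, payload = url[len(DATA_PREFIX):].partition(",")
    if not comma:
        return False
    return any(ch.isalnum() for ch in payload)


def filter_image_urls(urls):
    """Drop data URIs so they are not sent to the index."""
    return [u for u in urls if not is_data_uri(u)]
```

With this sketch, `filter_image_urls(["data:image/gif;base64,R0lGOD", "/img/logo.png"])` keeps only `/img/logo.png`. The prefix check treats anything starting with `data:` as a data URI, while the stricter variant would let malformed values such as `data:image/png` fall through as ordinary URLs, so which one is preferable depends on how malformed input should be handled.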

sylvinus commented 8 years ago

Perfect, thanks a lot!