christophetd / CloudFlair

🔎 Find origin servers of websites behind CloudFlare by using Internet-wide scan data from Censys.
https://blog.christophetd.fr/bypassing-cloudflare-using-internet-wide-scan-data/
2.58k stars 358 forks

Do we really need "similarity" library? #84

ewwink closed this issue 6 months ago

ewwink commented 6 months ago

I don't think we need it. Why not just capture <title>my site</title> from the HTML using a regex? I may be wrong, though.
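For reference, the title-only comparison proposed here could look roughly like the sketch below. The helper name extract_title and the sample HTML are illustrative assumptions, not code from CloudFlair.

```python
import re

# Hypothetical helper: capture the text inside the first <title> element.
# A regex is enough for this sketch; it is not a general HTML parser.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the stripped page title, or None if no <title> is found."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


sample = "<html><head><title>my site</title></head><body></body></html>"
print(extract_title(sample))
```

Two pages would then be treated as the same site whenever their extracted titles are equal, which is the assumption the rest of the thread discusses.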

christophetd commented 6 months ago

The similarity library lets us compute the structural HTML similarity of two web pages, as described in the research listed here: https://github.com/matiskay/html-similarity?tab=readme-ov-file#references

I think it's a much more robust approach than comparing titles.
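To illustrate the difference, here is a minimal sketch of a structural comparison using only the Python standard library. It is a simplified stand-in, not the algorithm the html-similarity library implements: it compares the sequence of opening tags in each page with difflib. The names TagCollector, tag_sequence, and tag_sequence_similarity, and the three sample pages, are assumptions made for this example.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the name of every opening tag, in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def tag_sequence_similarity(html_a, html_b):
    """Return a score in [0, 1]; 1.0 means identical tag sequences."""
    return SequenceMatcher(None, tag_sequence(html_a), tag_sequence(html_b)).ratio()


# Hypothetical pages: same title everywhere, but only two share a layout.
origin = "<html><head><title>my site</title></head><body><div><h1>Hi</h1><p>a</p></div></body></html>"
proxied = "<html><head><title>my site</title></head><body><div><h1>Hi</h1><p>b</p></div></body></html>"
other = "<html><head><title>my site</title></head><body><ul><li>x</li><li>y</li></ul></body></html>"

print(tag_sequence_similarity(origin, proxied))
print(tag_sequence_similarity(origin, other))
```

In this toy case all three pages carry the same title, so a title check cannot tell them apart, while the tag-sequence score is maximal only for the pair that shares the same markup structure.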

ewwink commented 6 months ago

Is there a case where the original and the CF version have different titles, or where the original and some other website have the same title? Sorry for my curiosity 😁

christophetd commented 6 months ago

I would expect both to have the same title.