MarcusOtter / cookiemonsters

https://cookiemonsters.eu
GNU Affero General Public License v3.0

🐛 Known bugs with banner finders and analysis #6

Closed MarcusOtter closed 1 year ago

MarcusOtter commented 1 year ago

dn.se

✅ Solved: does not render the cookie banner, and the page does not appear correctly when we render it. This seems very likely to be a user-agent issue: when I set a mobile user agent it works "properly" (the banner is still not detected, but at least it is now rendered and the rest of the page is sized correctly).
MarcusOtter commented 1 year ago

aftonbladet.se

✅ Solved: Oliver's method does not seem to find this banner at all. This was an issue with iframes; Oliver fixed it.
MarcusOtter commented 1 year ago

needle.gg and youtube.com

✅ Solved: will sometimes (or almost always?) time out after 30 seconds, and I have no idea why. Fixed by downgrading Puppeteer: the underlying Chromium version it used was not able to load YouTube. See https://github.com/puppeteer/puppeteer/issues/10033
MarcusOtter commented 1 year ago

plausible.io

Oliver's algorithm identifies the entire page as a cookie banner. Marcus' algorithm ends up on a random element.

MarcusOtter commented 1 year ago

destinationuppsala.se

✅ Solved: https://destinationuppsala.se/event/bandyfinalen-2-2/ This was just a problem with my weird script, I think. When I turn it off, Oliver's algorithm works.
oliverpalonkorp commented 1 year ago

blocket.se (Oliver's algorithm)

blocket.se has issues with a sub-container that has a background color. The algorithm finds this sub-container as the ancestor with a background, and because that ancestor contains many elements with keywords, it treats it as the cookie banner, even though a higher container is the actual cookie banner.

Screenshot 2023-04-17 at 17 03 12
MarcusOtter commented 1 year ago

mongodb.com

✅ Solved: https://www.mongodb.com/docs/manual/reference/method/db.collection.find/ **This was a VPN issue; it works for Oliver. This is what the comment used to say:**

Cookie banner does not show up in our scraper, but it shows up when manually browsing with Firefox. It also does not seem to show up in a headful browser from Puppeteer, and adding a 5 s delay does not help. MongoDB seems to set a cookie that makes the banner visible only on the first visit. That still doesn't fully explain why we can't see it in any screenshot, but it explains not seeing it in the last 3.

Further investigation: I don't get it at all in Chrome; I get a bunch of ERR_NAME_NOT_RESOLVED errors.
MarcusOtter commented 1 year ago

steamcommunity.com https://steamcommunity.com/id/LeMorrow/

Oliver's algorithm does not take a screenshot of the cookie banner (but claims to find it). Marcus' does it correctly.

NOTE: Steam needed a 5 s delay for me. When the delay is removed, the banner no longer renders for me.

MarcusOtter commented 1 year ago

steamidfinder.com

✅ Solved: https://www.steamidfinder.com/lookup/LeMorrow/ Marcus' algorithm does not take a screenshot of the cookie banner (but claims to find it). Oliver's does it correctly. It was an iframe issue.
MarcusOtter commented 1 year ago

steamidfinder.com (again)

✅ Solved: https://www.steamidfinder.com/lookup/LeMorrow/ **This was a VPN issue; it works for Oliver. This is what the comment used to say:**

This time, the cookie banner never loads in Chrome, ever, even with normal usage and no Puppeteer involved. Maybe they changed something? This website clearly worked with our analyzers 5 days ago, but now it only works in Firefox. Specifically, this script does not load in Chrome: https://kumo.network-n.com/dist/app.js ![image](https://user-images.githubusercontent.com/35617441/234722278-cba311a2-b261-48c5-9d10-3613926b7631.png)

There's not much we can do about this, but I'm putting it here so we remember that some websites actually behave differently across browsers.
MarcusOtter commented 1 year ago

reddit.com

Both Oliver's and Marcus' algorithms report that a cookie banner was found on mobile, but the screenshot shows the entire page.

MarcusOtter commented 1 year ago

stackoverflow.com

✅ Solved: Stack Overflow had colons in their class names, which apparently must be escaped in selectors passed to `querySelectorAll` (which I found out [on Stack Overflow](https://stackoverflow.com/questions/45110893/select-elements-by-attributes-with-colon), ironically). This was easily fixed by escaping just the colons. However, there seem to be a lot of rules that can trip this up (see [this](https://mathiasbynens.be/notes/css-escapes)). ~~We should probably use a library for this: https://github.com/mathiasbynens/cssesc so it escapes identifiers for us before we run query selectors on strings we got from websites.~~ Instead, I used `CSS.escape` every time we read a CSS identifier (class/id) from the website. ![image](https://user-images.githubusercontent.com/35617441/234742125-03a21aeb-338d-421f-ad6a-1e9df606271f.png)
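
A minimal sketch of the escaping step, using hypothetical helper names (`classSelector`, `escapeColonsOnly`) that are not from the project's code. The escape function is a parameter so the helper also runs outside a browser; in the page you would pass `CSS.escape`. The colon-only escaper mirrors the first quick fix and is not enough on its own, since identifiers with a leading digit or a `.` also need escaping.

```typescript
// Hypothetical helpers for illustration; not taken from the project's code.

// The first quick fix: escape only colons. Not enough in general, since
// identifiers such as "1col" or "a.b" also need escaping.
function escapeColonsOnly(identifier: string): string {
  return identifier.replace(/:/g, "\\:");
}

// Build a compound class selector such as ".a.b" from class names read off a
// page. In the browser, pass CSS.escape as `escape`; taking it as a parameter
// keeps this helper runnable outside a browser.
function classSelector(
  classNames: ArrayLike<string>,
  escape: (identifier: string) => string,
): string {
  let selector = "";
  for (let i = 0; i < classNames.length; i++) {
    const name = classNames[i];
    // Skip empty entries so we never emit a bare "." in the selector.
    if (name.length > 0) {
      selector += "." + escape(name);
    }
  }
  return selector;
}

// In page context, something like:
//   document.querySelectorAll(classSelector(element.classList, CSS.escape));
console.log(classSelector(["banner", "md:flex"], escapeColonsOnly)); // prints .banner.md\:flex
```

Passing the escaper in also makes it easy to swap in a replacement on sites where `CSS.escape` is unavailable.
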
MarcusOtter commented 1 year ago

youtube.com

Marcus' algorithm does not find the banner on mobile because of a tie in z-index; this should be an easy fix.
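
One hypothetical way to break such a tie, sketched below under the assumption that candidates share a stacking context: among elements with the highest z-index, prefer the one that comes last in document order. In CSS painting order, positioned elements with equal z-index in the same stacking context are painted in tree order, so the later one ends up on top. The `Candidate` shape and `pickTopmost` are illustrative names, not the project's code, and nested stacking contexts are ignored.

```typescript
// Hypothetical candidate shape: a parsed z-index plus the element's position
// in document order (e.g. its index in a document.querySelectorAll("*") list).
interface Candidate {
  id: string;
  zIndex: number;
  documentOrder: number;
}

// Pick the topmost candidate. Ties in z-index go to the element that comes
// later in the document, which paints on top when both share a stacking context.
function pickTopmost(candidates: Candidate[]): Candidate | undefined {
  let best: Candidate | undefined;
  for (const candidate of candidates) {
    if (
      best === undefined ||
      candidate.zIndex > best.zIndex ||
      (candidate.zIndex === best.zIndex && candidate.documentOrder > best.documentOrder)
    ) {
      best = candidate;
    }
  }
  return best;
}
```

Because the tie-break uses document order rather than array position, the result does not depend on the order in which candidates were collected.
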

MarcusOtter commented 1 year ago

aftonbladet.se

After refactoring, we get an error on this site.

MarcusOtter commented 1 year ago

facebook.com

Facebook has overridden the global `CSS` object, so we need to polyfill the `CSS.escape` function.
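
A sketch of such a polyfill, following the "serialize an identifier" steps from the CSSOM specification (the same approach as Mathias Bynens' `CSS.escape` polyfill). The name `cssEscape` is hypothetical, and how it gets injected into the page is left open.

```typescript
// Sketch of a CSS.escape replacement, following the CSSOM "serialize an
// identifier" algorithm (same approach as Mathias Bynens' CSS.escape polyfill).
function cssEscape(value: string): string {
  const length = value.length;
  const firstCodeUnit = value.charCodeAt(0);
  let result = "";

  for (let index = 0; index < length; index++) {
    const codeUnit = value.charCodeAt(index);

    // NULL is replaced with U+FFFD REPLACEMENT CHARACTER.
    if (codeUnit === 0x0000) {
      result += "\uFFFD";
      continue;
    }

    // Control characters, a leading digit, or a digit right after a leading
    // "-" are escaped as a hex code point followed by a space.
    if (
      (codeUnit >= 0x0001 && codeUnit <= 0x001f) ||
      codeUnit === 0x007f ||
      (index === 0 && codeUnit >= 0x0030 && codeUnit <= 0x0039) ||
      (index === 1 && codeUnit >= 0x0030 && codeUnit <= 0x0039 && firstCodeUnit === 0x002d)
    ) {
      result += "\\" + codeUnit.toString(16) + " ";
      continue;
    }

    // A lone "-" is escaped with a backslash.
    if (index === 0 && length === 1 && codeUnit === 0x002d) {
      result += "\\" + value.charAt(index);
      continue;
    }

    // Non-ASCII, "-", "_", digits and ASCII letters are kept as they are.
    if (
      codeUnit >= 0x0080 ||
      codeUnit === 0x002d ||
      codeUnit === 0x005f ||
      (codeUnit >= 0x0030 && codeUnit <= 0x0039) ||
      (codeUnit >= 0x0041 && codeUnit <= 0x005a) ||
      (codeUnit >= 0x0061 && codeUnit <= 0x007a)
    ) {
      result += value.charAt(index);
      continue;
    }

    // Everything else (":", ".", "/", spaces, ...) gets a backslash escape.
    result += "\\" + value.charAt(index);
  }
  return result;
}
```

Since a site may replace the global `CSS` object, it seems safest to call the polyfill directly rather than first checking whether `CSS.escape` exists.
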

MarcusOtter commented 1 year ago

qnister.com

Times out after 30 seconds with just "qnister.com", but works with "https://www.qnister.com/". This is because https://qnister.com does not redirect, while http://qnister.com does. So maybe we should use a lower timeout, like 10 s, and retry with http if https fails.
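
A sketch of that retry idea, assuming a hypothetical `Loader` callback rather than a specific Puppeteer call: try https first, then http, each with a 10 s timeout, and report which URL worked. `candidateUrls` and `loadWithFallback` are illustrative names, not the project's code.

```typescript
// Hypothetical loader signature; with Puppeteer this could wrap
// page.goto(url, { timeout: timeoutMs }).
type Loader = (url: string, timeoutMs: number) => Promise<unknown>;

// For a bare host like "qnister.com", try https first and then http.
// Inputs that already include a scheme are used as-is.
function candidateUrls(input: string): string[] {
  if (/^https?:\/\//i.test(input)) {
    return [input];
  }
  return [`https://${input}`, `http://${input}`];
}

// Try each candidate with a shorter timeout (10 s by default) and return the
// first URL that loads. Throws if every candidate fails.
async function loadWithFallback(
  input: string,
  load: Loader,
  timeoutMs = 10_000,
): Promise<string> {
  let lastError: unknown;
  for (const url of candidateUrls(input)) {
    try {
      await load(url, timeoutMs);
      return url;
    } catch (error) {
      lastError = error;
    }
  }
  throw new Error(`Could not load ${input}: ${String(lastError)}`);
}
```

With Puppeteer, `load` could wrap `page.goto`, e.g. `(url, timeoutMs) => page.goto(url, { timeout: timeoutMs })`. Keeping the loader injectable makes the fallback logic testable without a browser.
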

The same thing happens with 11.ai

MarcusOtter commented 1 year ago

https://pagespeed.web.dev/

✅ Solved: a bug in the analysis code. ![image](https://user-images.githubusercontent.com/35617441/236343818-ebef25b4-7357-4708-b0f6-a36676a122e3.png)
MarcusOtter commented 1 year ago

tiktok.com

TikTok's cookie banner is inside a shadow root, which our algorithms cannot handle. I think we need to check whether each element has an `element.shadowRoot` property and traverse into it. Yikes. Maybe there's a better way, but this seems like it could be hard to deal with. It's like iframes, but worse.
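
A minimal sketch of that idea: a recursive walk that, besides an element's regular children, also descends into `element.shadowRoot` when it is present. `querySelectorAll` on the document does not look inside shadow roots, which is why a manual walk is needed. Only open shadow roots are reachable this way; for a closed root, `shadowRoot` is `null`. `TreeNode` and `collectIncludingShadow` are hypothetical names; DOM `Element`s fit the `TreeNode` shape, so in the page context you could start from `document.documentElement`.

```typescript
// Hypothetical minimal shape; DOM Elements satisfy it (children is an
// HTMLCollection, shadowRoot is a ShadowRoot or null).
interface TreeNode<T> {
  children: ArrayLike<T>;
  shadowRoot?: { children: ArrayLike<T> } | null;
}

// Collect every node under `root` (inclusive) that matches `predicate`,
// descending into open shadow roots as well as regular children.
function collectIncludingShadow<T extends TreeNode<T>>(
  root: T,
  predicate: (node: T) => boolean,
): T[] {
  const matches: T[] = [];

  const visit = (node: T): void => {
    if (predicate(node)) {
      matches.push(node);
    }
    // Closed shadow roots are not exposed, so shadowRoot is null for them.
    const shadow = node.shadowRoot;
    if (shadow) {
      for (let i = 0; i < shadow.children.length; i++) {
        visit(shadow.children[i]);
      }
    }
    for (let i = 0; i < node.children.length; i++) {
      visit(node.children[i]);
    }
  };

  visit(root);
  return matches;
}
```

The same walk could be extended to treat iframes similarly, but cross-origin frames would still need to be handled separately.
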

MarcusOtter commented 1 year ago

twitch.tv

Incorrectly reports that cookies can be rejected on the first page.

MarcusOtter commented 1 year ago

spela.se

It does not find the "reject all" button even though the site has one. I believe this is because the settings element is separate from the cookie banner element.

MarcusOtter commented 1 year ago

csn.se

Says that there is a "reject all" button on the first layer.

MarcusOtter commented 1 year ago

mybirdbuddy.com

✅ Solved: https://mybirdbuddy.com/pages/bb-feeder-privacy-guide It finds the banner, but we get this error: ![image](https://github.com/MarcusOtter/crumbly-consent/assets/35617441/a3d174af-1857-4074-acb7-9f838d91f113)
MarcusOtter commented 1 year ago

svenskakyrkan.se https://www.svenskakyrkan.se/

Finds the container instead of the banner (because the container has the z-index and is blocking).

Also, the language detection is wrong here: both the website and the banner are in Swedish.

MarcusOtter commented 1 year ago

Closing this in favor of specific issues for each thing