Open arichard-info opened 4 weeks ago
Hello @arichard-info, to be more precise, here is the scenario that ecoindex runs: https://www.ecoindex.fr/en/how-it-works/#analysis-methodology
Have you tried running the complete scenario?
Hello @vvatelot, thank you for the answer. Yes, it's the same scenario I ran.
By the way, the official ecoindex.fr scenario can't run in full on the site I gave as an example, nor on many other sites, because cookie banners often block scrolling until they've been accepted or declined.
In my scenario, I add a Playwright step that accepts third-party cookies in the banner. That lets me run the rest of the scenario: scroll down and wait three seconds. So I'd expect more nodes than the ecoindex.fr site reports, but as I've explained, I get far fewer.
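For anyone who wants to reproduce this, here is a minimal sketch of that scenario in Python. It is not the ecoindex code. The function names, the `accept_selector` parameter, and the lazy-import layout are assumptions made for illustration, and the banner selector has to be found by inspecting the site.

```python
# Sketch only: helper names and the cookie selector are assumptions,
# not part of the ecoindex scraper.
SCROLL_TO_BOTTOM_JS = "window.scrollTo(0, document.body.scrollHeight)"


def run_scenario(page, url, accept_selector=None, wait_ms=3000):
    """Load the page, optionally accept cookies, scroll down, wait, count elements.

    `page` is expected to behave like a Playwright sync `Page`.
    """
    page.goto(url)
    if accept_selector:
        # Hypothetical selector for the banner's "accept" button.
        page.locator(accept_selector).click()
    page.evaluate(SCROLL_TO_BOTTOM_JS)
    page.wait_for_timeout(wait_ms)
    # Every element in the document, svg children included.
    return page.locator("*").count()


def count_with_playwright(url, accept_selector=None, headless=True):
    """Run the scenario in Chromium; requires the `playwright` package."""
    from playwright.sync_api import sync_playwright  # imported lazily

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        page = browser.new_page()
        try:
            return run_scenario(page, url, accept_selector)
        finally:
            browser.close()
```

Calling `count_with_playwright("https://www.kiabi.com", accept_selector=...)` once with `headless=True` and once with `headless=False` gives two counts to compare directly.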
I can't explain why so many nodes are reported for the homepage of the site I gave as an example (https://www.kiabi.com).
Whether I use a Playwright or Puppeteer scenario, or even count manually in the browser, I always get a much lower number of nodes than the one returned by ecoindex.fr.
I ran tests with headless mode enabled and disabled.
| Mode | Node count |
|---|---|
| Headless | ~3500 |
| Headful | ~700 |
By default, ecoindex runs in headless mode. I don't know if I can make it work in headful mode in a container...
But in the end, I don't know why the Kiabi website shows such a difference. I exported the two HAR files if you want to investigate further: https://gist.github.com/vvatelot/12d8470de4ff83d586408f0225e6424b
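One way to start on the two files is to list which request URLs appear in only one of them. The sketch below is a rough starting point and relies only on the standard HAR layout (`log.entries[].request.url`). The function names are made up for illustration, and the local file names are up to whoever downloads the gist.

```python
import json
from collections import Counter


def load_har(path):
    """Parse a HAR file from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def har_urls(har):
    """Return a Counter of the request URLs found in a parsed HAR document."""
    return Counter(entry["request"]["url"] for entry in har["log"]["entries"])


def compare_hars(headless_har, headful_har):
    """List the URLs requested in only one of the two runs."""
    headless, headful = har_urls(headless_har), har_urls(headful_har)
    return {
        "only_headless": sorted(set(headless) - set(headful)),
        "only_headful": sorted(set(headful) - set(headless)),
    }
```

If the headless run loads scripts or fragments that the headful run never requests, those URLs are a reasonable first place to look for the extra nodes.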
What happened?
I think there is a mismatch in the node count metric, and I can't figure out why.
When analyzing https://www.kiabi.com on ecoindex.fr, I get ~3500 elements. That sounds like a lot, and I can't reach that count when I count them myself. For example, if I count all the elements from the console, I only get 790 nodes (~1700 if I scroll down).
I understand that the official script uses Playwright: https://github.com/cnumr/ecoindex_python_fullstack/blob/main/components/ecoindex/scraper/scrap.py#L133
So I created a very basic Playwright script that counts the elements of a page:
I'm using exactly the same syntax as the ecoindex script.
Result:
I don't understand why I'm so far from the ecoindex.fr results. And in both cases I don't subtract the svg elements.
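Subtracting the svg elements can only lower a count, so it cannot explain a gap in this direction. The sketch below illustrates this on a static HTML string, assuming "subtracting" means excluding every element nested inside an `<svg>`. It uses Python's standard `html.parser`, not a rendered DOM, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ElementCounter(HTMLParser):
    """Count all start tags, and separately those nested inside an <svg>."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.svg_children = 0
        self._svg_depth = 0

    def handle_starttag(self, tag, attrs):
        self.total += 1
        if self._svg_depth > 0:
            # This element sits somewhere below an <svg> element.
            self.svg_children += 1
        if tag == "svg":
            self._svg_depth += 1

    def handle_endtag(self, tag):
        if tag == "svg" and self._svg_depth > 0:
            self._svg_depth -= 1


def count_elements(html):
    """Return (total elements, elements nested inside svg)."""
    counter = ElementCounter()
    counter.feed(html)
    counter.close()
    return counter.total, counter.svg_children
```

For `<div><svg><g><path d="M0 0"/></g></svg><p>text</p></div>` this returns `(5, 2)`, so the count with svg children excluded would be 3.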
Am I missing something, or is there a problem with the Playwright setup used by ecoindex.fr?
Project
Ecoindex Scraper
What OS do you use?
Mac
urls
No response
Relevant log output
No response