HTTPArchive / cwv-tech-report

Core Web Vitals Technology Report
https://cwvtech.report

Wappalyzer technologies table not showing expected results for Ecommerce #25

Open rockeynebhwani opened 3 years ago

rockeynebhwani commented 3 years ago

Take this site as an example: https://www.maxtondesign.co.uk/. It uses the 'OpenCart' Ecommerce platform according to both the Wappalyzer Chrome extension and BuiltWith (https://builtwith.com/detailed/maxtondesign.co.uk). BuiltWith shows OpenCart from Aug-2018.

When we query the technologies table for this site, we don't see the Ecommerce category at all. This impacts the stats for the Ecommerce chapter.

SELECT
  *
FROM
  `httparchive.technologies.2021_*`
WHERE
  url = 'https://www.maxtondesign.co.uk/' AND
  category = 'Ecommerce'

@pmeenan - Can we please check if the Wappalyzer integration is working as expected?

rockeynebhwani commented 3 years ago

@pmeenan - We are also seeing junk values in the technologies table in some cases.

image

pmeenan commented 3 years ago

Looks like OpenCart detection relies on cookies. Looking now to see if those are plumbed through and how to add them if they aren't.
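
To illustrate why missing cookies would hide a technology, here is a minimal sketch of cookie-based detection. The rule format (technology name mapped to cookie-name patterns), the function name, and the `OCSESSID` cookie name are all assumptions made for illustration. This is not the actual Wappalyzer or WebPageTest code.

```python
import re

# Hypothetical rule table: technology -> {cookie name: regex for the cookie value}.
# An empty pattern means the cookie only has to be present.
COOKIE_RULES = {
    "OpenCart": {"OCSESSID": ""},  # assumed cookie name, illustration only
}


def detect_from_cookies(cookies, rules=COOKIE_RULES):
    """Return sorted technology names for which any cookie pattern matches."""
    # Compare cookie names case-insensitively.
    seen = {name.lower(): value for name, value in cookies.items()}
    detected = []
    for tech, patterns in rules.items():
        if any(
            name.lower() in seen and re.search(pattern, seen[name.lower()])
            for name, pattern in patterns.items()
        ):
            detected.append(tech)
    return sorted(detected)
```

If the crawler never hands the page's cookies to the detector, `detect_from_cookies({})` returns an empty list even for a site that sets the cookie. That would be consistent with the missing Ecommerce row.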

pmeenan commented 3 years ago

Created a PR for WebPageTest. It should be in the September crawl, but it's too late for August, which is just wrapping up.

tunetheweb commented 3 years ago

Thanks @pmeenan !

@rockeynebhwani / @rviscomi I think the second issue is a hangover from when Wappalyzer was broken, which affected crawls at the beginning of the year (I think it was tracked in https://github.com/HTTPArchive/almanac.httparchive.org/issues/1843).

When I run this:

SELECT
  _TABLE_SUFFIX AS run,
  COUNT(0) AS total
FROM
  `httparchive.technologies.*`
WHERE
  _TABLE_SUFFIX > '2021' AND
  app LIKE '%function%'
GROUP BY
  _TABLE_SUFFIX
ORDER BY
  run

We don't see any recent issues:

image

@rviscomi you should probably clean up your Web Technologies Report to filter out these "apps" so they don't show in the dropdown.

rockeynebhwani commented 3 years ago

@pmeenan / @tunetheweb - I am still seeing some inconsistencies in the July table when I compare the results with the Wappalyzer extension. Example - https://ezplaytoys.com/

I ran this query and got only 7 results:

SELECT
  *
FROM
  `httparchive.technologies.2021_07_01_mobile`
WHERE
  url = 'https://ezplaytoys.com/'

The same query for desktop gives 21 results. I don't expect so much difference between desktop and mobile for this site.

rviscomi commented 3 years ago

Did a bunch of cleanup of obviously bad technology names in the dashboard table (httparchive.core_web_vitals.technologies). For example, names containing only spaces, dots, and/or numbers, and source code like function or this..

image
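
The cleanup rule described above could be sketched roughly as follows. This is only an illustration: the function name, the regular expressions, and the whole-word token list are assumptions, not the logic that was actually run against the dashboard table.

```python
import re

# Names made up only of whitespace, dots, and digits (including the empty string).
ONLY_SPACES_DOTS_DIGITS = re.compile(r"[\s.\d]*")
# Whole-word tokens that suggest leaked source code (assumed list).
SOURCE_CODE_TOKENS = re.compile(r"\b(?:function|this)\b")


def is_junk_name(name):
    """Return True if a technology name looks obviously invalid."""
    return bool(
        ONLY_SPACES_DOTS_DIGITS.fullmatch(name)
        or SOURCE_CODE_TOKENS.search(name)
    )
```

Under these assumptions, a name such as "1.2.3" or "function(a){return a}" is flagged, while a valid app name with a version number appended is left alone.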

There are still some odd combinations of valid app names with version numbers appended. I won't touch those since there are lots of them and some may be useful.

image

This doesn't prevent new datasets from adding more junk to the dashboard, so we may need to clean it up again in the future or put better checks in place in HA or upstream in WPT.