HTTPArchive / wptagent

Cross-platform WebPageTest agent

Tech detected with wrong categories #26

Closed: max-ostapenko closed this issue 1 week ago

max-ostapenko commented 1 week ago

I ran the analysis to check whether there were many more of these "generated" technologies and found another problem: wrong categories (Miscellaneous or SSR assigned instead of the correct ones):

Example page: https://www.edustore.at/

~60 technologies impacted:

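```sql
SELECT
  detections.*,
  rules.*
FROM (
  -- Every (technology, category) pair detected in the 2024-10-01 staging crawl,
  -- with one example page per pair
  SELECT
    technology.technology AS name,
    category,
    ANY_VALUE(page) AS page
  FROM httparchive.crawl_staging.pages,
    UNNEST(technologies) AS technology,
    UNNEST(technology.categories) AS category
  WHERE date = '2024-10-01'
  GROUP BY 1, 2
) AS detections
LEFT JOIN (
  -- (name, category) pairs defined in the Wappalyzer rules
  SELECT
    name,
    category
  FROM `httparchive.wappalyzer.apps`,
    UNNEST(categories) AS category
) AS rules
USING (name, category)
-- Keep only detected pairs with no matching rule, i.e. wrongly assigned categories
WHERE rules.name IS NULL
ORDER BY detections.category
```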

The same issue occurs on WPT, but the Wappalyzer extension shows the correct categories.

pmeenan commented 1 week ago

My best guess is that a script on the page had overridden Array.find in an incompatible way. I changed the category lookup to iterate manually instead (not sure whether that broken override will cause issues elsewhere).
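A minimal sketch of the suspected failure mode and the manual-lookup workaround, under stated assumptions: the category table shape, the IDs, and the function names `findCategoryBuiltin` and `findCategoryManual` are hypothetical and are not wptagent's actual code. The override shown is only a stand-in for whatever the page really did. It illustrates how a page-level replacement of `Array.prototype.find` can silently return the wrong category (here Miscellaneous, echoing the issue) while a plain loop is unaffected.

```javascript
// Hypothetical category table (assumed shape and IDs, for illustration only).
const categories = [
  { id: 19, name: 'Miscellaneous' },
  { id: 1, name: 'CMS' },
  { id: 6, name: 'Ecommerce' },
];

// Lookup through Array.prototype.find: correct only if no page script replaced it.
function findCategoryBuiltin(id) {
  return categories.find((category) => category.id === id);
}

// Manual lookup: a plain loop that calls no overridable Array methods.
function findCategoryManual(id) {
  for (let i = 0; i < categories.length; i++) {
    if (categories[i].id === id) {
      return categories[i];
    }
  }
  return undefined;
}

// Simulate an incompatible page override (hypothetical): it ignores the
// callback and always returns the first element.
const originalFind = Array.prototype.find;
Array.prototype.find = function () {
  return this[0];
};

const broken = findCategoryBuiltin(6); // affected by the override
const manual = findCategoryManual(6); // unaffected

// Restore the built-in so nothing else in this script is affected.
Array.prototype.find = originalFind;

console.log(broken.name); // Miscellaneous (wrong category)
console.log(manual.name); // Ecommerce
```

The manual loop relies only on `length` and indexed access, so patching array methods on the page cannot change its result.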