Open gbinal opened 9 months ago
Some notes:
I'm poking into this a bit. It looks to be not so straightforward:
Can't get the asset directly, because it expects to be loaded from the page (maybe something related to headers/path).
And when accessing the GTM tags, the response is a structured but inconsistent JSON object whose encoding is optimized a bit.
Still might be able to grep a GTM-XXXX string out of it.
https://github.com/GSA/site-scanning/issues/617
https://github.com/GSA/site-scanning-documentation/blob/main/pages/scan_steps.md (need to update based on what I learn that we did earlier)
https://github.com/GSA/site-scanning/issues/585
https://github.com/GSA/site-scanning/issues/494
https://github.com/GSA/site-scanning/issues/504
https://github.com/GSA/site-scanning/issues/616
https://github.com/GSA/site-scanning-engine/blob/main/libs/core-scanner/src/scans/dap.ts
A sidenote - one idea would be to scan for all UA-, G-, GTM-, etc. codes
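A rough sketch of what that broader scan could look like, in Python for quick experimentation rather than in the scanner's own code. The ID shapes in the regexes are my assumptions about what these codes usually look like, not something taken from dap.ts, and `find_tag_ids` and `TAG_PATTERNS` are hypothetical names:

```python
import re
from typing import Dict, List

# Assumed ID shapes (not taken from the scanner source); loosen or tighten as needed.
TAG_PATTERNS = {
    "UA": re.compile(r"\bUA-\d{4,10}-\d{1,4}\b"),
    "G": re.compile(r"\bG-[A-Z0-9]{8,12}\b"),
    "GTM": re.compile(r"\bGTM-[A-Z0-9]{4,8}\b"),
}


def find_tag_ids(text: str) -> Dict[str, List[str]]:
    """Return the unique IDs found per family, in order of first appearance."""
    found = {}
    for family, pattern in TAG_PATTERNS.items():
        # dict.fromkeys drops repeats while keeping first-seen order.
        found[family] = list(dict.fromkeys(pattern.findall(text)))
    return found


# Made-up page source, just to exercise the patterns.
sample = """
<script src="https://www.googletagmanager.com/gtm.js?id=GTM-ABC1234"></script>
<script>gtag('config', 'G-AB12CD34EF'); ga('create', 'UA-1234567-1');</script>
<script>gtag('config', 'G-AB12CD34EF');</script>
"""

print(find_tag_ids(sample))
```

Since this works on raw text, it should not matter much whether the input is page HTML or the inconsistent JSON blob from the GTM response, though the `G-` pattern in particular may produce false positives on unrelated strings.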
Okay - so, to better investigate this, I'm now trying to compare against direct DAP data. I got the 10k most popular URLs.
hiv.gov/ finder.healthcare.gov/ tmsearch.uspto.gov/search/search-information es.usembassy.gov/ fr.usembassy.gov/ hiv.gov/ jp.usembassy.gov/ pk.usembassy.gov/ hiv.gov/ ke.usembassy.gov/ hiv.gov/ co.usembassy.gov/ au.usembassy.gov/ jm.usembassy.gov/ airnow.gov/
Or more exactly: https://www.hiv.gov/ https://finder.healthcare.gov/ https://tmsearch.uspto.gov/search/search-information https://es.usembassy.gov/ https://fr.usembassy.gov/ https://www.hiv.gov/ https://jp.usembassy.gov/ https://pk.usembassy.gov/ https://www.hiv.gov/ https://ke.usembassy.gov/ https://www.hiv.gov/ https://co.usembassy.gov/ https://au.usembassy.gov/ https://jm.usembassy.gov/ https://airnow.gov/
Note: why are there dups?
per 2-7-24 email
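To see how many dups there are in the sample above, a quick count helps. This is a minimal sketch, assuming that lowercasing the host and dropping a leading `www.` is the right way to compare entries (that normalization is my guess, matching the short forms listed above); `normalize` is a hypothetical helper:

```python
from collections import Counter
from urllib.parse import urlsplit

# The sample URLs from this issue, in their original order.
urls = """
https://www.hiv.gov/ https://finder.healthcare.gov/
https://tmsearch.uspto.gov/search/search-information
https://es.usembassy.gov/ https://fr.usembassy.gov/ https://www.hiv.gov/
https://jp.usembassy.gov/ https://pk.usembassy.gov/ https://www.hiv.gov/
https://ke.usembassy.gov/ https://www.hiv.gov/ https://co.usembassy.gov/
https://au.usembassy.gov/ https://jm.usembassy.gov/ https://airnow.gov/
""".split()


def normalize(url: str) -> str:
    """Reduce a URL to host + path, ignoring scheme, case, and a leading 'www.'."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host + (parts.path or "/")


counts = Counter(normalize(u) for u in urls)
dups = {key: n for key, n in counts.items() if n > 1}

print(len(urls), "rows,", len(counts), "unique")  # 15 rows, 12 unique
print(dups)  # {'hiv.gov/': 4}
```

Running the same count over the full 10k list would show whether the repeats are limited to a few sites or are spread throughout the export.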