GSA / site-scanning

The central repository for the Site Scanning program
https://digital.gov/site-scanning

investigate possible DAP false negatives #796

Open gbinal opened 9 months ago

gbinal commented 9 months ago

per 2-7-24 email

gbinal commented 9 months ago

Some notes:

I'm poking into it a bit. It doesn't look so straightforward:

Can't get the asset directly, because it expects to be loaded from the page (maybe something related to headers or path).

And when accessing the GTM tags, the response is a structured but inconsistent JSON object with a somewhat optimized encoding.

Still, might be able to grep a GTM-XXXX string out of it.
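A rough sketch of that grep idea: treat the response body as plain text instead of parsing the JSON, and pull out anything shaped like a container ID. The ID pattern (`GTM-` plus 4 to 10 uppercase letters or digits), the function name `find_gtm_ids`, and the sample string are all assumptions for illustration, not something confirmed against real GTM responses or the current `dap.ts` scan.

```python
import re

# Assumed shape of a GTM container ID: "GTM-" plus 4 to 10 uppercase letters/digits.
GTM_ID = re.compile(r"\bGTM-[A-Z0-9]{4,10}\b")


def find_gtm_ids(raw_body: str) -> list[str]:
    """Return unique GTM-style IDs in first-seen order, treating the body as plain text."""
    seen: dict[str, None] = {}
    for match in GTM_ID.finditer(raw_body):
        seen.setdefault(match.group(0), None)
    return list(seen)


# Hypothetical response fragment, not real scan output.
sample = '{"resource":{"version":"1","tags":[{"vtp_id":"GTM-ABCD123"}]},"c":"GTM-ABCD123"}'
print(find_gtm_ids(sample))
```

Working on the raw text sidesteps the inconsistent JSON structure, at the cost of possibly matching ID-like strings that are not actually active containers.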

gbinal commented 9 months ago

https://github.com/GSA/site-scanning/issues/617
https://github.com/GSA/site-scanning-documentation/blob/main/pages/scan_steps.md (need to update based on what I learn that we did earlier)
https://github.com/GSA/site-scanning/issues/585
https://github.com/GSA/site-scanning/issues/494
https://github.com/GSA/site-scanning/issues/504
https://github.com/GSA/site-scanning/issues/616

https://github.com/GSA/site-scanning-engine/blob/main/libs/core-scanner/src/scans/dap.ts

gbinal commented 9 months ago

A side note: one idea would be to scan for all UA-, G-, GTM-, etc. codes.
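One way that broader scan could look, as a sketch only: one regex per ID family, with results grouped by family. The three patterns are guesses at the usual ID shapes, and `scan_analytics_ids` and the sample markup are hypothetical; none of this reflects how the scanner currently detects DAP.

```python
import re
from collections import defaultdict

# Assumed ID shapes per family; real-world IDs may not all fit these patterns.
PATTERNS = {
    "UA": re.compile(r"\bUA-\d{4,10}-\d{1,4}\b"),
    "G": re.compile(r"\bG-[A-Z0-9]{6,12}\b"),
    "GTM": re.compile(r"\bGTM-[A-Z0-9]{4,10}\b"),
}


def scan_analytics_ids(text: str) -> dict[str, list[str]]:
    """Group the unique ID-like strings found in text by family, sorted within each family."""
    found = defaultdict(set)
    for family, pattern in PATTERNS.items():
        found[family].update(pattern.findall(text))
    # Drop families with no hits so the result only lists what was seen.
    return {family: sorted(ids) for family, ids in found.items() if ids}


# Hypothetical page source, not taken from a real site.
page = (
    '<script src="https://www.googletagmanager.com/gtag/js?id=G-ABC123DEF4"></script>'
    ' ga("create", "UA-1234567-1"); loadContainer("GTM-ABCD123");'
)
print(scan_analytics_ids(page))
```

Recording every family per page would make it possible to compare what a site loads against the IDs DAP is expected to use, rather than relying on a single yes/no check.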

gbinal commented 9 months ago

Okay - so, to better investigate this, I'm now trying to compare against direct DAP data. I got the 10k most popular URLs.

hiv.gov/
finder.healthcare.gov/
tmsearch.uspto.gov/search/search-information
es.usembassy.gov/
fr.usembassy.gov/
hiv.gov/
jp.usembassy.gov/
pk.usembassy.gov/
hiv.gov/
ke.usembassy.gov/
hiv.gov/
co.usembassy.gov/
au.usembassy.gov/
jm.usembassy.gov/
airnow.gov/

Or more exactly:
https://www.hiv.gov/
https://finder.healthcare.gov/
https://tmsearch.uspto.gov/search/search-information
https://es.usembassy.gov/
https://fr.usembassy.gov/
https://www.hiv.gov/
https://jp.usembassy.gov/
https://pk.usembassy.gov/
https://www.hiv.gov/
https://ke.usembassy.gov/
https://www.hiv.gov/
https://co.usembassy.gov/
https://au.usembassy.gov/
https://jm.usembassy.gov/
https://airnow.gov/

Note: why are there dups?
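To size up the duplicates before guessing at a cause, a quick count over the list above might help. This is only a sketch using the URLs quoted in this comment; it says nothing about why the DAP export repeats them.

```python
from collections import Counter

# The URLs exactly as listed above, in the same order.
urls = """
https://www.hiv.gov/ https://finder.healthcare.gov/
https://tmsearch.uspto.gov/search/search-information
https://es.usembassy.gov/ https://fr.usembassy.gov/ https://www.hiv.gov/
https://jp.usembassy.gov/ https://pk.usembassy.gov/ https://www.hiv.gov/
https://ke.usembassy.gov/ https://www.hiv.gov/ https://co.usembassy.gov/
https://au.usembassy.gov/ https://jm.usembassy.gov/ https://airnow.gov/
""".split()

counts = Counter(urls)
# Keep only the URLs that appear more than once.
duplicates = {url: n for url, n in counts.items() if n > 1}

print(f"{len(urls)} rows, {len(counts)} unique")
print(duplicates)
```

In this sample only https://www.hiv.gov/ repeats (4 of the 15 rows), so the same check on the full 10k list would show whether the duplication is specific to a few sites or widespread.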