GSA / site-scanning

The central repository for the Site Scanning program
https://digital.gov/site-scanning

Some missing data #977

Closed mgifford closed 1 month ago

mgifford commented 1 month ago

I created this last week:

https://docs.google.com/spreadsheets/d/1CsXAzCzghYYwXzGCcrJqrsWpr5f7MbID2Qw6vQvi3sQ/edit#gid=497600811

In it:
https://www.va.gov/jackson-health-care/
 

jackson.va.govva.govgovTRUEhttps://www.va.gov/jackson-health-care/va.govgovwww.va.govTRUE200text/htmlTRUEFALSEDepartment of Veterans AffairsOffice of Information and Technology, IT Operations and Services (ITOPS)ExecutiveFALSE["pulse"] 2024-05-04T01:13:47.255ZcompletedcompletedcompletedcompletedcompletedcompletedcompletedcompletedTRUE    FALSE TRUE{"agency":"VA"}["dap.digitalgov.gov","prod-va-gov-assets.s3-us-gov-west-1.amazonaws.com","resource.digital.voice.va.gov","s3-us-gov-west-1.amazonaws.com","www.google-analytics.com","www.googletagmanager.com"]6[".va.gov","www.va.gov"]TRUE0.04940820573471.2["about"] TRUETRUEVA Jackson Health Care | Veterans AffairsAt G.V. (Sonny) Montgomery VA Medical Center, our health care teams are deeply experienced and guided by the needs of Veterans, their families, and caregivers. Find a health facility near you, and manage your health online. Sign up for community events and updates. VA Jackson health care | Veterans AffairsAt G.V. (Sonny) Montgomery VA Medical Center, our health care teams are deeply experienced and guided by the needs of Veterans, their families, and caregivers. Find a health facility near you, and manage your health online. Sign up for community events and updates.  /img/styles/3_2_medium_thumbnail/public/2021-04/G.V.%20Sonny%20Montgomery%20Department%20of%20Veterans%20Affairs%20Medical%20Center.jpgwebsitehttps://www.va.gov/jackson-health-care/https://www.va.gov/jackson-health-care/en TRUEFALSETRUEhttps://www.va.gov/jackson-health-care/TRUE200text/html   FALSETRUEhttps://www.va.gov/jackson-health-care/TRUE200text/html   00002500 025  

This page is reported as not having a link to an accessibility statement, but the link is there.

The scraper reports only ["about"] as being there.

Is this accurate, or is there a problem with the script?

gbinal commented 1 month ago

Hi @mgifford. Thanks for reaching out!!!

We also noticed this happening across a number of va.gov sites, and what we realized is that the footer on these sites is built in a somewhat less common way. Our required links scan looks for standard hyperlinks (a la <a href="URL">text</a>), but the footers on these sites are built with something they call 'live JSON injection'. The result is a false negative.
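As a rough illustration of why this produces a false negative, here is a minimal sketch. It is not the actual Site Scanning code: the `AnchorCollector` class, the `find_required_links` function, the keyword list, and both sample footers are hypothetical, and the script-based footer is only a guess at what 'live JSON injection' might look like in the page source. It shows that a scan reading only standard `<a href>` elements from the HTML never sees links that exist only as JSON data.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects (href, text) pairs from standard <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a href="..."> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_required_links(html, keywords=("about", "accessibility")):
    """Return the keywords (hypothetical list) matched by some anchor's href or text."""
    parser = AnchorCollector()
    parser.feed(html)
    return [
        keyword
        for keyword in keywords
        if any(keyword in href.lower() or keyword in text.lower()
               for href, text in parser.links)
    ]


# Hypothetical footer built with ordinary hyperlinks.
STATIC_FOOTER = (
    '<footer><a href="/about/">About</a> '
    '<a href="/accessibility/">Accessibility</a></footer>'
)

# Hypothetical footer whose links exist only as JSON, to be rendered by a script.
INJECTED_FOOTER = (
    '<a href="/about/">About</a><footer id="footer"></footer>'
    '<script>var footerData = {"links": '
    '[{"label": "Accessibility", "url": "/accessibility/"}]};</script>'
)

print(find_required_links(STATIC_FOOTER))    # ['about', 'accessibility']
print(find_required_links(INJECTED_FOOTER))  # ['about']
```

In the second case the accessibility link is present for anyone viewing the rendered page, but an anchor-only scan of the markup reports just `['about']`, which matches the pattern in the row above.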

We've made one pass at finding an elegant solution but haven't had any luck yet, so this remains an open issue. We've also highlighted it with the va.gov team, and they may actually shift to a more traditional model of coding their footers.

I'm glad you're diving into our data. Feel free to reach out to site-scanning@gsa.gov if you ever want to talk about the data more or if you have more feedback on how our data could be improved!