note instructions, including how to report GitHub handles
explain difference between USA classes count and list
distilled www notes, instructions, and requests
Scan changes:
DAP version number
updated banner scan
?possibly remove all www?
...
Documentation:
change log
list of sites that break without www
part of the POV document
add one column to data dictionary at a time
prepare answers for how to resolve
Ironing:
~Most up-to-date Canonical Website list transferred to GitHub - doing here and here~ - done
All websites in the GitHub file are in the site scanning data - doing this here.
Note which aren't and why - of 9624 URLs in omb_idea.csv, 9202 are in the site scanning data and 422 are excluded. Of those 422, 211 are excluded because they don't end in .gov or .mil. The other 211 are correctly caught by the filters, but I'll try to force them back in.
Note that there are only 9201 unique sites in the actual scan data, since prod-onrr-frontend.app.cloud.gov is in there twice.
I need to check again in a few days.
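For the re-check, a minimal sketch of the comparison above, assuming both omb_idea.csv and the scan data can be reduced to plain lists of hostnames. The column name passed to `load_column` is a placeholder, not the real header in either file, and `coverage_report` is a hypothetical helper. Row counts and unique counts are kept separate, so a URL listed twice in the IDEA file (as appears to be the case with prod-onrr-frontend.app.cloud.gov) shows up as the gap between `in_scan_rows` and `in_scan_unique`.

```python
import csv
from collections import Counter


def load_column(path: str, column: str) -> list[str]:
    """Read one CSV column. The column name is a placeholder assumption."""
    with open(path, newline="") as f:
        return [row[column] for row in csv.DictReader(f)]


def normalize(url: str) -> str:
    """Lowercase and drop scheme/trailing slash so 'https://X.gov/' matches 'x.gov'."""
    u = url.strip().lower()
    for prefix in ("https://", "http://"):
        if u.startswith(prefix):
            u = u[len(prefix):]
    return u.rstrip("/")


def coverage_report(idea_urls, scanned_urls) -> dict:
    """Compare IDEA list entries against the hosts present in the scan data."""
    idea = [normalize(u) for u in idea_urls]
    scanned = {normalize(u) for u in scanned_urls}
    missing = [u for u in idea if u not in scanned]
    return {
        "idea_rows": len(idea),
        "in_scan_rows": len(idea) - len(missing),
        "in_scan_unique": len(set(idea) & scanned),
        "excluded": len(missing),
        # Excluded because the domain doesn't end in .gov or .mil.
        "not_gov_or_mil": [u for u in missing if not u.endswith((".gov", ".mil"))],
        # .gov/.mil entries that are still missing, presumably caught by filters.
        "likely_filtered": [u for u in missing if u.endswith((".gov", ".mil"))],
        "duplicates": [u for u, n in Counter(idea).items() if n > 1],
    }


if __name__ == "__main__":
    idea = ["a.gov", "https://B.gov/", "b.gov", "c.com", "d.mil"]
    scanned = ["a.gov", "b.gov"]
    print(coverage_report(idea, scanned))
```

With the real files, the figures in the notes would correspond to `idea_rows` 9624, `in_scan_rows` 9202, `in_scan_unique` 9201, `excluded` 422, and 211 entries in each of the two excluded lists.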
All public yes/no values correctly imported
Confirmed
All omb_idea sites in site scanning complete all scans
For each scan type, note which have errors and why
7494/9201 complete - primary
7153/9201 complete - accessibility
7751/9201 complete - performance
7961/9201 complete - robots.txt
7818/9201 complete - sitemap.xml
8176/9201 complete - security
1025/9201 are missing from CISA data: 878 because they are .mil sites and 147 for various other reasons, including, in some cases, that CISA doesn't consider the originating agency to be covered by its mandate.
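A quick check of the counts above: percent complete per scan type, plus a consistency check that the 1025 sites missing from CISA data account for every incomplete security scan (9201 - 8176 = 1025 = 878 + 147). The numbers are copied from the notes; `completion_rates` is just an illustrative helper.

```python
TOTAL = 9201

# Completion counts copied from the notes above.
COMPLETED = {
    "primary": 7494,
    "accessibility": 7153,
    "performance": 7751,
    "robots.txt": 7961,
    "sitemap.xml": 7818,
    "security": 8176,
}


def completion_rates(completed: dict, total: int) -> dict:
    """Return {scan type: (percent complete rounded to 0.1, count not complete)}."""
    return {scan: (round(100 * n / total, 1), total - n) for scan, n in completed.items()}


if __name__ == "__main__":
    for scan, (pct, gap) in completion_rates(COMPLETED, TOTAL).items():
        print(f"{scan:<14} {pct:5.1f}% complete, {gap} not complete")
    # The security gap should equal the CISA breakdown: 878 .mil + 147 other.
    assert TOTAL - COMPLETED["security"] == 878 + 147
```

On these numbers, accessibility has the lowest completion rate (about 77.7%) and security the highest (about 88.9%).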
For all omb_idea sites in site scanning that complete, the site is live
Note which aren't and why
Confirm scan dates
confirmed
Confirm all fields populated
Note any potential issues with each field
CMS - only as good as the code library
DAP - note relevant code snippets; note updates to which ones are detected
Third Party Services - only as good as the code library
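For the DAP item, one way to document "relevant code snippets" is a minimal static check like the one below. It assumes DAP is recognized by the GSA-hosted `Universal-Federated-Analytics-Min.js` script and reads its query-string parameters (such as `agency`); the actual site-scanning detection may use other or additional signals, and `dap_script_url` / `dap_parameters` are hypothetical helpers. Like the CMS and third-party checks, it is only as good as the pattern it matches: a script injected at runtime (for example through a tag manager) would not appear in the fetched HTML.

```python
import re
from html import unescape
from typing import Optional
from urllib.parse import parse_qsl, urlparse

# Assumption: DAP is identified by the standard GSA-hosted script filename.
DAP_SRC = re.compile(
    r"""src\s*=\s*["']([^"']*Universal-Federated-Analytics(?:-Min)?\.js[^"']*)["']""",
    re.IGNORECASE,
)


def dap_script_url(page: str) -> Optional[str]:
    """Return the DAP script URL referenced in the page source, if any."""
    match = DAP_SRC.search(page)
    # Undo HTML escaping such as '&amp;' before the URL is parsed.
    return unescape(match.group(1)) if match else None


def dap_parameters(page: str) -> dict:
    """Return query-string parameters on the DAP script URL, e.g. {'agency': 'GSA'}."""
    url = dap_script_url(page)
    return dict(parse_qsl(urlparse(url).query)) if url else {}


if __name__ == "__main__":
    sample = (
        '<script async '
        'src="https://dap.digitalgov.gov/Universal-Federated-Analytics-Min.js'
        '?agency=GSA&amp;subagency=TTS"></script>'
    )
    print(dap_script_url(sample) is not None, dap_parameters(sample))
```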
Due: Aug. 26
note - includes directory prototype
Scan changes:
Documentation:
Ironing:
prod-onrr-frontend.app.cloud.gov is in there twice.
https://github.com/GSA/site-scanning/issues/1029
https://github.com/GSA/site-scanning/issues/1048
Excluded by filters, so trying to force them back in:
accounts.ahrq.gov accounts.alcf.anl.gov accounts.cels.anl.gov accounts.lb.csp.noaa.gov accounts.noaa.gov akadev-ion-hhs.gov akaprod-betobaccofree.hhs.gov akaprod-digitalmedia.hhs.gov akaprod-foodsafety.gov akaprod-stopbullying.gov akaqa-ion-hhs.gov alpha.cpars.gov alpha.sam.gov alpha2.sam.gov api-alpha.sam.gov appian-preprod-dsc.fda.gov auth.cdc.gov auth.extranet.niddk.nih.gov auth.launchpad-sbx.nasa.gov auth.ncats.nih.gov auth.ncdc.noaa.gov auth.nih.gov auth.orr.noaa.gov auth.sdcc.bnl.gov auth.tva.gov auth.uspto.gov authdev.nih.gov authproxy.nih.gov authproxydev.nih.gov authstaging.phmsa.dot.gov authtest.ha.nih.gov authtest.nih.gov awslogin-qa.awsprod.nlm.nih.gov cms-www-goesr.woc.noaa.gov d9.qa.jimmycarterlibrary.gov d9.qa.nixonlibrary.gov d9.qa.obamalibrary.gov d9.qa.reaganlibrary.gov datadashboard.preprod.fda.gov eaccounts.pnnl.gov ecc.sit.earthdata.nasa.gov ext-idm.preprod.fda.gov files.asprtracie.hhs.gov files.covid19treatmentguidelines.nih.gov files.healthit.gov files.nccih.nih.gov foiaonline.gov fs-www-avi-lb-pz.sewp.nasa.gov ftp-f5.lanl.gov ftp.cdc.gov ftp.cpc.ncep.noaa.gov ftp.emc.ncep.noaa.gov ftp.gsa.gov ftp.i.ncep.noaa.gov ftp.ncbi.nih.gov ftp.ncep.noaa.gov ftp.nco.ncep.noaa.gov ftp.ngs.noaa.gov ftp.nhc.ncep.noaa.gov ftp.nhc.noaa.gov ftp.nhtsa.dot.gov ftp.nlm.nih.gov ftp.opc.ncep.noaa.gov ftp.phy.ornl.gov ftp.wildfire.gov ftp.wpc.ncep.noaa.gov ftpprd.ncep.noaa.gov ia-content-stage2.phmsa.dot.gov ia-sync-stage2.phmsa.dot.gov icbs-qa.nwcg.gov iciswsstage.epa.gov idn.sit.earthdata.nasa.gov idp.bldc.nwave.noaa.gov idp.boul.nwave.noaa.gov idp.cancer.gov idp.ctc.nwave.noaa.gov idp.int.identitysandbox.gov idp.mww59.identitysandbox.gov idp.nwave.noaa.gov idp.ornl.gov idp.sujana09.identitysandbox.gov idp.vivek.identitysandbox.gov imagery.qa.coast.noaa.gov imagery1.qa.coast.noaa.gov imagery2.qa.coast.noaa.gov in.gov labels.preprod.fda.gov lib-lanl.gov listserv.sos.wa.gov live-www-goesr.woc.noaa.gov maps.qa.coast.noaa.gov maps1.qa.coast.noaa.gov 
maps2.qa.coast.noaa.gov marinecadastre.qa.coast.noaa.gov mmt.sit.earthdata.nasa.gov mnc-qa.ornl.gov nfsdx.preprod.fda.gov origin-acquisition.gov origin-archive-afsc.fisheries.noaa.gov origin-east-01-drupal-climate.woc.noaa.gov origin-east-01-wordpress-space.woc.noaa.gov origin-east-01-www-ospo.woc.noaa.gov origin-east-02-drupal-climate.woc.noaa.gov origin-east-www-goes.woc.noaa.gov origin-east-www-nhc.woc.noaa.gov origin-east-www-ospo.woc.noaa.gov origin-east-www-ssd.woc.noaa.gov origin-east-www-wpc.woc.noaa.gov origin-exclusions.oig.hhs.gov origin-fisheriespermits.noaa.gov origin-gsa.gov origin-my.uscis.gov origin-qa-api.cbp.dhs.gov origin-seafoodinspection.nmfs.noaa.gov origin-wcatwc.arh.noaa.gov origin-west-www-goes.woc.noaa.gov origin-west-www-nhc.woc.noaa.gov origin-west-www-ospo.woc.noaa.gov origin-west-www-satepsanone.woc.noaa.gov origin-west-www-spc.woc.noaa.gov origin-www-odi.nhtsa.dot.gov preprod.eauth.va.gov preprod.fed.eauth.va.gov preprod.vta.va.gov preview-www-goesr.woc.noaa.gov qa-api.cbp.dhs.gov radar-qa-cp.weather.gov respond.qa.census.gov response-qa.response.epa.gov sa.www4.irs.gov saferfederalworkforce.gov sandbox.bluebutton.cms.gov sandbox.ntp.niehs.nih.gov sbx.tms.va.gov sftp.afsc.noaa.gov sftp1.phmsa.dot.gov shoreline.qa.coast.noaa.gov spotthestation-preprod.hqmce.nasa.gov status.sit.earthdata.nasa.gov stg-foia.gov stormscdn.ngs.noaa.gov tams.preprod.gsa.gov tarp-qa.ornl.gov tes-qa.science.ornl.gov tmss.preprod-acqit.helix.gsa.gov tsdrsec-sit.etc.uspto.gov tsunami.qa.coast.noaa.gov uts-qa-green.nlm.nih.gov uts-qa.nlm.nih.gov vlt-qa.ornl.gov web-qa.ornl.gov webvastage2.er.usgs.gov wfdss-qa.nwcg.gov wri-fot-qa.ornl.gov www-air.larc.nasa.gov www-angler.larc.nasa.gov www-avi-lb-pz.sewp.nasa.gov www-bdnew.fnal.gov www-calipso.larc.nasa.gov www-ccd.lbl.gov www-cdf.lbl.gov www-eng.lbl.gov www-esv.nhtsa.dot.gov www-fars.nhtsa.dot.gov www-fd.bea.gov www-green.lanl.gov www-gte.larc.nasa.gov www-ibt.lbl.gov www-int-ac.cancer.gov 
www-live.goesr.woc.noaa.gov www-mipl.jpl.nasa.gov www-mslmb.niddk.nih.gov www-nlpir.nist.gov www-nrd.nhtsa.dot.gov www-nsd.lbl.gov www-odi.nhtsa.dot.gov www-origin.usaid.gov www-pm.larc.nasa.gov www-preview.goesr.woc.noaa.gov www-prod-01.oceanexplorer.woc.noaa.gov www-prod-02.oceanexplorer.woc.noaa.gov www-prod.goesr.woc.noaa.gov www-qa.visac.ornl.gov www-scs.lbl.gov www-sdss.fnal.gov www-search-aws.uspto.gov www-search.uspto.gov www-star.fnal.gov www-stken.fnal.gov www-theory.lbl.gov www-web-search-alx.uspto.gov www-web-search-byr.uspto.gov www-x.antd.nist.gov www-xdiv.lanl.gov www1-1-pz.sewp.nasa.gov www1-2-pz.sewp.nasa.gov www2a.cdc.gov www2c.cdc.gov www3.fcc.gov www3.fed.bop.gov www3.nasa.gov www4.eere.energy.gov www4.rcf.bnl.gov www5.eere.energy.gov www5.fdic.gov www6.eere.energy.gov www7.eere.energy.gov www7.phmsa.dot.gov wwwapps.nimh.nih.gov wwwkc.fiscal.treasury.gov wwwn.cdc.gov
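To help prepare answers on why these hosts are filtered (and which might reasonably be forced back in), a rough triage could bucket them by naming pattern. The rules below are guesses read off the list itself, not the real filters in the site-scanning code; anything landing in "other" (for example foiaonline.gov or in.gov) would need a manual look.

```python
import re
from collections import Counter

# Hypothetical triage rules inferred from patterns visible in the excluded
# list above; they are NOT the actual site-scanning filter definitions.
RULES = [
    ("file transfer", r"s?ftp\w*"),
    ("auth/identity", r"auth\w*|idp|e?accounts"),
    ("origin/backend", r"origin"),
    ("non-production", r"qa|preprod|sit|sbx|sandbox|stage\w*|stg|dev|alpha\w*|preview"),
    ("www variant", r"www(\d+\w*|n|kc|apps)?"),
]


def classify(host: str) -> str:
    """Return the first rule that fully matches any dot- or hyphen-delimited token."""
    tokens = re.split(r"[.-]", host.strip().lower())
    for label, pattern in RULES:
        if any(re.fullmatch(pattern, token) for token in tokens):
            return label
    return "other"


if __name__ == "__main__":
    excluded = (
        "ftp.cdc.gov auth.nih.gov origin-gsa.gov alpha.sam.gov "
        "www2a.cdc.gov foiaonline.gov"
    ).split()
    print(Counter(classify(h) for h in excluded).most_common())
```

Rule order matters: a host such as origin-qa-api.cbp.dhs.gov lands in "origin/backend" because that rule is checked before "non-production".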