18F / pulse

How the federal .gov domain space is doing at best practices and policies.

Enable human curation of DAP results #654

Closed gbinal closed 7 years ago

gbinal commented 7 years ago

There are two situations that the DAP team has brought up:

1) An agency redirects X.gov to Y.gov using an HTML redirect instead of a proper server-side redirect. Though this is not a best practice, the DAP team would like to remove X.gov from the list of domains in the DAP section of Pulse.
2) America.gov, for example, isn't in use at the second level, only at the third level. In other words, www.america.gov isn't in use, but serve.america.gov and several other x.america.gov subdomains are, and all of them have implemented DAP. Unfortunately, www.america.gov appears to use an HTML redirect or something similar, so it stays in the results but is marked No. Though we'd rather they set up a server-side redirect, in the meantime we'd like america.gov to stay in the results but be marked Yes.

Here's my thought: we create a YML file with a structure like the one below and update the DAP scripts to either remove a domain or change its status to Yes if indicated by the YML file. Then the DAP team can update that file through pull requests.

Idea for YML file structure:

- agency: Department of Housing and Urban Development
  domains: 
  - domain: homesales.gov
    status: remove
    reason: The domain uses a client-side redirect that isn't detected by the scanners.  
  - domain: disasterhousing.gov
    status: remove
    reason: The domain uses a client-side redirect that isn't detected by the scanners.  
- agency: Department of State
  domains: 
  - domain: america.gov
    status: change-to-yes
    reason: The domain uses a client-side redirect that isn't detected by the scanners, but it redirects to subdomains that implement DAP, so we want to keep the result but change the status from No to Yes.  
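
To make the proposal concrete, here is a minimal sketch of how a processing step might apply a file like the one above. It is not code from the pulse repository: `apply_overrides`, the `results` mapping of domain to a yes/no flag, and the idea of parsing the file first (for example with PyYAML's `yaml.safe_load`) are all assumptions made for illustration.

```python
# Hypothetical sketch: apply human-curated overrides to DAP scan results.
# `overrides` mirrors the proposed YML structure after it has been parsed;
# the function name and the shape of `results` are illustrative assumptions.

def apply_overrides(results, overrides):
    """Return a new {domain: participating} dict with overrides applied."""
    curated = dict(results)  # copy so the raw scan results stay untouched
    for agency in overrides:
        for entry in agency.get("domains", []):
            domain = entry["domain"]
            status = entry["status"]
            if domain not in curated:
                continue  # nothing to curate for a domain the scan never saw
            if status == "remove":
                del curated[domain]
            elif status == "change-to-yes":
                curated[domain] = True
            else:
                raise ValueError(f"unknown status {status!r} for {domain}")
    return curated


overrides = [
    {"agency": "Department of Housing and Urban Development",
     "domains": [{"domain": "homesales.gov", "status": "remove"}]},
    {"agency": "Department of State",
     "domains": [{"domain": "america.gov", "status": "change-to-yes"}]},
]
results = {"homesales.gov": False, "america.gov": False, "state.gov": True}
curated = apply_overrides(results, overrides)
```

Raising on an unknown status is a deliberate choice in this sketch, so that a typo in a pull request fails loudly instead of being silently ignored.
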

gbinal commented 7 years ago

@eric - A question: what do you think the reaction would be to trying to collaborate with https://github.com/dhs-ncats/pshtt to enable an option for detecting meta-redirects?

konklone commented 7 years ago

@gbinal This already exists:

https://github.com/18F/pulse/blob/master/data/ineligible/analytics.yml

Any domain added to that YAML array will be excluded from DAP eligibility during processing. It would be nice to expand it to include a reason or status or other things, but wanted to flag that we acknowledged this need a while back and implemented a feature for it.
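
As a rough illustration of the exclusion behavior, and of how the list could later carry a reason, the sketch below accepts entries that are either plain domain strings (as in the current array) or small mappings with a `reason`. The function names and the mapping form are hypothetical and are not taken from the pulse processing code.

```python
# Hypothetical sketch: filter DAP-eligible domains against an exclusion list.
# Entries may be plain strings (the current array form) or dicts such as
# {"domain": ..., "reason": ...} (an assumed future extension).

def excluded_domains(entries):
    """Map each excluded domain (lowercased) to its reason, or None."""
    excluded = {}
    for entry in entries:
        if isinstance(entry, str):
            excluded[entry.lower()] = None
        else:
            excluded[entry["domain"].lower()] = entry.get("reason")
    return excluded


def filter_eligible(domains, entries):
    """Return the domains that are not on the exclusion list, in order."""
    excluded = excluded_domains(entries)
    return [d for d in domains if d.lower() not in excluded]


entries = [
    "homesales.gov",
    {"domain": "disasterhousing.gov", "reason": "client-side redirect"},
]
eligible = filter_eligible(
    ["HomeSales.gov", "disasterhousing.gov", "state.gov"], entries
)
```
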

> @eric - A question: what do you think the reaction would be to trying to collaborate with https://github.com/dhs-ncats/pshtt to enable an option for detecting meta-redirects?

@gbinal My GitHub handle is @konklone. Sorry @eric, feel free to unsubscribe. :)

And I opened an issue about that for pshtt a week or so ago here: https://github.com/dhs-ncats/pshtt/issues/52

I support adding a meta refresh detector, think it's technically feasible, and I think it'd be likely to be accepted if we wanted to add one.
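
For a sense of what such a detector could look like, here is a minimal sketch built only on Python's standard `html.parser` module. It is not pshtt's API or design; `MetaRefreshParser` and `find_meta_refresh` are hypothetical names, and the sketch only handles the common `content="0; url=..."` form.

```python
# Hypothetical sketch of a meta refresh detector; not part of pshtt.
from html.parser import HTMLParser


class MetaRefreshParser(HTMLParser):
    """Record the target URL of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        # Attribute names arrive lowercased; values may be None when absent.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "refresh":
            return
        # Typical content: "0; url=https://example.gov/"
        _, _, rest = attributes.get("content", "").partition(";")
        key, _, url = rest.strip().partition("=")
        if key.strip().lower() == "url" and url:
            self.target = url.strip().strip("'\"")


def find_meta_refresh(html):
    """Return the meta refresh target URL in `html`, or None if there is none."""
    parser = MetaRefreshParser()
    parser.feed(html)
    return parser.target


page = (
    '<html><head><meta http-equiv="Refresh" '
    'content="0; URL=https://serve.america.gov/"></head></html>'
)
redirect_target = find_meta_refresh(page)
```

A scanner could treat a non-empty result the same way it treats a server-side redirect, though JavaScript-based redirects would still go undetected by this approach.
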

gbinal commented 7 years ago

Thanks for all of these updates (and sorry about that, @ eric!).

I feel fine closing this for now since that yml file gives us most of what we want.