benbalter / gman

A Ruby gem to check if the owner of a given email address or website is working for THE MAN (a.k.a. verifies government domains).
http://ben.balter.com/gman/
MIT License

State and local domains #23

Closed afeijoo closed 10 years ago

afeijoo commented 10 years ago

See http://govt-urls.usa.gov/tematres/vocab/index.php for a list of about 11K federal, state, county, city, tribal, and territorial US government domains that aren't .gov or .mil.

benbalter commented 10 years ago

@afeijoo that is amazing. I had no idea that existed and would love to get them added. A few questions:

  1. Spot checking, it looks like there are some .edu domains in there. Can you explain what the criteria are for being listed?
  2. Do you have any insight into the specificity of subdomains? For example, I know ci.champaign.il.us is a city domain, but if just champaign.il.us were listed, it would include k12.champaign.il.us, which might include, e.g., students.
  3. Other than paging through the URLs, is there a good way to get the data out programmatically?
  4. Any insight into how often the data is updated?
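To make the concern in question 2 concrete, here is a minimal sketch of plain suffix matching. The helper `covered_by?` is hypothetical and is not Gman's actual matching logic; it only assumes that a listed domain also covers every host beneath it.

```ruby
# Hypothetical helper (not part of Gman): returns true when `host` equals a
# listed domain or is a subdomain of one.
def covered_by?(host, listed_domains)
  host = host.downcase
  listed_domains.any? do |domain|
    host == domain || host.end_with?(".#{domain}")
  end
end

specific = ["ci.champaign.il.us"] # only the city government host is listed
broad    = ["champaign.il.us"]    # the whole locality is listed

covered_by?("ci.champaign.il.us", specific)  # => true
covered_by?("k12.champaign.il.us", specific) # => false
covered_by?("k12.champaign.il.us", broad)    # => true (school hosts slip in)
```

Under that assumption, the more specific the listed entry, the less likely it is to sweep in unrelated hosts such as school domains.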

Thanks for the heads up. Such a great resource!

/cc @ErikSArnold per email

afeijoo commented 10 years ago

  1. Cooperative extensions and presidential libraries are included. They account for most of the .edu's you see. There are also a handful of forest services and other local, state, and federal agencies or programs that are on .edu domains.
  2. The subdomains (and folders, too) are specific to the government entity. We don't include public K-12 schools or institutions of higher education. We also don't include local library, police, or fire departments with separate domains.
  3. Great timing. We're posting a more consumable file on GitHub within a few days. We'll let you know when it is up.
  4. Currently quarterly, on or about the first of January, April, July, and October.

benbalter commented 10 years ago

:metal: That all sounds great. Would love to get the list included.

Gman also runs domains through swot and rejects any matches, so adding the list shouldn't affect the scope of what's allowed by the gem.
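A rough sketch of that "reject academic matches first" idea follows. It does not call the real swot gem: `ACADEMIC_DOMAINS`, `academic?`, and `government?` are hypothetical stand-ins, and `example.edu` is a placeholder entry.

```ruby
# Made-up stand-in for the academic-domain data that swot provides.
ACADEMIC_DOMAINS = ["example.edu"].freeze

# Hypothetical check; the real gem delegates this decision to swot.
def academic?(domain)
  ACADEMIC_DOMAINS.include?(domain.downcase)
end

# A domain counts as government only if it is on the list AND is not
# flagged as academic, so stray .edu entries in the list are filtered out.
def government?(domain, government_domains)
  return false if academic?(domain)

  government_domains.include?(domain.downcase)
end

list = ["example.edu", "bouldercounty.org"]

government?("example.edu", list)       # => false (rejected as academic)
government?("bouldercounty.org", list) # => true
```

Because the academic check runs before the list lookup, importing a list that contains some .edu entries would not widen what is accepted.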

> We're posting a more consumable file in Github within a few days. We'll let you know when it is up.

That would be absolutely amazing if you could. Will try to build some tooling to consume it on a somewhat regular basis.

ErikSArnold commented 10 years ago

Hey Ben. Just uploaded our list here:

https://github.com/GSA-OCSIT/govt-urls

Hope it proves useful. Note: the formatting is out of the box from our current tool at the moment.

Thanks!

Erik

benbalter commented 10 years ago

@ErikSArnold Awesome. Thanks for the heads up. Preparing to merge over in https://github.com/benbalter/gman/pull/26.

Also, as I was merging, I noticed a few domains that were on GMan's list but not in the export:

bouldercounty.org
sfmta.org
sfcta.org
borough.kenai.ak.us
kcmo.org
clevelandmetroparks.com
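A comparison like this can be sketched as an array difference. The helper name `missing_from_export` and the sample inputs are illustrative assumptions, not the script actually used for the merge.

```ruby
# Hypothetical helper: domains present in Gman's list but absent from the
# export, compared after trimming whitespace and lowercasing.
def missing_from_export(gman_domains, export_domains)
  normalize = ->(domain) { domain.strip.downcase }
  gman_domains.map(&normalize).uniq - export_domains.map(&normalize)
end

gman_list = ["sfmta.org", "KCMO.org ", "example.gov"] # sample data only
export    = ["example.gov"]                           # sample data only

missing_from_export(gman_list, export) # => ["sfmta.org", "kcmo.org"]
```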

benbalter commented 10 years ago

You may also want to check out the domains removed in https://github.com/benbalter/gman/commit/65d38e858ee5805f973ff7c937b844642db3bc77, which, at least when I tried to access them, would not resolve.

My concern was that by including non-registered domains, someone could simply register any non-resolving domain, which would then be considered GMan.valid?.
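One way to flag such entries before importing a list is a DNS lookup per domain. The sketch below uses Ruby's standard `resolv` library. The helper names `resolvable?` and `unresolvable` are hypothetical, and the resolver is injectable so the check can be exercised without network access.

```ruby
require "resolv"

# Hypothetical helper: true if the resolver returns an address for `domain`.
# Resolv::DNS#getaddress raises Resolv::ResolvError when no address is found.
def resolvable?(domain, resolver: Resolv::DNS.new)
  resolver.getaddress(domain)
  true
rescue Resolv::ResolvError
  false
end

# Hypothetical helper: the subset of `domains` that did not resolve.
def unresolvable(domains, resolver: Resolv::DNS.new)
  domains.reject { |domain| resolvable?(domain, resolver: resolver) }
end
```

Calling `unresolvable(list_of_domains)` with the default resolver performs live DNS queries, so results depend on the network and on when the check is run. A domain that fails to resolve today may simply be temporarily down, which is why this is better treated as a review aid than as an automatic filter.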