Closed benbalter closed 8 years ago
Thanks, @benbalter. You're right that some differences are intentional. For example, we include military academies, cooperative extensions, and a few other .edu's. We'll work our way through the rest of your list and update our records as needed.
@benbalter Several of the missing domains redirect to new domains. You can find the new, "preferred" domains by querying our Non-.gov URLs API. For example, http://govt-urls.api.usa.gov/government_urls/search?q=sgch.org shows the new domain is sangabrielcity.com.
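As a rough illustration of that lookup, here is a minimal Python sketch that only builds the query URL. The endpoint and the `q` parameter are taken from the example above; the helper name `build_search_url` is hypothetical, and since the response format isn't shown in this thread, the sketch stops short of fetching or parsing anything.

```python
from urllib.parse import urlencode

# Endpoint copied from the example URL in the comment above.
SEARCH_ENDPOINT = "http://govt-urls.api.usa.gov/government_urls/search"


def build_search_url(domain: str) -> str:
    """Return the search URL for a given old domain (hypothetical helper)."""
    # urlencode takes care of escaping any characters that are unsafe in a query string.
    return f"{SEARCH_ENDPOINT}?{urlencode({'q': domain})}"


if __name__ == "__main__":
    print(build_search_url("sgch.org"))
```

The resulting URL can be opened in a browser or passed to any HTTP client to look up the preferred domain.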
Chatting with @ErikSArnold a few weeks back, I mentioned that I have a script to validate the domains listed here before I vendor them into GMan.
I rely on the data, so I created a quick script to reconcile the two lists, in hopes of contributing some of those upstream improvements back. You can see the full output below.
Note that I suspect some of the differences may be intentional. I purposely exclude educational domains and commercial hosting services, and I look only at domains, not subpaths, even when two government entities share a server.
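For anyone who wants to reproduce this kind of comparison, here is a minimal Python sketch of the reconciliation idea. It is not the actual script: the function names, the `.edu` suffix check, and the sample entries are all assumptions for illustration, and the commercial hosting exclusion is left out because it would need its own list of hosts.

```python
from urllib.parse import urlsplit


def to_domain(entry: str) -> str:
    """Reduce a URL or bare host to its domain, dropping any path (hypothetical helper)."""
    entry = entry.strip().lower()
    if "://" not in entry:
        entry = "//" + entry  # lets urlsplit treat a bare host as a network location
    host = urlsplit(entry).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_excluded(domain: str, excluded_suffixes=(".edu",)) -> bool:
    """Assumed exclusion rule: skip educational domains."""
    return domain.endswith(tuple(excluded_suffixes))


def reconcile(upstream, local):
    """Return (domains only in upstream, domains only in local), both sorted."""
    up = {d for d in map(to_domain, upstream) if d and not is_excluded(d)}
    lo = {d for d in map(to_domain, local) if d and not is_excluded(d)}
    return sorted(up - lo), sorted(lo - up)


if __name__ == "__main__":
    # Made-up sample entries, not real data from either list.
    upstream = ["http://www.example.gov/parks", "academy.example.edu", "cityofexample.org"]
    local = ["example.gov", "countyexample.us"]
    missing_locally, missing_upstream = reconcile(upstream, local)
    print("Only in upstream:", missing_locally)
    print("Only in local:", missing_upstream)
```

Comparing sets of normalized domains keeps the intentional differences (excluded categories, shared servers with different subpaths) from showing up as false mismatches.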
Glad to answer any questions, and hope the information helps.