Open drewbo opened 1 month ago
Here's a cleaned dataset, about 5000 sites
- `source_` (to make the set smaller and bc it's not important where they came from)
- `.mil` sites
- `public = true` or `final_url_live = true`
- `.com` websites
- `final_url_website`, look for `target_url_redirects = false` and keep only that record
- `final_url_status_code`
- `text/html` media type in `final_url_media_type`
- `ftp` or `admin` in the domain

working notes file:
https://docs.google.com/document/d/1arE0mDjwP6NPY_uOLP5DMwzlUMdaOKtvz87S5u2qmJ4/edit?tab=t.0
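
For anyone who wants to reproduce something like this, here is a rough sketch of how the criteria above might be applied in plain Python. It is one reading of the list, not the actual cleaning script. It assumes `source_` refers to columns with that prefix that get dropped, that `.mil`, `.com`, and `ftp`/`admin` domains are excluded, that rows with `public = true` or `final_url_live = true` are kept, and that a hypothetical `domain` column holds the hostname. The `final_url_status_code` criterion is left out because its value isn't stated here.

```python
def is_true(value):
    """Treat 'true' / 'TRUE' / True as true; anything else as false."""
    return str(value).strip().lower() == "true"


def clean(rows):
    """Filter and de-duplicate site rows (dicts keyed by column name).

    All keep/drop directions below are assumptions, see the note above.
    """
    kept = {}  # final_url_website -> chosen row
    for row in rows:
        domain = row["domain"].lower()  # hypothetical column name

        # Assumed exclusions: .mil and .com sites.
        if domain.endswith((".mil", ".com")):
            continue
        # Assumed exclusion: ftp or admin anywhere in the domain
        # (a coarse substring match).
        if "ftp" in domain or "admin" in domain:
            continue
        # Assumed: keep rows that are public or whose final URL is live.
        if not (is_true(row["public"]) or is_true(row["final_url_live"])):
            continue
        # Assumed: keep only HTML responses.
        if row["final_url_media_type"] != "text/html":
            continue

        # Assumed: drop every column whose name starts with source_.
        slim = {k: v for k, v in row.items() if not k.startswith("source_")}

        # One record per final_url_website, preferring the record
        # with target_url_redirects = false.
        key = slim["final_url_website"]
        current = kept.get(key)
        if current is None or (
            is_true(current["target_url_redirects"])
            and not is_true(slim["target_url_redirects"])
        ):
            kept[key] = slim

    return list(kept.values())
```

Rows read with `csv.DictReader` can be passed straight to `clean`, since every value is compared as a string.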