Closed gbinal closed 1 month ago
(Bold items are more urgent; non-bold items are less urgent.)
1) ~We are applying the executive branch too broadly to the .gov websites. Instead of the current logic, base domains should be compared against this list and the branch then applied. An example where this is an issue is websites from the uscourts source list.~
2) ~Base domain agency and bureau should be pulled in for websites on the pulse list the same way as for the rest (maybe also for the .gov domain list if we want to use the same method across all sources, but not a huge deal)~
3) ~We need to import several more .mil datasets - 2020_eot; dotmil_domains; gov_man_22; oira, omb_idea; dap, other websites~
4) ~Use the ignore lists that are already in use for the .gov domain for the .mil domain, too~
5) ~Create snapshots for the .mil files just as we do for the .gov ones~
Moved these to:
The main goal is to get the .mil sites in the omb_idea file into the .mil index, but the whole process is worth quickly thinking through.
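
As a way of thinking the process through, here is a minimal sketch of the .mil side, assuming the source lists are already loaded as plain lists of hostnames. Everything named here is hypothetical and not taken from the actual codebase: the function names, the row keys (`website`, `base_domain`, `agency`, `bureau`, `branch`), and the shape of the base domain lookup. The base domain is approximated naively as the last two labels of the hostname, which may not match how the real pipeline derives it.

```python
def base_domain(hostname: str) -> str:
    """Naive base domain: the last two labels of the hostname (an assumption)."""
    labels = hostname.strip().lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def build_mil_index(existing_index, omb_idea_sites, ignore_list, base_domain_info):
    """Merge .mil sites from omb_idea into the index, then annotate each row.

    base_domain_info is a hypothetical mapping of
    base domain -> {"agency": ..., "bureau": ..., "branch": ...}.
    """
    ignored = {site.strip().lower() for site in ignore_list}

    # Start from the current index, then add only the .mil sites from omb_idea.
    candidates = {site.strip().lower() for site in existing_index}
    for site in omb_idea_sites:
        host = site.strip().lower()
        if host.endswith(".mil"):
            candidates.add(host)

    rows = []
    for host in sorted(candidates):
        # Apply the same ignore list to every source, as with the .gov sites.
        if host in ignored:
            continue
        # Look up agency, bureau, and branch by base domain rather than
        # assuming a branch for the whole top-level domain.
        info = base_domain_info.get(base_domain(host), {})
        rows.append({
            "website": host,
            "base_domain": base_domain(host),
            "agency": info.get("agency", ""),
            "bureau": info.get("bureau", ""),
            "branch": info.get("branch", ""),
        })
    return rows


if __name__ == "__main__":
    # Placeholder data only; none of these values come from the real datasets.
    rows = build_mil_index(
        existing_index=["www.example.mil"],
        omb_idea_sites=["Sub.Example.mil", "other.example.gov"],
        ignore_list=["ignored.example.mil"],
        base_domain_info={"example.mil": {"agency": "Agency A", "bureau": "Bureau B", "branch": "Executive"}},
    )
    for row in rows:
        print(row)
```

Looking up the branch per base domain, instead of assigning it to a whole top-level domain, is the same idea as item 1 above; sites whose base domain is missing from the lookup get empty fields rather than a guessed value.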