mlissner opened 2 years ago
Just got a big data dump purporting to plug these holes. Bill is on it!
Some stats:
~2,226 files (with some disclosures split over multiple files), representing ~2,220 disclosures. 67 files were identified for new judges. A first pass and review identified 16 files for 10 disclosures that have not yet been imported successfully.
We haven't attempted to import the remaining new judges yet.
Additionally, I'm still reviewing, but it seems like the AO incorrectly denied us disclosures in at least 1 case.
Note to self: this should include updating the coverage page before being closed.
Is this done, @flooie ?
I just re-did this to see how the FDO folks are doing. Not great, Bob!
Here's the code I ran:
from itertools import groupby

from cl.disclosures.models import REPORT_TYPES, FinancialDisclosure
from cl.people_db.models import Person

def find_missing(lst):
    # Years between the first and last filing that have no disclosure.
    return sorted(set(range(lst[0], lst[-1] + 1)) - set(lst))

fds = (
    FinancialDisclosure.objects
    .only('person_id', 'year')
    .filter(year__gte=2011)  # Prior to this we have loads of random stuff
    .exclude(report_type=REPORT_TYPES.NOMINATION)
    .order_by('person_id', 'year')
)

for key, group in groupby(fds, lambda x: x.person_id):
    p = Person.objects.get(pk=key)
    missing_years = find_missing([i.year for i in group])
    if missing_years:
        missing_str = ', '.join(str(i) for i in missing_years)
        print(f"{p.name_full}|{p.pk}|{missing_str}")
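For anyone who wants to sanity-check the gap logic without a Django shell, here's a minimal standalone sketch of the year-gap detection (the sample years are made up for illustration):

```python
def find_missing(lst):
    # Years between the first and last reported year that never appear.
    return sorted(set(range(lst[0], lst[-1] + 1)) - set(lst))

years = [2011, 2012, 2014, 2017]
print(find_missing(years))  # [2013, 2015, 2016]
```

Note that because the first and last years are always present in the input, they can never show up as missing; only interior gaps are reported.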
That gives some pretty good output that you can paste into a Google Sheet and split into columns using Google's =SPLIT() formula.
The result is that we're still missing about 120 disclosures, as listed in the second tab here:
I'll forward to the FDO, and on the wheel turns.
Bill did a great analysis of this and produced the spreadsheet here:
https://docs.google.com/spreadsheets/d/1-QMHbpKMi0EoGVtFOpUQpd-R4m5-aAJOUgeG-EhMxW8/edit?usp=sharing
Lots of gaps. I'm forwarding this to the AO FDO to review. We'll see what they say and I'll keep track here.