fecgov / openFEC

The first RESTful API for the Federal Election Commission. We're aiming to make campaign finance more accessible for journalists, academics, developers, and other transparency seekers.
https://api.open.fec.gov/developers

Discrepancies in Schedule E (missing committees and expenditures) #1927

Closed hobbes3 closed 7 years ago

hobbes3 commented 8 years ago

EDIT: I've simplified the problem.

I've been pulling schedule_e data for the past few weeks for Trump P80001571. Today I decided to start fresh and re-pull all the data:

https://api.open.fec.gov/v1/schedules/schedule_e/?per_page=100&api_key=DEMO_KEY&candidate_id=P80001571
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=286330550&per_page=100&api_key=DEMO_KEY&candidate_id=P80001571
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=270475108&per_page=100&api_key=DEMO_KEY&candidate_id=P80001571
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=264495932&per_page=100&api_key=DEMO_KEY&candidate_id=P80001571
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=251324099&per_page=100&api_key=DEMO_KEY&candidate_id=P80001571
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=251283449&per_page=100&api_key=DEMO_KEY&candidate_id=P80001571
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=249334496&per_page=100&api_key=DEMO_KEY&candidate_id=P80001571
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=248704885&per_page=100&api_key=DEMO_KEY&candidate_id=P80001571
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=243276857&per_page=100&api_key=DEMO_KEY&candidate_id=P80001571
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=235286073&per_page=100&api_key=DEMO_KEY&candidate_id=P80001571

However, I immediately noticed that Trump's biggest supporter, Great America PAC C00608489, is now missing and doesn't show up in any of the REST endpoints above.

However, if I explicitly look up Great America PAC and Trump then there is data:

https://api.open.fec.gov/v1/schedules/schedule_e/?candidate_id=P80001571&api_key=DEMO_KEY&committee_id=C00608489&per_page=100

What's more interesting is that the website shows Great America PAC spending $7.5 million: https://beta.fec.gov/data/committee/C00608489/?cycle=2016&tab=independent-expenditures-committee

But the old FEC site shows the Great America PAC spending $21.7 million: https://docs.google.com/spreadsheets/d/1rWbqTdMRCvGcxAWzVnZq0NXFA4pdnE9zWxHQHVyg6m0/edit?usp=sharing

Did something change in the schedule_e API?

Before: http://i.imgur.com/auUABZ3.png http://i.imgur.com/lvetmpi.png After: http://i.imgur.com/roXZozx.png http://i.imgur.com/lvetmpi.png

coreyamarshall commented 8 years ago

This seems like a substantive bug that is dramatically affecting reported results on independent expenditure (and Super-PAC) spending. Any chance someone can take a look?

hobbes3 commented 8 years ago

I've also found something weird when supplying multiple committee_id values.

Like I said I get data for Great America PAC (committee_id=C00608489) if I search explicitly:

https://api.open.fec.gov/v1/schedules/schedule_e/?per_page=100&api_key=DEMO_KEY&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=302688331&per_page=100&api_key=DEMO_KEY&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=294192691&per_page=100&api_key=DEMO_KEY&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=288222109&per_page=100&api_key=DEMO_KEY&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=286471740&per_page=100&api_key=DEMO_KEY&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=286099923&per_page=100&api_key=DEMO_KEY&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=271647789&per_page=100&api_key=DEMO_KEY&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=251320280&per_page=100&api_key=DEMO_KEY&committee_id=C00608489

So I add one more committee and everything is still fine (that I know of):

https://api.open.fec.gov/v1/schedules/schedule_e/?per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=312790484&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=312790462&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=312790441&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=303161433&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=303161413&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=292743638&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=286808361&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=286395529&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=272174886&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=272174595&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=270475053&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=250680759&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=249613059&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=248463857&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=244899503&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=243276831&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=235286125&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=3198191&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=3197908&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=2940375&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=2826193&per_page=100&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489

But when I add 3 committees then Great America PAC is nowhere to be found. In fact the number of REST endpoints needed goes down from when I used 2 committees:

https://api.open.fec.gov/v1/schedules/schedule_e/?per_page=100&api_key=DEMO_KEY&committee_id=C00541292&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=264435415&per_page=100&api_key=DEMO_KEY&committee_id=C00541292&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=261269364&per_page=100&api_key=DEMO_KEY&committee_id=C00541292&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=249774342&per_page=100&api_key=DEMO_KEY&committee_id=C00541292&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=248704906&per_page=100&api_key=DEMO_KEY&committee_id=C00541292&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=248463828&per_page=100&api_key=DEMO_KEY&committee_id=C00541292&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=244899489&per_page=100&api_key=DEMO_KEY&committee_id=C00541292&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=243276818&per_page=100&api_key=DEMO_KEY&committee_id=C00541292&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=235286112&per_page=100&api_key=DEMO_KEY&committee_id=C00541292&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=3198124&per_page=100&api_key=DEMO_KEY&committee_id=C00541292&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=3197815&per_page=100&api_key=DEMO_KEY&committee_id=C00541292&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=2927868&per_page=100&api_key=DEMO_KEY&committee_id=C00541292&committee_id=C00401786&committee_id=C00608489
https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=2826193&per_page=100&api_key=DEMO_KEY&committee_id=C00541292&committee_id=C00401786&committee_id=C00608489
LindsayYoung commented 8 years ago

Thanks for reporting this! That is very odd behavior and we will investigate.

LindsayYoung commented 8 years ago

Wanted to confirm this started in the last week or so.

So far:

I am not seeing changes to schedule e, or to the views where we merge itemized queries with multiple committee_id, in the last 2 releases: https://github.com/18F/openFEC/pull/1906/files https://github.com/18F/openFEC/pull/1879/files

We did upgrade our postgres recently, but I don't think this would be related.

I have been printing out queries locally and I am seeing 3 committee id queries and then we do a union on them. That seems right.

I will dig deeper into this after some meetings.

hobbes3 commented 8 years ago

I noticed for committee REBUILDING AMERICA NOW C00618876

https://api.open.fec.gov/v1/schedules/schedule_e/?per_page=100&api_key=DEMO_KEY&committee_id=C00618876

and the response:

Shows 3 expenditures:

It is very odd for a committee to both support and oppose the same candidate. Looking further, here is one of the transactions supporting Clinton (there is a similar one opposing Clinton too):

{
    "payee_city": "ALEXANDRIA",
    "back_reference_schedule_name": null,
    "independent_sign_date": "2016-08-12",
    "payee_last_name": null,
    "category_code": "004",
    "file_number": 1094506,
    "payee_name": "MULTI MEDIA SERVICES CORPORATION",
    "receipt_date": "2016-08-17",
    "payee_state": "VA",
    "report_year": 2016,
    "candidate_office": "P",
    "notary_sign_name": null,
    "committee_name": null,
    "load_date": "2016-08-17T23:44:55+00:00",
    "support_oppose_indicator": "S",
    "notary_commission_expiration_date": null,
    "committee_id": "C00618876",
    "link_id": 4081720161312290098,
    "office_total_ytd": 11392865.25,
    "filer_prefix": null,
    "notary_sign_date": null,
    "candidate_middle_name": "RODHAM",
    "filer_suffix": "ESQ.",
    "report_primary_general": null,
    "pdf_url": "http:\/\/docquery.fec.gov\/cgi-bin\/fecimg\/?201608179022462824",
    "payee_zip": "22314",
    "expenditure_amount": 1746350.0,
    "candidate_suffix": null,
    "filer_first_name": "RYAN",
    "candidate": {
        "idx": 6544,
        "candidate_id": "P00003392",
        "two_year_period": 2016.0
    },
    "payee_street_2": "2ND FLOOR",
    "payee_middle_name": null,
    "transaction_id": "SE24.213",
    "filer_middle_name": "R.",
    "filing_form": "F24",
    "independent_sign_name": "CALL, RYAN R. ESQ.",
    "committee": {
        "committee_id": "C00618876",
        "party": null,
        "designation_full": "Unauthorized",
        "designation": "U",
        "organization_type_full": null,
        "zip": "22313",
        "city": "ALEXANDRIA",
        "cycles": [2016],
        "expire_date": null,
        "street_1": "PO BOX 26141",
        "treasurer_name": "CALL, RYAN R",
        "street_2": null,
        "name": "REBUILDING AMERICA NOW",
        "state": "VA",
        "committee_type_full": "Super PAC (Independent Expenditure-Only)",
        "candidate_ids": [],
        "organization_type": null,
        "state_full": null,
        "cycle": 2016,
        "committee_type": "O",
        "party_full": null
    },
    "record_number": null,
    "payee_suffix": null,
    "election_type": "G2016",
    "expenditure_description": "MEDIA",
    "cand_office_district": null,
    "election_type_full": null,
    "candidate_first_name": "HILLARY",
    "filer_last_name": "CALL",
    "cand_office_state": null,
    "dissemination_date": "2016-08-16",
    "is_notice": true,
    "category_code_full": null,
    "payee_prefix": null,
    "candidate_prefix": null,
    "report_type": "48",
    "image_number": "201608179022462824",
    "candidate_name": "CLINTON, HILLARY RODHAM",
    "candidate_id": "P00003392",
    "payee_street_1": "915 KING STREET",
    "expenditure_date": "2016-08-12",
    "back_reference_transaction_id": null,
    "update_date": null,
    "payee_first_name": null,
    "candidate_last_name": "CLINTON",
    "line_number": "24",
    "sched_e_sk": 312245971
}

But if you open the PDF, it shows the transaction as both supporting and opposing Clinton for the same amount, $1,746,350.

Lastly the FEC page for this committee only shows one expenditure of opposing Clinton for $1,431,503:

https://beta.fec.gov/data/committee/C00618876/?tab=independent-expenditures-committee

coreyamarshall commented 8 years ago

Is there a resolution in sight on this?

LindsayYoung commented 8 years ago

This is on our radar, but we haven't found the cause of the issue where more committees yield fewer results.

We did some schema updates that fixed the schedule A index problems after the postgres upgrade. We were hoping that might help with this too, since the timing of the problem matched when we did the upgrade, but it did not fix it.

While we are working on this, you might want to break up your calls, to get around this bug in the interim. Apologies for the inconvenience, I know the timing is awful.

We are doing some major schedule E updates here - https://github.com/18F/openFEC/pull/1730

Though, this seems to be a join issue in the query and not really stemming from the data, so this might be a separate issue, and we are going to give this issue additional attention.

As far as the data goes, the FEC data is going to reflect the filings. When I used the IE data before I worked here, I would separate out primary and general expenditures and use the most recent support/oppose indicator for all entries of that election type. It was a bit of a hack, but it corrected for filer error in my data.
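The workaround described above could be sketched roughly as follows. This is only an illustration, not how the FEC processes the data: the function name is hypothetical, "most recent" is assumed to mean the latest expenditure_date, and rows are assumed to be dicts shaped like the schedule_e JSON results (election_type values such as "G2016" separate general from primary).

```python
def normalize_support_oppose(rows):
    """Return copies of rows where each (committee, candidate, election type)
    group carries the support/oppose indicator of its most recent entry.

    Assumption: "most recent" is judged by expenditure_date (ISO strings),
    and a null date is treated as the oldest possible value.
    """
    latest = {}
    for row in rows:
        key = (row["committee_id"], row["candidate_id"], row["election_type"])
        date = row["expenditure_date"] or ""
        if key not in latest or date >= latest[key][0]:
            latest[key] = (date, row["support_oppose_indicator"])

    normalized = []
    for row in rows:
        key = (row["committee_id"], row["candidate_id"], row["election_type"])
        # Copy the row so the caller's data is left untouched.
        normalized.append({**row, "support_oppose_indicator": latest[key][1]})
    return normalized
```

Primary and general entries stay separate because election_type is part of the grouping key.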

If you think some paper filings have been mis-entered we can flag some people to make corrections so that the data reflects the filings.

Thanks again for writing in and checking on this. We hope to have this resolved as soon as possible.

hobbes3 commented 8 years ago

@LindsayYoung Thanks for your response! I am breaking out my calls currently to just one committee at a time. Do you have any clue on why Great America PAC is not listed under all schedule E on Trump?

As I stated earlier, Great America PAC is one of Trump's major supporters. Currently doing schedule E on Trump reveals a disproportionate amount of expenditure opposing him. Will #1730 fix that?

LindsayYoung commented 8 years ago

@jontours is making great progress with schedule e. It is a complicated mix of forms. I think this work will fix the schedule e issues we are seeing, but we will need to confirm. We are dedicating time to working on this today and will get this resolved as quickly as possible.

jontours commented 8 years ago

Opened an issue that explicitly defines the problem (https://github.com/18F/openFEC/issues/1958). To summarize, the root of the issue, I believe, is how keyset pagination is currently implemented: http://use-the-index-luke.com/no-offset.

Essentially it boils down to this: once someone walks down enough pages the seek fails, and it is because of a statement like where sub_id < some_sub_id. What's interesting is that, from what I can tell, the ordering should prevent this, because the query is ordered like so: ORDER BY exp_dt DESC, sub_id DESC. I'm not sure how postgres is handling that order statement, but if I had to guess it's ordering by exp_dt, and as the dates go back further in time the remaining sub_ids can all be larger than the given sub_id to filter on, so those results are being unintentionally filtered out.

Gonna see if I can figure out how to arrange the query such that this doesn't happen.
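To make the failure mode concrete, here is a small in-memory simulation. It is a sketch under assumptions, not the actual openFEC query: the rows and function names are made up, and only the column names exp_dt and sub_id come from the comment above. It compares seeking on sub_id alone with seeking on the full (exp_dt, sub_id) sort key.

```python
ROWS = [
    {"exp_dt": "2016-09-01", "sub_id": 50},
    {"exp_dt": "2016-08-15", "sub_id": 40},
    {"exp_dt": "2016-08-01", "sub_id": 60},  # older date, but larger sub_id
    {"exp_dt": "2016-07-01", "sub_id": 30},
]


def fetch_page(rows, per_page, last=None, seek_on_date=True):
    # Mirrors ORDER BY exp_dt DESC, sub_id DESC.
    ordered = sorted(rows, key=lambda r: (r["exp_dt"], r["sub_id"]), reverse=True)
    if last is not None:
        if seek_on_date:
            # Seek on the whole sort key, like (exp_dt, sub_id) < (x, y).
            marker = (last["exp_dt"], last["sub_id"])
            ordered = [r for r in ordered if (r["exp_dt"], r["sub_id"]) < marker]
        else:
            # Seek on sub_id only, like WHERE sub_id < some_sub_id.
            ordered = [r for r in ordered if r["sub_id"] < last["sub_id"]]
    return ordered[:per_page]


def walk(rows, per_page, seek_on_date):
    """Page through all rows and return the sub_ids in the order seen."""
    seen, last = [], None
    while True:
        batch = fetch_page(rows, per_page, last, seek_on_date)
        if not batch:
            return seen
        seen.extend(r["sub_id"] for r in batch)
        last = batch[-1]
```

With these rows, walk(ROWS, 2, seek_on_date=False) never returns sub_id 60, because that row is older than the page boundary but has a larger id. Seeking on both columns returns all four rows.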

jontours commented 8 years ago

I haven't been able to test yet since I'm hitting a rate limit, so treat this as unconfirmed, but the issue here ended up being how the URLs above are formed: they are missing the last_expenditure_date query param. The Swagger documentation states that this is needed to properly page through the itemized (sched_a, sched_b, and sched_e) endpoints. The quote, which uses a different sort column as its example, is: To fetch the next page of results, append "&last_index=3023037&last_expenditure_amount=-17348.5" to the URL. So for the resources above a proper URL would be: https://api.open.fec.gov/v1/schedules/schedule_e/?per_page=100&last_index=312790484&last_expenditure_date=2016-07-18&api_key=DEMO_KEY&committee_id=C00401786&committee_id=C00608489

I agree that maybe the docs don't make it entirely clear that the last_ sort-value param (last_expenditure_amount in the docs' example, last_expenditure_date here) MUST be there. So maybe we can reword the documentation. But a good rule to go by is: whatever last_ params are in the pagination JSON map must be in the query params to properly paginate through the records. The tricky part of this issue is that it will work to an extent without the last_expenditure_date param, but after a while the primary keys (sched_e_sk) will by random chance be large enough that results get filtered out, as was demonstrated by this issue.

Wait a bit to test, but adjust how you call the endpoint and try again. And we will close the issue.
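The rule of thumb above (copy every last_ param from the pagination map into the next request) could look something like this on the client side. It is a minimal sketch: the helper name is hypothetical, and it assumes the response carries a pagination.last_indexes map with non-null values, as in the URL shown above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.open.fec.gov/v1/schedules/schedule_e/"


def next_page_url(params, pagination):
    """Build the next-page URL from the original query and the response's
    pagination map.

    params is a list of (name, value) pairs so that repeated names such as
    committee_id survive. Every key in pagination["last_indexes"] is copied
    into the query, replacing any stale last_* values.
    """
    last_indexes = pagination.get("last_indexes") or {}
    query = [(name, value) for name, value in params if not name.startswith("last_")]
    query.extend(sorted(last_indexes.items()))
    return BASE_URL + "?" + urlencode(query)
```

For example, passing the two-committee query together with {"last_indexes": {"last_index": 312790484, "last_expenditure_date": "2016-07-18"}} yields a URL that carries both last_ params.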

hobbes3 commented 8 years ago

Thanks for looking into this Jon!

If you take the original REST for Trump in my first message (https://api.open.fec.gov/v1/schedules/schedule_e/?per_page=100&api_key=DEMO_KEY&candidate_id=P80001571) then you get the following line:

{"pagination":{"per_page":100,"count":3344,"last_indexes":{"last_expenditure_date":null,"last_index":286330550},"pages":34},"api_version":"1.0","results": ...

Then if you add last_expenditure_date=null for the next REST call (https://api.open.fec.gov/v1/schedules/schedule_e/?last_index=286330550&per_page=100&api_key=DEMO_KEY&candidate_id=P80001571&last_expenditure_date=null) then you'll get a bad response:

{
  "message": {
    "last_expenditure_date": [
      "Not a valid date."
    ]
  }, 
  "status": 422
}

Is it possible to not sort by expenditure_date at all? I honestly don't care about sorting, I can do that later. I just want a complete result.

LindsayYoung commented 8 years ago

OK, I took a quick look and it seems that expenditure amount is most stable to sort on for now.

https://api.open.fec.gov/v1/schedules/schedule_e/?per_page=20&sort=expenditure_amount&api_key=DEMO_KEY

https://api.open.fec.gov/v1/schedules/schedule_e/?per_page=20&sort=-expenditure_amount&api_key=DEMO_KEY

I opened issue #1960 for us to address the paging-through-nulls issue.

We might also consider doing a hard fail if the last index and the last value are not both present. I don't want results to be subtly incomplete.
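Until something like that exists server-side, a client can apply the same idea while paging. The loop below is a sketch under assumptions: fetch_page is a hypothetical callable that takes a dict of query params and returns the decoded JSON body, and an empty results list is assumed to mark the last page.

```python
def fetch_all(fetch_page, params):
    """Collect every result, refusing to continue past a null seek value.

    fetch_page is supplied by the caller (for example a thin wrapper around
    an HTTP client), which keeps this loop testable without network access.
    """
    results = []
    query = dict(params)
    while True:
        body = fetch_page(query)
        batch = body.get("results", [])
        if not batch:
            return results
        results.extend(batch)

        last_indexes = body["pagination"].get("last_indexes")
        if not last_indexes:
            return results
        missing = [name for name, value in last_indexes.items() if value is None]
        if missing:
            # Fail loudly rather than return a subtly incomplete result set.
            raise ValueError(f"cannot page past null values: {missing}")
        query.update(last_indexes)
```

Sorting on a column without nulls, such as expenditure_amount, keeps the loop from hitting the error path.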

Thank you again for reporting these issues and working with us to make the API better.

hobbes3 commented 8 years ago

Ah I didn't know you could sort on expenditure_amount. I did notice expenditure_amount is more "stable" since it's all non-null values unlike expenditure_date.

Now the schedule_e on Trump looks like this:

https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&per_page=100&api_key=DEMO_KEY&candidate_id=P80001571
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=19.53&last_index=279974331&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=36.0&last_index=286170947&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=50.0&last_index=312790379&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=75.0&last_index=312790316&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=112.5&last_index=303611009&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=150.0&last_index=303611004&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=184.86&last_index=241063085&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=227.96&last_index=265293580&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=270.76&last_index=302688262&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=334.05&last_index=294192635&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=403.84&last_index=303610919&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=500.0&last_index=294192693&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=718.75&last_index=244421708&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=854.78&last_index=312466849&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=1142.15&last_index=312871432&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=1869.56&last_index=248704887&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=2475.0&last_index=303161551&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=3324.49&last_index=286471685&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=4717.25&last_index=248704890&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=5128.21&last_index=286395466&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=7445.0&last_index=261568439&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=10000.0&last_index=243276859&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=12000.0&last_index=286395456&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=15000.0&last_index=303013938&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=20100.0&last_index=302715312&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=25000.0&last_index=302620592&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=33034.37&last_index=270518452&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=50000.0&last_index=272811868&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=70653.0&last_index=303161408&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=100000.0&last_index=271647788&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=125000.0&last_index=272798465&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=260000.0&last_index=311554725&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=962510.0&last_index=265606586&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY
https://api.open.fec.gov/v1/schedules/schedule_e/?sort=expenditure_amount&last_expenditure_amount=4679155.65&last_index=312871395&candidate_id=P80001571&per_page=100&api_key=DEMO_KEY

I'm not 100% sure but this data definitely looks more "right". Great America PAC finally shows up in these calls. expenditure_amount is increasing but does it look right that last_index is jumping around?

I still notice 2 problems:

  1. Summing expenditure_amount by committee still gives a different total than what's reported on the FEC beta site. For example, I get $10,337,670 for Great America PAC supporting Trump, but the site says $9,945,649: https://beta.fec.gov/data/committee/C00608489/?tab=independent-expenditures-committee. Another example is Priorities USA Action, where I get $48,684,862 but the site says $45,190,250: https://beta.fec.gov/data/committee/C00495861/?tab=independent-expenditures-committee.
  2. According to the REST calls above, there hasn't been any spending since September 1st on schedule E.
LindsayYoung commented 8 years ago

I think we are all on the right track now. I would not expect the last index to be in order, because the results are being ordered by amount and the indexes roughly correspond to when the records were processed.

1) The totals should match if you also filter out notices. We filter them out to avoid double counting. You can add the final report and the notices to get the most recent total, but the notices should then be supplanted by the report when it is filed.

2) I think there is a lag that will be resolved by #1730, which switches the underlying tables to better source data. @jontours has put in great work on this and we are reviewing it. It looks like we will test on dev and staging soon and push to production once we have tested it.

We will keep you apprised of our progress.
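As an illustration of point 1, per-committee totals that skip notices might be computed like this. It is a sketch only: the function name is hypothetical, the rows are assumed to be schedule_e result dicts with is_notice, committee_id and expenditure_amount fields, and Decimal is used so that cents do not drift.

```python
from collections import defaultdict
from decimal import Decimal


def totals_by_committee(rows, include_notices=False):
    """Sum expenditure_amount per committee_id.

    By default rows flagged is_notice (24/48 hour notices) are skipped,
    since the same spending is expected to appear again in a regular report.
    """
    totals = defaultdict(Decimal)
    for row in rows:
        if row["is_notice"] and not include_notices:
            continue
        # Go through str() so a float like 19.53 becomes Decimal("19.53").
        totals[row["committee_id"]] += Decimal(str(row["expenditure_amount"]))
    return dict(totals)
```

The same filtering can be requested from the API with is_notice=false, in which case the function only has to sum.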

Thanks!

hobbes3 commented 8 years ago

Thanks! Adding is_notice=false lowered the numbers and some committees match perfectly like "Our Principles PAC" ($16,353,179.28). Others are short like "Great America PAC" (my sum of $7,772,921 vs site's sum of $9,945,649.83). And others went over like "Planned Parenthood Votes" (for Clinton) (my sum of $2,025,631 vs site's sum of $1,474,458.22)

I will wait for @jontours's code to be in production, then review again.

jontours commented 8 years ago

@hobbes3 I tested expenditure amounts for Great America PAC against a pdf provided by the FEC (http://www.fec.gov/press/summaries/2016/tables/ie/IE1_2016_18m.pdf) with committee aggregates from the new endpoint and the totals are very close. I've some concerns with totals from sched_e matching totals within the committee details page. With that being said the totals computed from the new sched_e endpoint will be the ground truth. And they, at least for one committee, appear to agree with aggregates provided by the FEC.

update: actually the new endpoint does also very closely match to what is being reported on the front end by the committee summary. I didn't realize that I needed to explode the total disbursements to find IEs.

coreyamarshall commented 8 years ago

Thanks to everyone for their work on this! Any idea when these changes will be in production?

LindsayYoung commented 8 years ago

OK, we just launched the improvements to production. This will eliminate the data lag. I did a quick check and things look right to me. Paging through by amount should give you all the data.

We are still working on the paging through nulls issue.

Please tell us if you have any further issues.

Thanks again for your patience!

hobbes3 commented 8 years ago

So I'm seeing the latest date to be 9-29, which is fairly recent (about a week ago), but is it right that there are barely any independent expenditures spent in the last month?

http://i.imgur.com/Jm4bdbv.png

According to http://docquery.fec.gov/cgi-bin/forms/C00495861/1102933/se, Priorities USA Action spent nearly $7 million just yesterday.

LindsayYoung commented 7 years ago

OK, this is looking good to me. Open a new issue if you notice a lag or are having issues

(I find using the website an easy way to check the API.) Click the 24/48 hr button to see the most recent here: https://beta.fec.gov/data/independent-expenditures/?is_notice=true&max_date=10%2F18%2F2016

Thanks again!

hobbes3 commented 7 years ago

Thanks. Could you explain how to merge 24/48 hour reports (is_notice=true) with non-24/48 hour reports (is_notice=false)?

My understanding is that 24/48 has the latest info but eventually you need to exclude it to avoid double counting?

LindsayYoung commented 7 years ago

There is more than one way to do it, and it can be tricky.

First, in the reports I am interested in, I would look for:

IE only committees:

I would create a base total for my committees there. Then, I would look for the 24/48 hour reports that came in after the coverage end date of that committee in the schedule e endpoint and add those amounts to the total.
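One way to read that approach, as a rough sketch rather than an official method: take the non-notice rows as the base total, then add only those notices dated after the committee's coverage end date. The function name and the coverage_end_date argument are hypothetical, and using receipt_date to decide when a notice "came in" is an assumption.

```python
from decimal import Decimal


def merged_total(rows, coverage_end_date, date_field="receipt_date"):
    """Base total from regular reports plus notices after the coverage end.

    rows are schedule_e result dicts for one committee; dates are ISO
    strings (YYYY-MM-DD), so plain string comparison orders them correctly.
    """
    total = Decimal("0")
    for row in rows:
        amount = Decimal(str(row["expenditure_amount"]))
        if not row["is_notice"]:
            total += amount  # regular report: always part of the base total
        elif row[date_field] and row[date_field] > coverage_end_date:
            total += amount  # notice not yet covered by a filed report
    return total
```

Notices dated on or before the coverage end date are dropped, on the assumption that the filed report already includes them.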