Open hampelm opened 8 years ago
To import:
A. Data downloads are at https://www.irs.gov/uac/soi-tax-stats-annual-extract-of-tax-exempt-organization-financial-data
B. Get desired file:
wget https://www.irs.gov/pub/irs-soi/12eofinextract990.zip
C. Extract file
unzip 12eofinextract990.zip
D. Generate schema for file:
csvsql -i postgresql py12_990.dat -d ' ' > py12_990.sql
This may give a segmentation fault for large files. It's probably running out of memory; try it on your personal computer if this happens.
E. Create a table using that statement:
psql postgres < py12_990.sql
F. Import the data into postgres:
psql postgres
postgres=# copy py12_990 from '/tmp/py12_990.dat' delimiter ' ' csv header;
Note that 2015 extracts are linked directly as .dat (not zipped). Adjust import by skipping step C.
Revise step D:
csvsql -i postgresql 15eofinextract990pf.dat -d ' ' > py15_990pf.sql
Once imported, rename 2015 990 tables for consistency:
postgres=# ALTER TABLE "15eofinextract990pf" RENAME TO "py15_990pf";
Here are the statements I've come up with to standardize the 2012 data. I think I was overzealous in the schema and we can remove a lot of fields, but I'll think about that later.
To test these, I created a test table (matt
, you should create one for yourself using the schema above).
You'll notice that these seem pretty redundant. Why select from a select statement? That's a quirk of postgres/SQL: if instead we did the much cleaner
insert into matt
(ein, subseccd)
select subseccd as subseccd, ein as ein
from 2012_data
the insert wouldn't save subseccd
to the subseccd
column of matt
. Instead, it'd insert the data in the order of the columns in matt
, no matter the column names. That's a recipe for hard-to-find bugs. This level of indirection means we can write the fields in any order.
Here goes -- you can use these as templates for doing the other tables:
2012 990s
insert into matt
(source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue, compensation,
fundraising_fees, total_expenses, total_assets, total_liabilities,
net_assets, data)
select
source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue, compensation,
fundraising_fees, total_expenses, total_assets, total_liabilities,
net_assets, data
from (select
'2012 990' as source,
subseccd,
"EIN" as ein,
'990' as filing_type,
tax_prd as tax_period,
totcntrbgfts as contributions_and_grants,
totprgmrevnue as program_service_revenue,
invstmntinc as investment_income,
totrevenue as total_revenue,
othrsalwages + compnsatncurrofcr as compensation,
profndraising as fundraising_fees,
totfuncexpns as total_expenses,
totassetsend as total_assets,
totliabend as total_liabilities,
totnetassetend as net_assets,
row_to_json(py12_990) as data
from py12_990 ) as stuff
limit 10
PY12 990 EZ
insert into matt
(source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue,
total_expenses,
total_assets, total_liabilities,
net_assets, data)
select
source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue,
total_expenses,
total_assets, total_liabilities,
net_assets, data
from (select
'PY12 990 EZ' as source,
subseccd,
ein,
'990ez' as filing_type,
tax_prd as tax_period,
totcntrbs as contributions_and_grants,
prgmservrev as program_service_revenue,
othrinvstinc as investment_income,
-- othrsalwages + compnsatncurrofcr as compensation,
-- profndraising as fundraising_fees,
totrevnue as total_revenue,
totexpns as total_expenses,
totassetsend as total_assets,
totliabend as total_liabilities,
totnetassetsend as net_assets,
row_to_json(py12_990ez) as data
from py12_990ez ) as stuff
limit 10
2012 PF
insert into matt
(
source,
ein,
subseccd,
irs_year,
filing_type,
tax_period,
contributions_and_grants,
compensation,
investment_income,
total_revenue,
total_expenses,
total_assets,
total_liabilities,
grants_paid,
data
)
select
source,
ein,
subseccd,
irs_year,
filing_type,
tax_period,
contributions_and_grants,
compensation,
investment_income,
total_revenue,
total_expenses,
total_assets,
total_liabilities,
grants_paid,
data
from (select
'2012 990PF' as source,
"SUBCD" as subseccd,
"EIN" as ein,
'990pf' as filing_type,
"TAX_PRD" as tax_period,
"TAX_YR" as irs_year,
"GRSCONTRGIFTS" as contributions_and_grants,
-- prgmservrev as program_service_revenue,
"NETINVSTINC" as investment_income,
"COMPOFFICERS" as compensation,
-- profndraising as fundraising_fees,
"TOTRCPTPERBKS" as total_revenue,
"TOTEXPNSPBKS" as total_expenses,
"TOTASSETSEND" as total_assets,
"TOTLIABEND" as total_liabilities,
"CONTRPDPBKS" as grants_paid,
-- totnetassetsend as net_assets,
row_to_json(py12_990pf) as data
from py12_990pf) as stuff
limit 10
schemas for 2013, 2014, and 2015 990: https://gist.github.com/bnchdrff/14413f0bf8253729bd646373d91519a0
2013 990
insert into matt
(source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue, compensation,
fundraising_fees, total_expenses, total_assets, total_liabilities,
net_assets, data)
select
source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue, compensation,
fundraising_fees, total_expenses, total_assets, total_liabilities,
net_assets, data
from (select
'2013 990' as source,
subseccd,
"EIN" as ein,
'990' as filing_type,
tax_pd as tax_period,
totcntrbgfts as contributions_and_grants,
totprgmrevnue as program_service_revenue,
invstmntinc as investment_income,
totrevenue as total_revenue,
othrsalwages + compnsatnandothr + compnsatncurrofcr as compensation,
profndraising as fundraising_fees,
totfuncexpns as total_expenses,
totassetsend as total_assets,
totliabend as total_liabilities,
totnetassetend as net_assets,
row_to_json(py13_990) as data
from py13_990 ) as stuff
limit 10
2013 EZ
insert into matt
(source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue,
total_expenses,
total_assets, total_liabilities,
net_assets, data)
select
source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue,
total_expenses,
total_assets, total_liabilities,
net_assets, data
from (select
'2013 EZ' as source,
subseccd,
"EIN" as ein,
'990ez' as filing_type,
tax_pd as tax_period,
totcntrbs as contributions_and_grants,
prgmservrev as program_service_revenue,
othrinvstinc as investment_income,
-- othrsalwages + compnsatncurrofcr as compensation,
-- profndraising as fundraising_fees,
totrevnue as total_revenue,
totexpns as total_expenses,
totassetsend as total_assets,
totliabend as total_liabilities,
totnetassetsend as net_assets,
row_to_json("py13_EZ") as data
from "py13_EZ" ) as stuff
limit 10
2013 PF
insert into matt
(
source,
ein,
subseccd,
irs_year,
filing_type,
tax_period,
contributions_and_grants,
compensation,
investment_income,
total_revenue,
total_expenses,
total_assets,
total_liabilities,
grants_paid,
data
)
select
source,
ein,
subseccd,
irs_year,
filing_type,
tax_period,
contributions_and_grants,
compensation,
investment_income,
total_revenue,
total_expenses,
total_assets,
total_liabilities,
grants_paid,
data
from (select
'2013 PF' as source,
"SUBCD" as subseccd,
"EIN" as ein,
'990ez' as filing_type,
"TAX_PRD" as tax_period,
"TAX_YR" as irs_year,
"GRSCONTRGIFTS" as contributions_and_grants,
-- prgmservrev as program_service_revenue,
"NETINVSTINC" as investment_income,
"COMPOFFICERS" as compensation,
-- profndraising as fundraising_fees,
"TOTRCPTPERBKS" as total_revenue,
"TOTEXPNSPBKS" as total_expenses,
"TOTASSETSEND" as total_assets,
"TOTLIABEND" as total_liabilities,
"CONTRPDPBKS" as grants_paid,
-- totnetassetsend as net_assets,
row_to_json(py13_990pf) as data
from py13_990pf) as stuff
limit 10
2014 990
insert into matt
(
source,
ein,
subseccd,
filing_type,
tax_period,
contributions_and_grants,
program_service_revenue,
investment_income,
total_revenue,
compensation,
fundraising_fees,
total_expenses,
total_assets,
total_liabilities,
net_assets,
data
)
select
source,
ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue, compensation,
fundraising_fees, total_expenses, total_assets, total_liabilities,
net_assets, data
from (select
'2014 990' as source,
subseccd,
"EIN" as ein,
'990' as filing_type,
tax_prd as tax_period,
totcntrbgfts as contributions_and_grants,
totprgmrevnue as program_service_revenue,
invstmntinc as investment_income,
totrevenue as total_revenue,
othrsalwages + compnsatnandothr + compnsatncurrofcr as compensation,
profndraising as fundraising_fees,
totfuncexpns as total_expenses,
totassetsend as total_assets,
totliabend as total_liabilities,
totnetassetend as net_assets,
row_to_json(py14_990) as data
from py14_990 ) as stuff
limit 10
2014 990ez
insert into matt
(source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue,
total_expenses,
total_assets, total_liabilities,
net_assets, data)
select
source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue,
total_expenses,
total_assets, total_liabilities,
net_assets, data
from (select
'2014 990ez' as source,
subseccd,
"EIN" as ein,
'990ez' as filing_type,
taxprd as tax_period,
totcntrbs as contributions_and_grants,
prgmservrev as program_service_revenue,
othrinvstinc as investment_income,
-- othrsalwages + compnsatncurrofcr as compensation,
-- profndraising as fundraising_fees,
totrevnue as total_revenue,
totexpns as total_expenses,
totassetsend as total_assets,
totliabend as total_liabilities,
totnetassetsend as net_assets,
row_to_json("py14_EZ") as data
from "py14_EZ" ) as stuff
limit 10
2014 990pf
insert into matt
(
source,
ein,
subseccd,
irs_year,
filing_type,
tax_period,
contributions_and_grants,
compensation,
investment_income,
total_revenue,
total_expenses,
total_assets,
total_liabilities,
grants_paid,
data
)
select
source,
ein,
subseccd,
irs_year,
filing_type,
tax_period,
contributions_and_grants,
compensation,
investment_income,
total_revenue,
total_expenses,
total_assets,
total_liabilities,
grants_paid,
data
from (select
'2014 990PF' as source,
"SUBCD" as subseccd,
"EIN" as ein,
'990pf' as filing_type,
"TAX_PRD" as tax_period,
"TAX_YR" as irs_year,
"GRSCONTRGIFTS" as contributions_and_grants,
-- prgmservrev as program_service_revenue,
"NETINVSTINC" as investment_income,
"COMPOFFICERS" as compensation,
-- profndraising as fundraising_fees,
"TOTRCPTPERBKS" as total_revenue,
"TOTEXPNSPBKS" as total_expenses,
"TOTASSETSEND" as total_assets,
"TOTLIABEND" as total_liabilities,
"CONTRPDPBKS" as grants_paid,
-- totnetassetsend as net_assets,
row_to_json(py14_990pf) as data
from py14_990pf) as stuff
limit 10
2015 990
insert into matt
(
source,
ein,
subseccd,
filing_type,
tax_period,
contributions_and_grants,
program_service_revenue,
investment_income,
total_revenue,
compensation,
fundraising_fees,
total_expenses,
total_assets,
total_liabilities,
net_assets,
data
)
select
source,
ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue, compensation,
fundraising_fees, total_expenses, total_assets, total_liabilities,
net_assets, data
from (select
'2015 990' as source,
subseccd,
"EIN" as ein,
'990' as filing_type,
tax_prd as tax_period,
totcntrbgfts as contributions_and_grants,
totprgmrevnue as program_service_revenue,
invstmntinc as investment_income,
totrevenue as total_revenue,
othrsalwages + compnsatnandothr + compnsatncurrofcr as compensation,
profndraising as fundraising_fees,
totfuncexpns as total_expenses,
totassetsend as total_assets,
totliabend as total_liabilities,
totnetassetend as net_assets,
row_to_json(py15_990) as data
from py15_990 ) as stuff
limit 10
2015 990ez
insert into matt
(source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue,
total_expenses,
total_assets, total_liabilities,
net_assets, data)
select
source, ein, subseccd, filing_type, tax_period,
contributions_and_grants, program_service_revenue,
investment_income, total_revenue,
total_expenses,
total_assets, total_liabilities,
net_assets, data
from (select
'2015 990 EZ' as source,
subseccd,
"EIN" as ein,
'990ez' as filing_type,
a_tax_prd as tax_period,
totcntrbs as contributions_and_grants,
prgmservrev as program_service_revenue,
othrinvstinc as investment_income,
-- othrsalwages + compnsatncurrofcr as compensation,
-- profndraising as fundraising_fees,
totrevnue as total_revenue,
totexpns as total_expenses,
totassetsend as total_assets,
totliabend as total_liabilities,
totnetassetsend as net_assets,
row_to_json("py15_990ez") as data
from "py15_990ez" ) as stuff
limit 10
2015 990pf
insert into matt
(
source,
ein,
subseccd,
irs_year,
filing_type,
tax_period,
contributions_and_grants,
compensation,
investment_income,
total_revenue,
total_expenses,
total_assets,
total_liabilities,
grants_paid,
data
)
select
source,
ein,
subseccd,
irs_year,
filing_type,
tax_period,
contributions_and_grants,
compensation,
investment_income,
total_revenue,
total_expenses,
total_assets,
total_liabilities,
grants_paid,
data
from (select
'2015 990 pf' as source,
"SUBCD" as subseccd,
"EIN" as ein,
'990pf' as filing_type,
"TAX_PRD" as tax_period,
"TAX_YR" as irs_year,
"GRSCONTRGIFTS" as contributions_and_grants,
-- prgmservrev as program_service_revenue,
"NETINVSTINC" as investment_income,
"COMPOFFICERS" as compensation,
-- profndraising as fundraising_fees,
"TOTRCPTPERBKS" as total_revenue,
"TOTEXPNSPBKS" as total_expenses,
"TOTASSETSEND" as total_assets,
"TOTLIABEND" as total_liabilities,
"CONTRPDPBKS" as grants_paid,
-- totnetassetsend as net_assets,
row_to_json(py15_990pf) as data
from py15_990pf) as stuff
limit 10
Note that the real inserts should go into the combined
table or another official table. matt
is just a scratchpad for my testing, so don't rely on it to have good data. Use the schema above to create your own table for testing imports.
Hi Hampelm. I know nothing about how this works. I read about it in Friedman's book Thank you for Being Late. There is something here I know but how to mine it, I don't. I am an Enrolled Agent for the IRS and I help people resolve tax issues. I am looking for a data base that tells me who the IRS has sent Tax audits, tax notices, and tax liens and etc. The faster I know this the faster I can help people avoid overspending on high priced professionals. It's digitized somewhere, I need an expert to dig it out on a program like GitHub. Any comments? Thank you Jerry
This statement creates the unified 990s table: