QuickPay-Operational-Performance / Data-and-code

Data and code for econometric analysis
0 stars 0 forks source link

Match firms in usaspending dataset with firms' financials #27

Closed vob2 closed 4 years ago

vob2 commented 4 years ago

For public firms the financial are from COMPUSTAT (all B-schools have access to that through WRDS https://wrds-www.wharton.upenn.edu/) For private firms perhaps Intellect Mergent dataset that CWRU has would work. I am not sure.
Mergent Online (https://www.mergentonline.com/login.php) is another dataset to try. CWRU has that.

We want all standard variables:

Size Natural log of total assets = log(AT). Compustat
Leverage Leverage (Long term debt + Short term debt)/ Total Assets = (DLTT+DLC)/AT Compustat

Leverage Inventory | Inventory | Inventory/Total Assets = INVT/AT | Compustat Inventory Cash and marketable secutirities | Cash and Marketable Securities/ Total Assets = CHE/AT | Compustat Accounts receivable | Accounts Receivable/ Total Assets = RECT/AT | Compustat Accounts payable | Accounts Payable/Total Assets = AP/AT | Compustat R&D | R&D/Assets = XRD/AT | Compustat Tobin's Q | Tobin’s Q = ((CSHO*PRCC)+(DLTT+DLC))/AT | Compustat Operating leverage | SG&A/Assets = XSGA/AT | Compustat Investment grade | Indicator that takes a value of one for investment grade long term issue rating or a value of zero otherwise | Compustat Customer sales % | Percentage of sales contributed by the customer | Compustat Expected default frequency | Calculated based on the distance to default from Bhararth and Shumway (2008) | Compustat Trade credit | Calculated as supplier sales to a customer ∗ supplier accounts receivable/total supplier sales | Compustat

JNing0 commented 4 years ago

We've got a technical problem. Intellect Mergent can only search 500 DUNS numbers a time. We have a total of 1,224,395 DUNS in the data. Is there a way to automate the process so that I don't have to manually search and download 2449 times?

vibhuti6 commented 4 years ago

Hi Jie, one clarification: the data on DUNS that I shared included contract IDs so we can merge the financial information back into our sample. This means that one firm appears multiple times in the data - once for each contract it holds.

I checked and there are 44,786 unique DUNS in the sample. It is still quite large though. One way to deal with this issue can be to split the data into ~100 CSVs. I will try to write a short script for this and get back to you.

JNing0 commented 4 years ago

Thanks. 44,000 makes much more sense than 1.2 million!

Would the script you have in mind will automate the entire process from uploading duns number to downloading the data? If not, I will do it manually. 90 times is not a problem at all.

On Wed, May 27, 2020, 12:28 AM vibhuti6 notifications@github.com wrote:

Hi Jie, one clarification: the data on DUNS that I shared included contract IDs so we can merge the financial information back into our sample. This means that one firm appears multiple times in the data - once for each contract it holds.

I checked and there are 44,786 unique DUNS in the sample. It is still quite large though. One way to deal with this issue can be to split the data into ~100 CSVs. I will try to write a short script for this and get back to you.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/QuickPay-Operational-Performance/Data-and-code/issues/27#issuecomment-634420781, or unsubscribe https://github.com/notifications/unsubscribe-auth/AOBDD5XZO33GVGBIQG34AGDRTSJG5ANCNFSM4NLM27OQ .

vibhuti6 commented 4 years ago

I am thinking of only automating the part for splitting into multiple CSVs. We could also automate the entire process using some web scraping tools, but I would not be able to test or debug that code because your VPN and account details are required to access the data. So I think it is better to go with the first approach. Thanks!

vibhuti6 commented 4 years ago

Hi Jie, I posted the 90 CSV files on Dropbox. Here's the link. And here's the code in case we need to refer back to it. Thanks!

vob2 commented 4 years ago

Great effort. A suggestion. Before doing the full process, how about downloading a subsample of information, to get an idea of overlap between two datasets and the amount of information about companies present. If only 1% of companies we have are represented in Mergent, we do not have to waste time completing the process.

JNing0 commented 4 years ago

@vob2 I did that yesterday for 500 DUNS and Mergent covers the majority of them. A problem is that it does not have the financials for many, which I suspect are the small private ones. But we can get the historical sales data on all companies. That's why I want to download the whole thing, match them, and see.

@vibhuti6 Could you send me the script you'd use to automate the process? What scrapping tool do you use? I will see if I can modify it and debug.

vibhuti6 commented 4 years ago

Hi Jie, I have used Python in the past to obtain data from the USASpending API. So I’m hoping a modified version of that code would work for this purpose. But that data was open access and didn’t require any password or VPN. I will try to alter that code for automating this process and get back to you. Thanks!

On May 27, 2020, at 8:12 AM, JNing0 notifications@github.com wrote:



@vob2https://github.com/vob2 I did that yesterday for 500 DUNS and Mergent covers the majority of them. A problem is that it does not have the financials for many, which I suspect are the small private ones. But we can get the historical sales data on all companies. That's why I want to download the whole thing, match them, and see.

@vibhuti6https://github.com/vibhuti6 Could you send me the script you'd use to automate the process? What scrapping tool do you use? I will see if I can modify it and debug.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/QuickPay-Operational-Performance/Data-and-code/issues/27#issuecomment-634729207, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ALMNL4PSTCTY7BNWU4CQRV3RTUUWJANCNFSM4NLM27OQ.

vibhuti6 commented 4 years ago

Hi Jie, I have posted a pseudo-code for automating this task here. But this code does not account for vpn or account passwords. Also, line 28 on the code assumes the output will only be a csv -- I cannot verify or fix this without looking at the response from the webpage. Hope this helps a bit. Thanks!

JNing0 commented 4 years ago

Here is the data from Mergent Intellect. The website does not have an API so I did it manually. @vibhuti6 I did not use the later files that you created. I used the very first file you posted with all DUNS numbers and split them to 90 files.

vibhuti6 commented 4 years ago

Thanks!

vibhuti6 commented 4 years ago

Hi all, here's a summary of what we have in the Intellect data. It looks like we have the number of employees and sales data for most firms. The rest is quite sparse -- this could still be a sizeable sample when we merge with the USASpending data if these firms hold bulk of the contracts. Code here. Thanks!

number_of_firms = Number of firms for which the respective column information is not null

  number_of_firms
D-U-N-S@ Number 36999
Company Name 36999
Sales Volume 2286
Employee All Sites 0
Employee This Site 0
2010 Sales Volume 28430
2010 Employee All Sites 28430
2010 Employee This Site 28430
2011 Sales Volume 29201
2011 Employee All Sites 29201
2011 Employee This Site 29201
2012 Sales Volume 29802
2012 Employee All Sites 29802
2012 Employee This Site 29802
2013 Sales Volume 30188
2013 Employee All Sites 30188
2013 Employee This Site 30188
2014 Sales Volume 30539
2014 Employee All Sites 30539
2014 Employee This Site 30539
2015 Sales Volume 30812
2015 Employee All Sites 30812
2015 Employee This Site 30812
2016 Sales Volume 31191
2016 Employee All Sites 31191
2016 Employee This Site 31191
2017 Sales Volume 31216
2017 Employee All Sites 31216
2017 Employee This Site 31216
2018 Sales Volume 31039
2018 Employee All Sites 31039
2018 Employee This Site 31039
Year5 Total Assets 367
Year4 Total Assets 379
Year3 Total Assets 382
Year2 Total Assets 382
Year1 Total Assets 382
Year5 Current Assets 363
Year4 Current Assets 374
Year3 Current Assets 377
Year2 Current Assets 377
Year1 Current Assets 377
Year5 Total Liabilities 367
Year4 Total Liabilities 379
Year3 Total Liabilities 382
Year2 Total Liabilities 382
Year1 Total Liabilities 382
Year5 Long Term Debt 259
Year4 Long Term Debt 263
Year3 Long Term Debt 268
Year2 Long Term Debt 272
Year1 Long Term Debt 263
Year5 Current Liabilities 363
Year4 Current Liabilities 375
Year3 Current Liabilities 377
Year2 Current Liabilities 377
Year1 Current Liabilities 377
Year5 Total Revenue 373
Year4 Total Revenue 379
Year3 Total Revenue 382
Year2 Total Revenue 381
Year1 Total Revenue 382
Year5 EBITDA 369
Year4 EBITDA 377
Year3 EBITDA 377
Year2 EBITDA 377
Year1 EBITDA 376
Year5 Operating Income 369
Year4 Operating Income 375
Year3 Operating Income 377
Year2 Operating Income 376
Year1 Operating Income 377
Year5 Net Income 373
Year4 Net Income 381
Year3 Net Income 382
Year2 Net Income 382
Year1 Net Income 382
Year5 Cash from Operations 371
Year4 Cash from Operations 379
Year3 Cash from Operations 379
Year2 Cash from Operations 379
Year1 Cash from Operations 378
Year5 Capital Expenditures 367
Year4 Capital Expenditures 368
Year3 Capital Expenditures 368
Year2 Capital Expenditures 368
Year1 Capital Expenditures 365
Year5 Cash & Cash Equivalents, Beginning of Year 368
Year4 Cash & Cash Equivalents, Beginning of Year 374
Year3 Cash & Cash Equivalents, Beginning of Year 375
Year2 Cash & Cash Equivalents, Beginning of Year 378
Year1 Cash & Cash Equivalents, Beginning of Year 377
Year5 Cash & Cash Equivalents, End of Year 368
Year4 Cash & Cash Equivalents, End of Year 373
Year3 Cash & Cash Equivalents, End of Year 377
Year2 Cash & Cash Equivalents, End of Year 377
Year1 Cash & Cash Equivalents, End of Year 377
Year5 Operating Margin 369
Year4 Operating Margin 375
Year3 Operating Margin 377
Year2 Operating Margin 376
Year1 Operating Margin 377
Year5 Profit Margin (TTM) 373
Year4 Profit Margin (TTM) 379
Year3 Profit Margin (TTM) 382
Year2 Profit Margin (TTM) 381
Year1 Profit Margin (TTM) 382
Year5 Quick Ratio 335
Year4 Quick Ratio 345
Year3 Quick Ratio 347
Year2 Quick Ratio 343
Year1 Quick Ratio 342
Year5 Current Ratio 363
Year4 Current Ratio 374
Year3 Current Ratio 377
Year2 Current Ratio 377
Year1 Current Ratio 377
Year5 Net Current Assets as % of Total Assets 363
Year4 Net Current Assets as % of Total Assets 374
Year3 Net Current Assets as % of Total Assets 377
Year2 Net Current Assets as % of Total Assets 377
Year1 Net Current Assets as % of Total Assets 377
Year5 Free Cash Flow per Share 108
Year4 Free Cash Flow per Share 108
Year3 Free Cash Flow per Share 108
Year2 Free Cash Flow per Share 109
Year1 Free Cash Flow per Share 109
Year5 Revenue to Assets 367
Year4 Revenue to Assets 377
Year3 Revenue to Assets 382
Year2 Revenue to Assets 381
Year1 Revenue to Assets 382
vob2 commented 4 years ago

Thanks, Vibhuti! Yes, it is worth to check the intersection with USA data contracts. But I would guess we lose quite a few contracts from the sample, and probably from the small firms. Empirical research is hard.

vibhuti6 commented 4 years ago

Thanks, Vlad! You are right. I just checked -- there are 9,026 contracts for which we have all of the financial information and only 240 of these are awarded to small businesses.

But there are 810,780 contracts for firms with sales and employee information; 648,994 of which are given to small businesses. At this point, however, it is not entirely clear to me what we can do with the sales and employment information alone. I will think about it.

vob2 commented 4 years ago

This paper: https://github.com/QuickPay-Operational-Performance/Literature/wiki/Brogaard-Denes-Duchin-2020-Political-Influence-and-the-Renegotiation-of-Government-Contracts

matched government award data with Compustat. The process is described in Appendix A. It is semi-automated, but I suspect very time consuming and there are few firms that are left in the sample.

Still, something to keep in mind.

vob2 commented 4 years ago

Does not seem easily doable. Closing for now