gcflymoto / LendingClub

Lending Club Data Analysis and Algorithms

csv has moved, format changed #1

Open datapharmer opened 9 years ago

datapharmer commented 9 years ago

LoanStatsNew.csv is no longer at https://www.lendingclub.com/fileDownload.action?file=LoanStatsNew.csv&type=gen

Instead it is hosted as a zip file at https://resources.lendingclub.com/LoanStats3c.csv.zip

It also appears the issue date format has changed, and as a result loans can no longer be parsed. The new issue_d column contains mmm-YYYY, for example Sep-2014 or Aug-2013.
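For anyone adapting their own parser, here is a minimal sketch of reading the new issue_d values. The function name parse_issue_date is hypothetical and is not taken from this repo. Note that %b matches abbreviated month names in the current locale, so this assumes an English or C locale.

```python
from datetime import datetime


def parse_issue_date(value):
    """Parse an issue_d value such as 'Sep-2014'.

    Hypothetical helper, not the repo's code. The result is a datetime
    set to the first day of the given month.
    """
    # %b = abbreviated month name (locale dependent), %Y = four-digit year
    return datetime.strptime(value.strip(), "%b-%Y")


sep = parse_issue_date("Sep-2014")
aug = parse_issue_date("Aug-2013")
print(sep.year, sep.month)
print(aug.year, aug.month)
```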

gcflymoto commented 9 years ago

Thanks for reporting

Clete2 commented 9 years ago

Also, the new file only covers this year's loans. For a full picture you'll also have to get https://resources.lendingclub.com/LoanStats3b.csv.zip and https://resources.lendingclub.com/LoanStats3a.csv.zip

Found them here: https://www.lendingclub.com/info/download-data.action
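Since the history is now split across several zip files, a rough sketch of reading them as one stream of rows might look like the following. The function names iter_zip_rows and iter_all_rows are hypothetical, the UTF-8 encoding is an assumption, and the files are assumed to be downloaded already.

```python
import csv
import io
import zipfile


def iter_zip_rows(zip_source):
    """Yield CSV rows (lists of strings) from every .csv member of a zip.

    zip_source may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # ignore anything that is not a CSV file
            with archive.open(name) as raw:
                # Assumption: the CSV is UTF-8 encoded.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                for row in csv.reader(text):
                    yield row


def iter_all_rows(zip_sources):
    """Chain the rows of several zip archives, in the order given."""
    for source in zip_sources:
        yield from iter_zip_rows(source)
```

Usage would be something like iter_all_rows(["LoanStats3a.csv.zip", "LoanStats3b.csv.zip", "LoanStats3c.csv.zip"]) once the three files are in the working directory. Each file carries its own header rows, so those still need to be handled by the caller.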

gcflymoto commented 9 years ago

The good news is I added support for multiple stats files and download links from the command line. The bad news is that there have actually been many changes to the data format and values, so the new release won't be backward compatible with the old files. Also, this release will be Python only until the kinks are worked out.

gcflymoto commented 9 years ago

Uploaded a new version. Please test it out and I'll close the issue.

Clete2 commented 9 years ago

Thanks! It seems to be working well, except I think it is trying to parse the header as a loan. No big deal, as it told me that it threw it out. I'll let it run overnight and see how it works.

One question, is there any advantage to using the "secured" version of the LoanStats? It seems to have a lot more data.

Clete2 commented 9 years ago

Bug? It is always saying 0/mo for # of loans. Maybe it really is 0/month.

[iteration 95/4096 22.73 sec/iter] Matched 1050/466287 loans (0/mo.) test at 13.01% APY. 49 loans defaulted (4.00%, $17.87 avg loss) 9.7080% net APY

gcflymoto commented 9 years ago

The header is kind of tricky. The first row is actually a comment, and the second row is the header:

Notes offered by Prospectus (https://www.lendingclub.com/info/prospectus.action)
"id","member_id","loan_amnt","funded_amnt",...
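To illustrate the layout described above, here is a minimal sketch that skips everything before the header row and then yields each loan as a dict. It is not the code in lcbt.py. The name read_loans is hypothetical, the header is located by assuming its first column is "id", and the sample uses only the four columns shown above with made-up values.

```python
import csv
import io


def read_loans(lines):
    """Yield one dict per loan row, skipping the leading comment row.

    Hypothetical helper. Assumes the header is the first row whose
    first cell is exactly 'id'.
    """
    reader = csv.reader(lines)
    header = None
    for row in reader:
        if row and row[0] == "id":
            header = row
            break  # everything before this was comment or blank rows
    if header is None:
        raise ValueError("no header row starting with 'id' found")
    for row in reader:
        if len(row) != len(header):
            continue  # blank or malformed row, skip it
        yield dict(zip(header, row))


# Made-up sample with the same two leading rows as the real files.
sample = (
    "Notes offered by Prospectus (https://www.lendingclub.com/info/prospectus.action)\n"
    '"id","member_id","loan_amnt","funded_amnt"\n'
    '"1","2","5000","5000"\n'
)
loans = list(read_loans(io.StringIO(sample)))
print(loans[0]["loan_amnt"])
```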

gcflymoto commented 9 years ago

The 0/mo. for # of loans happens because so few loans are matched compared to the full history of all the loans: the number of matched loans is less than the number of months since the oldest notes.

gcflymoto commented 9 years ago

I'll look into the "secured" version as well

gcflymoto commented 9 years ago

BTW, I suggest PyPy if you are running with Python. It will be much faster.

Clete2 commented 9 years ago

Using pypy on that line that I pasted. Not sure if it is configured properly though.

I really appreciate all the work you did on this app. I'm curious to see the results after running it for a while.

gcflymoto commented 9 years ago

Ok, cool. I looked into the header issue and I was actually already skipping the headers. That bogus row is an actual row. I'm now printing the row number to give more information about where it was found.

gcflymoto commented 9 years ago

Going to go buy some groceries and make dinner. When I come back I will look into the "secured" version.

gcflymoto commented 9 years ago

Would you mind giving me some leads on what the "secured" LoanStats is and where I can find it? I'm not getting many useful hits from the old Google.

Clete2 commented 9 years ago

Sorry about that. First, login to your LC account. Then, go back to this page: https://www.lendingclub.com/info/download-data.action

The loan history links are now *_secured.csv.zip and are much bigger.

gcflymoto commented 9 years ago

@Clete2 I tested with the secure CSVs and it works just fine, but you will have to download them yourself and then run:

lcbt.py --stats LoanStats3a_securev1.csv.zip,LoanStats3b_securev1.csv.zip,LoanStats3c_securev1.csv.zip
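For readers wondering how a comma-separated --stats value can be turned into a list of files, here is a rough sketch using argparse. This is an assumption about one way to do it, not the actual parsing code in lcbt.py. The function name parse_stats_arg is hypothetical.

```python
import argparse


def parse_stats_arg(argv):
    """Return the list of file names passed via --stats.

    Hypothetical sketch: the value is split on commas and empty
    entries are dropped.
    """
    parser = argparse.ArgumentParser(description="sketch of --stats handling")
    parser.add_argument(
        "--stats",
        default="",
        help="comma-separated list of LoanStats zip files",
    )
    args = parser.parse_args(argv)
    return [name.strip() for name in args.stats.split(",") if name.strip()]


files = parse_stats_arg(
    ["--stats", "LoanStats3a_securev1.csv.zip,LoanStats3b_securev1.csv.zip"]
)
print(files)
```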

gcflymoto commented 9 years ago

Unless there is anything else I'm closing the ticket.