Definitions
Loan level - Pertaining to individual loans, not their monthly performance data. One observation per loan.
Variable - A column in the data set.
core - A variable that all loan data sets should have.
non-core - A variable specific to a company or loan product, i.e. not general. These variables describe the loan and the client.
Variables
The core loan level variables, which all loan data sets should have, are:
loan_amount
orig_date (orig_month can be derived from it)
contract_key
term
interest_rate
fpd_date (fpd_month can be derived from it)
instalment - Not included in step 1. It will be added when we do amortisation (step 2).
fees - Included in instalment. Fees are not handled in step 1. They can be more of a metadata setup, as in loannpv, i.e. fees set up by product.
Non-core variables:
product_name
credit_type
The following will not be done in step 1 (we will probably get to them in step 2, when we have to generate PDs):
demographic variables describing client
bureau variables describing client
score
Method
contract_key can be sequential, but must be unique for each contract across the complete set.
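One way to keep keys unique across the whole set is to number the contracts with a single running sequence that does not restart per orig_month. A minimal sketch, assuming three illustrative months of 1000 loans each (the month values and the names `orig_months`, `loans_per_month` and `loans` are hypothetical):

```r
# Hypothetical setup: three origination months, 1000 loans each.
orig_months <- as.Date(c("2020-01-31", "2020-02-29", "2020-03-31"))
loans_per_month <- 1000

# One row per loan; each month is repeated once per loan originated in it.
loans <- data.frame(orig_month = rep(orig_months, each = loans_per_month))

# Sequential key over the complete set, so it never restarts per month.
loans$contract_key <- seq_len(nrow(loans))

# Sanity check: no key occurs more than once.
stopifnot(anyDuplicated(loans$contract_key) == 0)
```

Numbering after all months are stacked avoids having to track an offset between months.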
For each orig_month generate:
A certain number of loans can be generated per orig_month. Say 1000.
loan_amount can be generated by something like x <- rnorm(10000, mean = 15000, sd = 5000), but note that a minimum loan amount will need to be specified and enforced. E.g. with a minimum loan amount of R2000, either filter out all generated values that are < R2000, or change all values < R2000 to R2000.
Interest rates can similarly be generated from a normal distribution, although interest rates usually fall in a narrower range and have product-related rules. So there will be a max and min interest rate. Alternatively, just specify a small sd.
fpd_date and fpd_month can be set to orig_month, and orig_month can be calculated from orig_date using last_day().
For a start, let's say this is for one portfolio, so credit_type and product_name each take a single value, e.g. "unsecured" and "Personal loan" respectively.
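The steps above can be sketched for a single orig_month as follows. This is only an illustration under stated assumptions: the interest rate mean, sd and bounds are made-up values, amounts below the minimum are raised to it rather than filtered out (so the loan count stays at 1000), `month_end()` is a hypothetical base R stand-in for last_day(), and term is left out because the method does not say how to draw it.

```r
set.seed(42)  # make the random draws reproducible

# All parameter values below are illustrative assumptions.
n_loans     <- 1000                   # loans per orig_month
month_start <- as.Date("2020-02-01")  # first day of the month being generated
min_amount  <- 2000                   # minimum loan amount (R2000)
min_rate    <- 0.15                   # assumed product floor
max_rate    <- 0.30                   # assumed product cap

# Hypothetical stand-in for last_day(): last calendar day of each date's month.
month_end <- function(d) {
  first <- as.Date(format(d, "%Y-%m-01"))
  # The 1st plus 31 days always falls in the following month.
  first_of_next <- as.Date(format(first + 31, "%Y-%m-01"))
  first_of_next - 1
}

# Spread origination dates uniformly over the days of the month.
days_in_month <- as.integer(format(month_end(month_start), "%d"))
orig_date <- month_start + sample.int(days_in_month, n_loans, replace = TRUE) - 1
orig_month <- month_end(orig_date)

loans <- data.frame(
  orig_date     = orig_date,
  orig_month    = orig_month,
  fpd_date      = orig_month,  # set to orig_month, as described above
  fpd_month     = orig_month,
  # Normal draw, with values below the minimum raised to the minimum.
  loan_amount   = pmax(rnorm(n_loans, mean = 15000, sd = 5000), min_amount),
  # Normal draw with a small sd, clipped to the assumed product bounds.
  interest_rate = pmin(pmax(rnorm(n_loans, mean = 0.22, sd = 0.03), min_rate), max_rate),
  credit_type   = "unsecured",
  product_name  = "Personal loan",
  stringsAsFactors = FALSE
)
```

Raising values to the minimum creates a cluster of loans at exactly R2000. Filtering instead avoids that cluster but returns fewer than 1000 loans, unless extra values are drawn to make up the difference.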