Closed dvquy13 closed 4 years ago
H: Hypothesis, C: Conclusion, I: Idea, A: Assumption, R: Remark
[x] H1. age takes 5 unique values in the range [1, 5]. These are probably bins that abstract the real age (such as 18, 25, 34, ...).
[x] C2. zipcodeOr and zipMerchant can be ignored because they carry no information: each has only 1 unique value.
[ ] I3. Behavior at merchant is the richest and most promising source of information -> should focus on extracting information from it.
R4. category is very skewed towards "es_transportation" (85%). Its discriminative power is hard to tell, but this feature can complement merchant.
R5. amount is the only truly continuous variable here, and it has clear outliers, which are very relevant in the context of fraud.
[x] I6. Are there any merchants with a disproportionate share of female customers?
[ ] I7. Use LRFMP features to describe customers
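A minimal sketch of what I7 could look like, reading LRFMP as Length, Recency, Frequency, Monetary, Periodicity. The definitions used here are simplified assumptions (for example, Recency is just the gap between the last transaction and the end of the window), and the `(customer, step, amount)` tuple layout, the `lrfmp` function name, and the sample rows are hypothetical, not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean, pstdev


def lrfmp(transactions, end_step):
    """Compute simplified LRFMP features per customer.

    transactions: iterable of (customer, step, amount) tuples (assumed layout).
    end_step: last time step of the observation window.
    """
    by_customer = defaultdict(list)
    for customer, step, amount in transactions:
        by_customer[customer].append((step, amount))

    features = {}
    for customer, rows in by_customer.items():
        rows.sort()  # order each customer's transactions by time step
        steps = [s for s, _ in rows]
        amounts = [a for _, a in rows]
        gaps = [b - a for a, b in zip(steps, steps[1:])]
        features[customer] = {
            "length": steps[-1] - steps[0],      # first to last transaction
            "recency": end_step - steps[-1],     # last transaction to window end
            "frequency": len(rows),              # number of transactions
            "monetary": mean(amounts),           # average amount per transaction
            # spread of the gaps between visits; 0.0 when fewer than 2 gaps exist
            "periodicity": pstdev(gaps) if len(gaps) > 1 else 0.0,
        }
    return features


# Toy data, made up for illustration only.
sample = [
    ("C1", 0, 10.0), ("C1", 2, 30.0), ("C1", 6, 20.0),
    ("C2", 5, 100.0),
]
features = lrfmp(sample, end_step=10)
```

Customers with a single transaction get a length and periodicity of 0, so they may need separate handling before these features go into a model.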
First assessment looking at the features one by one
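A rough sketch of how three of these per-feature checks could be reproduced: finding single-valued columns (C2), measuring how dominant the top category is (R4), and flagging outliers in amount (R5). The column names come from the notes; the helper names, the toy rows, and the choice of Tukey's 1.5 * IQR rule for outliers are assumptions made for illustration, not the method used in the original analysis.

```python
from collections import Counter
from statistics import quantiles


def constant_columns(rows):
    """Return the columns that hold a single unique value (candidates to drop)."""
    return [c for c in rows[0].keys() if len({r[c] for r in rows}) == 1]


def top_share(rows, column):
    """Return the most frequent value of a column and its share of all rows."""
    value, count = Counter(r[column] for r in rows).most_common(1)[0]
    return value, count / len(rows)


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


# Toy rows, made up for illustration; "Z0", "M0".."M2" and "other" are placeholders.
amounts = [10.0, 12.0, 11.0, 500.0, 13.0, 12.0, 14.0]
rows = [
    {
        "age": 1 + i % 5,
        "zipcodeOr": "Z0",
        "zipMerchant": "Z0",
        "merchant": f"M{i % 3}",
        "category": "es_transportation" if i else "other",
        "amount": a,
    }
    for i, a in enumerate(amounts)
]

dropped = constant_columns(rows)            # single-valued columns
value, share = top_share(rows, "category")  # dominant category and its share
outliers = iqr_outliers(amounts)            # amounts far outside the bulk
```

On real data the IQR threshold `k` would likely need tuning, since fraud amounts can sit in a long tail that a fixed rule either over-flags or misses.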