pro100olga opened 4 years ago
@pro100olga can you look into it?
Checked on the 88K dataset as of 23/04/20.
Excluding people from the outliers list
1) Change the logic
Currently the filter is based on id:
excl = (~df['id'].isin([
'nacp_08a63d8b-2db4-4ef0-8b8b-396e0cd9f495',
'nacp_7762d918-fe93-4285-8703-7fbe18312634',
'nacp_50a32d11-ebfa-4466-9bde-2f049cb00574']))
It should be based on user_declarant_id instead, i.e. exclude all declarations whose user_declarant_id is in [54382, 728990].
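A minimal sketch of what the changed mask could look like, assuming the declarations are in a pandas DataFrame `df` whose `user_declarant_id` column holds integers (if the column is stored as strings, the list must hold strings too). The toy rows and their `id` values are made up for illustration; only the two declarant IDs come from this issue.

```python
import pandas as pd

# Hypothetical toy stand-in for the declarations dataset.
df = pd.DataFrame({
    "id": ["nacp_a", "nacp_b", "nacp_c", "nacp_d"],
    "user_declarant_id": [54382, 728990, 54382, 12345],
})

# Filter by declarant instead of by individual declaration id, so every
# declaration filed by these people is dropped, not only the hard-coded ones.
EXCLUDED_DECLARANTS = [54382, 728990]
excl = ~df["user_declarant_id"].isin(EXCLUDED_DECLARANTS)

kept = df[excl]
print(kept["id"].tolist())  # ['nacp_d']
```

Note that both `nacp_a` and `nacp_c` are dropped because they share the declarant 54382, which is presumably the point of switching away from declaration ids.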
2) Exclude more people:
Also exclude from the outliers all declarations with the following user_declarant_id values: 90684, 552845, 675626, 1084920, 1108047
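Combining items 1) and 2), one way to sketch this is a single named set of excluded declarants plus a small helper. The helper name `drop_excluded_declarants` and the toy `outliers` frame are hypothetical; the sketch again assumes an integer `user_declarant_id` column.

```python
import pandas as pd

# Declarants to keep out of the outliers list (items 1 and 2 of this issue).
EXCLUDED_DECLARANTS = frozenset([
    54382, 728990,                            # 1) replaces the id-based filter
    90684, 552845, 675626, 1084920, 1108047,  # 2) additional people
])


def drop_excluded_declarants(outliers: pd.DataFrame,
                             excluded=EXCLUDED_DECLARANTS) -> pd.DataFrame:
    """Return only the rows whose user_declarant_id is not in `excluded`."""
    mask = ~outliers["user_declarant_id"].isin(sorted(excluded))
    return outliers[mask]


# Hypothetical usage on a toy outliers frame.
outliers = pd.DataFrame({
    "id": ["nacp_x", "nacp_y", "nacp_z"],
    "user_declarant_id": [90684, 1108047, 777],
})
print(drop_excluded_declarants(outliers)["user_declarant_id"].tolist())  # [777]
```

Keeping the IDs in one collection means later additions only touch that set, not the filtering logic.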
@dchaplinsky please share the algorithm you used to define the outliers.