excieve / dragnet

Catching the big fish
MIT License

Check outliers on the full dataset #47

Open pro100olga opened 4 years ago

pro100olga commented 4 years ago

@dchaplinsky please share the algorithm you used to define outliers.

dchaplinsky commented 4 years ago

https://github.com/excieve/dragnet/commit/f1677214de8b4bb205890eb22f4fe954c376301d#diff-0a7976e2cc1844ff00834f1bede8d856

dchaplinsky commented 4 years ago

@pro100olga can you look into it?

pro100olga commented 4 years ago

Checked on the 88K dataset as of 23/04/20.

Excluding people from the outliers list

1) Change the logic

The exclusion is currently based on id:

    excl = (~df['id'].isin([
        'nacp_08a63d8b-2db4-4ef0-8b8b-396e0cd9f495',
        'nacp_7762d918-fe93-4285-8703-7fbe18312634',
        'nacp_50a32d11-ebfa-4466-9bde-2f049cb00574']))

It should be based on user_declarant_id instead. Namely, exclude declarations where user_declarant_id is in [54382, 728990].
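A minimal sketch of that change, assuming `df` is a pandas DataFrame with an integer `user_declarant_id` column. The sample rows and their `id` values are made up for illustration; only the column names and the two declarant ids come from this thread.

```python
import pandas as pd

# Hypothetical sample data, not taken from the real dataset.
df = pd.DataFrame({
    "id": ["nacp_a", "nacp_b", "nacp_c", "nacp_d"],
    "user_declarant_id": [54382, 728990, 111, 54382],
})

# True for declarations that remain in the outlier check,
# False for declarations filed by the excluded declarants.
excl = ~df["user_declarant_id"].isin([54382, 728990])

kept = df[excl]
```

Filtering by declarant rather than by declaration id means every declaration from the same person is excluded, including ones filed later.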

2) Exclude more people:

Also exclude from the outliers any declarations with the following user_declarant_id values: 90684, 552845, 675626, 1084920, 1108047.
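Both points could be combined into a single list. The sketch below assumes the same `user_declarant_id` column; the constant `EXCLUDED_DECLARANT_IDS` and the helper `keep_mask` are hypothetical names, not part of the dragnet code, and the sample frame is invented.

```python
import pandas as pd

# Declarant ids from points 1 and 2 above (names below are hypothetical).
EXCLUDED_DECLARANT_IDS = [
    54382, 728990,                            # point 1
    90684, 552845, 675626, 1084920, 1108047,  # point 2
]


def keep_mask(df: pd.DataFrame) -> pd.Series:
    """Return True for rows whose declarant is not on the exclusion list."""
    return ~df["user_declarant_id"].isin(EXCLUDED_DECLARANT_IDS)


# Invented sample frame to show the mask in use.
sample = pd.DataFrame({"user_declarant_id": [90684, 1, 1108047, 54382, 2]})
mask = keep_mask(sample)
filtered = sample[mask]
```

Keeping the ids in one named constant makes it easier to extend the list as more declarants are reviewed.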