ecomore2 / pacs

Cleaning and reshaping PACS data from Institut Pasteur du Laos
https://ecomore2.github.io/pacs

Integrate corrected village names into the PACS database #7

Closed by OlivierTelle 6 years ago

OlivierTelle commented 6 years ago

I corrected the names/spelling of the Bhans of individuals who tested positive for either NS1 or PCR, so that each Bhan matches one from the 2015 Census. The file is located on Dropbox in "Raw data - IPL PACS - Village a retrouver - Ecomore 2 tab - Bhan column" (the "village" column is the one declared by the patient).

We need to merge this corrected Bhan column into the ecomore PACS data.

Only use the Bhan column where it is filled in. Where it is blank, take only the declared village into account; Jerome is trying to find the "real" village from the handwritten database.
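The rule described here (use the corrected Bhan where one exists, otherwise keep the village declared by the patient) can be sketched as below. This is a minimal Python illustration under stated assumptions, not the code used in the repository: the field names (`id`, `village`, `village_final`, `village_corrected`), the function name, and the sample rows are all hypothetical.

```python
def merge_corrected_bhan(pacs_rows, corrections):
    """Return copies of pacs_rows with the corrected Bhan merged in.

    pacs_rows:   list of dicts with keys 'id' and 'village' (declared by the patient)
    corrections: dict mapping sample id -> corrected Bhan (may be blank or None)
    """
    merged = []
    for row in pacs_rows:
        # Treat a missing entry, None, or whitespace-only text as "no correction".
        corrected = (corrections.get(row["id"]) or "").strip()
        out = dict(row)
        # Prefer the corrected Bhan; otherwise fall back on the declared village.
        out["village_final"] = corrected if corrected else row["village"]
        out["village_corrected"] = bool(corrected)
        merged.append(out)
    return merged


# Hypothetical sample data, for illustration only.
pacs = [
    {"id": 1, "village": "village a"},
    {"id": 2, "village": "village b"},
    {"id": 3, "village": "village c"},
]
bhan = {1: "Village A", 2: "   "}  # id 2 is blank, id 3 has no entry at all

result = merge_corrected_bhan(pacs, bhan)
```

Keeping a separate `village_corrected` flag makes it easy to list later the samples that still rely on the declared village.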

choisy commented 6 years ago

Comparing PACS and your file/tab/column, I get this:

                         positive in PACS   negative in PACS
village name corrected               1473                329
village name NA                       387                163

meaning that 387 positive samples did not have their village name corrected, while 329 negative samples did.

Is there a problem here or is there something I did not understand?

For more information, the IDs of the 387 positive samples that did not have their village name corrected are: 3 5 7 13 27 39 53 61 79 91 99 101 108 113 126 140 141 153 157 162 163 177 195 210 228 261 263 304 305 312 321 338 346 347 353 354 357 358 383 392 397 404 405 406 408 412 436 439 444 452 454 457 500 501 502 505 509 511 512 519 527 529 531 534 537 544 547 555 558 562 574 575 622 623 628 647 650 652 673 679 682 690 691 695 725 727 732 733 741 749 755 756 765 771 773 788 806 810 818 821 827 838 844 858 860 861 864 873 874 884 895 928 974 987 989 1010 1011 1031 1036 1037 1042 1046 1047 1054 1056 1058 1059 1064 1065 1069 1092 1096 1101 1103 1111 1118 1120 1122 1126 1137 1141 1142 1146 1161 1162 1166 1171 1176 1183 1186 1189 1195 1196 1197 1200 1201 1202 1215 1217 1226 1228 1239 1242 1246 1262 1274 1284 1285 1295 1299 1326 1334 1348 1352 1353 1354 1355 1382 1390 1394 1397 1399 1402 1404 1410 1413 1418 1420 1425 1438 1441 1450 1452 1460 1461 1463 1475 1478 1488 1491 1497 1503 1515 1529 1555 1559 1560 1565 1583 1594 1601 1610 1615 1623 1631 1639 1648 1669 1673 1678 1680 1688 1720 1732 1739 1765 1773 1779 1789 1794 1804 1811 1815 1820 1836 1858 1885 1891 1899 1929 1931 1932 1937 1945 1955 1966 2027 2029 2079 2105 2109 2113 2150 2165 2174 2194 2202 2274 2276 2279 2280 2350 2356 2376 2380 2421 2431 2432 2433 2434 2435 2444 2445 2456 2467 2475 2505 2521 2540 2557 2558 2559 2582 2591 2592 2653 2690 2691 2709 2717 2761 2762 2763 2779 2783 2822 2823 2824 2826 2830 2864 2866 2867 2924 2932 2933 2934 2935 2946 2951 2958 2959 3000 3058 3059 3061 3062 3073 3076 3077 3080 3084 3087 3089 3092 3093 3095 3098 3102 3113 3114 3119 3120 3123 3124 3125 3130 3134 3135 3157 3172 3173 3174 3175 3177 3178 3188 3189 3216 3220 3221 3223 3224 3241 3246 3260 3262 3263 3285 3341 3350 3390 3396 3397 3416 3477 3478 3481 3551 3585 3588 3597 3600 3608 3620 3624 3637 3646 3704 3720 3748 3752 3753 3755 3759 3760 3762

And the IDs of the 329 negative samples for which the village name was corrected are: 1 2 4 9 10 11 12 15 17 18 20 21 22 24 30 31 41 42 43 44 45 46 48 50 52 54 57 59 60 63 64 67 68 69 70 73 74 75 76 77 78 80 81 82 83 84 85 86 87 88 89 93 95 97 100 102 105 110 115 117 120 121 122 128 129 361 370 398 399 400 401 402 403 445 466 483 492 605 674 680 684 685 719 730 760 1044 1238 1339 1375 1415 1429 1536 1537 1602 1829 1830 1864 1900 1946 2082 2083 2122 2292 2303 2334 2375 2399 2423 2476 2513 2515 2522 2595 2597 2604 2612 2613 2639 2642 2695 2735 2796 2807 2809 2846 2858 2859 2869 2870 2872 2876 2879 2881 2888 2891 2893 2896 2898 2899 2900 2903 2906 2907 2908 2916 2919 2920 2921 2925 2926 2928 2929 2941 2942 2943 2944 2945 2947 2950 2963 2964 2987 2994 2995 2998 3001 3002 3003 3005 3006 3010 3019 3023 3024 3025 3026 3029 3032 3033 3036 3037 3039 3041 3042 3043 3044 3045 3046 3047 3052 3054 3055 3065 3067 3068 3071 3072 3081 3085 3096 3099 3101 3107 3108 3111 3112 3117 3129 3132 3142 3144 3146 3147 3148 3149 3150 3151 3153 3155 3156 3169 3170 3181 3183 3184 3197 3215 3217 3225 3227 3229 3239 3240 3248 3249 3250 3253 3258 3265 3266 3269 3275 3282 3290 3295 3296 3306 3311 3313 3321 3331 3340 3342 3351 3360 3365 3373 3389 3394 3398 3400 3403 3404 3409 3410 3415 3425 3435 3436 3437 3443 3445 3446 3448 3449 3450 3451 3454 3455 3467 3468 3479 3482 3517 3520 3533 3541 3587 3589 3598 3602 3609 3611 3612 3618 3621 3622 3623 3625 3627 3628 3629 3635 3638 3641 3645 3649 3650 3652 3653 3672 3674 3675 3676 3677 3678 3679 3680 3681 3682 3683 3685 3689 3691 3693 3699 3700 3701 3703
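For reference, a 2x2 table like the one in this comment can be produced by counting (correction status, test result) pairs. The sketch below is in Python with made-up records and hypothetical field names (`bhan`, `positive`); only the four counts in `reported` are taken from this thread.

```python
from collections import Counter


def cross_tab(records):
    """Count samples by (village name corrected?, positive in PACS?)."""
    return Counter((bool(r["bhan"]), r["positive"]) for r in records)


# Made-up records, for illustration only.
records = [
    {"id": 1, "bhan": "Village A", "positive": True},
    {"id": 2, "bhan": "", "positive": True},
    {"id": 3, "bhan": "Village B", "positive": False},
    {"id": 4, "bhan": None, "positive": False},
    {"id": 5, "bhan": "Village C", "positive": True},
]
table = cross_tab(records)

# Counts reported in this thread: (corrected, positive) -> number of samples.
reported = {
    (True, True): 1473,
    (True, False): 329,
    (False, True): 387,
    (False, False): 163,
}
n_positive = reported[(True, True)] + reported[(False, True)]
share_corrected = reported[(True, True)] / n_positive
print(f"{n_positive} positive samples, {share_corrected:.1%} with a corrected village name")
# prints "1860 positive samples, 79.2% with a corrected village name"
```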

OlivierTelle commented 6 years ago

Indeed, I tried to correct all of them, even the negative ones, since some of those patients' villages had the same issues and it took one second to correct them (it could be interesting to include them later). When I couldn't correct the village of a positive patient, it could mean:
1- That the village was not mentioned at all
2- That the village name does not exist in the census list (for example 36km)
3- That the suffix was not mentioned (phontong is often mentioned, but in the census we have phontongchommani, phontong or phontonsavang)
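The three cases listed in this comment could be told apart automatically along these lines. This is only a sketch in Python under stated assumptions: the function name, the category labels, and the toy census list are hypothetical, and the spellings in the list are illustrative rather than the real census names.

```python
def classify_village(declared, census_names):
    """Say why a declared village name can or cannot be matched to the census."""
    name = (declared or "").strip().lower()
    if not name:
        return "missing"  # case 1: no village was given
    if name in census_names:
        return "exact match"
    # Census names that start with the declared name, i.e. the declared
    # name may simply lack its suffix.
    candidates = [c for c in census_names if c.startswith(name)]
    if len(candidates) == 1:
        return "unique prefix match"
    if len(candidates) > 1:
        return "ambiguous, suffix missing"  # case 3
    return "not in census"  # case 2


# Toy census list, for illustration only (not the real census spellings).
census = {"phontongchommani", "phontongsavang", "examplevillage"}

for declared in ["", "36km", "phontong", "Phontongsavang ", "example"]:
    print(repr(declared), "->", classify_village(declared, census))
```

Only the "ambiguous" and "not in census" cases would then need a manual look, for instance against the handwritten records.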

To conclude: we need to focus on the positive patients whose village is missing.

choisy commented 6 years ago

Thanks for the explanation, very useful. The corrected village names have been integrated into the cleaned_data/pacs.csv file. All done!