SwissDataScienceCenter / mlschema-model-converters

Apache License 2.0

models with random states should have their random states saved. #4

Closed chrisbarber closed 4 years ago

chrisbarber commented 4 years ago

this is already partly done by saving a method's random_state. but when random_state is None, i.e. a seed from the system is used, that seed is not saved.

unfortunately this might not be easy to fix, as getting access to that seed might not be possible...

chrisbarber commented 4 years ago

This is under discussion here / here. Reproducibility is mentioned but there are many other issues prompting the possible redesign.

At first glance it seems like for our case we would want to vote for Option B / idempotency described here. I haven't read SLEP011 yet so I'm not sure how it differs, nor is it immediately evident which other proposals in the comments would be favorable to reproducibility.

In the short term the best course of action might be to just disallow or warn for random_state=None in renku (to_mls or something could just look at the random state and throw or warn if it is None).
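A minimal sketch of that short-term check, assuming the parameters come from something like scikit-learn's `get_params()` (which returns nested parameters under names such as `clf__random_state`). The helper names `find_unseeded_params` and `warn_if_unseeded` are hypothetical and are not part of to_mls or renku:

```python
import warnings


def find_unseeded_params(params):
    """Return the names of random_state parameters whose value is None."""
    return sorted(
        name
        for name, value in params.items()
        if (name == "random_state" or name.endswith("__random_state"))
        and value is None
    )


def warn_if_unseeded(params, strict=False):
    """Warn (or raise, if strict) when any random_state parameter is None."""
    unseeded = find_unseeded_params(params)
    if not unseeded:
        return unseeded
    message = (
        "random_state is None for: "
        + ", ".join(unseeded)
        + "; the run cannot be reproduced from the exported metadata"
    )
    if strict:
        raise ValueError(message)
    warnings.warn(message, UserWarning)
    return unseeded
```

An exporter could call `warn_if_unseeded(model.get_params())` before serializing, with `strict=True` for the "disallow" variant.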

I'll keep reading up and try to understand which of their proposals positively or negatively affect our use case, and chime in if there is anything compelling.

chrisbarber commented 4 years ago

i read their SLEP011 and it is favorable to determinism/reproducibility. they would save the state to self.random_state in the model's __init__ and create a new internal RNG from that state, so there is no shared RNG. the random state is a large tuple (about 2.5 KB), or they might just save a seed (an int).

i have tested and the tuple serializes fine with the current to_mls. it would look like this:
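To make the idea concrete, here is a rough sketch of the behaviour as described in this comment, not the actual SLEP011 design or scikit-learn code. `ToyEstimator` and `_make_rng` are hypothetical names, and the sketch assumes NumPy's legacy `RandomState` API:

```python
import numpy as np


class ToyEstimator:
    """Toy model that records its random state at construction time."""

    def __init__(self, random_state=None):
        if random_state is None:
            # Seed a throwaway generator from OS entropy, then keep its
            # full state so the run can be replayed later.
            random_state = np.random.RandomState().get_state()
        self.random_state = random_state

    def _make_rng(self):
        # Build a private generator each time; the global RNG is never used.
        rng = np.random.RandomState()
        if isinstance(self.random_state, tuple):
            rng.set_state(self.random_state)
        else:
            rng.seed(self.random_state)
        return rng

    def fit(self):
        rng = self._make_rng()
        self.coef_ = rng.uniform(size=3)
        return self
```

Because `self.random_state` is always populated, exporting the parameters after construction captures enough to rebuild the same generator, even when the caller passed `None`.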

{'value': ['MT19937', [472769727, 4236820826, 2382407561, 3374222267, 1592939088, 3296576824, 2528229848, 2724713781, 2276075160, 848393283, 1188337130, 1219185166, 3865928312, 570784279, 3649329050, 4234477025, 923412047, 268462085, 3465449927, 4002776560, 2002082807, 3910980622, 4072166470, 4231990957, 1455683110, 1479239854, 2176891369, 1067221422, 53626608, 264544257, 3999453291, 4202774419, 411578394, 26579143, 742422174, 4084288468, 3506209886, 2291977817, 1097875790, 3180580374, 2216387881, 587134636, 3471946904, 139480610, 2607763117, 2830411249, 3905847431, 21831199, 2844220610, 4291138880, 8757403, 3013469999, 3868736493, 753159329, 1551065818, 1065527056, 1561677863, 1222917686, 3445134833, 2296465107, 1860077852, 2006984421, 3649509337, 673365552, 1787180564, 3972316097, 3793793705, 72819749, 812463823, 3678197934, 2053768148, 1134449179, 4127851320, 4028289992, 1984950897, 1115283979, 254458088, 411786318, 3395136148, 1022936189, 2529450582, 3058302796, 236193162, 2208182828, 740357762, 1284998789, 3500585816, 2916357702, 4019473265, 187521704, 78550919, 134799085, 681585333, 1154208782, 3031506031, 1861363511, 2511486711, 2934197430, 4260279783, 2680714349, 3616818486, 1583499376, 1889201267, 1162129345, 2284158656, 3233286122, 60732704, 2293973961, 2048370690, 3399639783, 499513996, 2668034691, 161856311, 4113304233, 3735825509, 406434699, 583088564, 608104827, 40972303, 905847227, 1369223033, 2583902726, 1889115440, 710735214, 1370366572, 1604388324, 1634631728, 2944969549, 3887660285, 3332752247, 3468626145, 320747451, 634630001, 2055869391, 4241133684, 1610951808, 1581566113, 3235798540, 1818136928, 1199401944, 1115054516, 723010269, 3347885637, 800360497, 1422974869, 283734252, 2771107938, 117078600, 818033591, 1037865241, 3961239858, 4222155578, 4030809971, 234425515, 2956224458, 2730970914, 1288396510, 1888167758, 3057216048, 3578631048, 1728284049, 2724625204, 1054147083, 1871961398, 1735312844, 2166734816, 168880633, 24384753, 3038200818, 
2882354760, 976254862, 2001355503, 894973773, 1068139320, 3334473054, 2037619190, 765840183, 1156702115, 12369570, 2597444020, 888796763, 3710663870, 2491087108, 2044898160, 1363350129, 3807652902, 3535991915, 272842865, 281612555, 2254434074, 3827589242, 3899767907, 2279082141, 3479273572, 2814097714, 97995378, 4214971289, 3816458472, 2599629133, 3465843910, 2790739301, 230039475, 1817744998, 2732978327, 2601378511, 3091774810, 2495104934, 1486884850, 3280737566, 405033206, 1586416400, 3794657352, 1222659138, 2333767944, 3965475957, 248530581, 4254662314, 224910645, 2393050717, 2712795476, 2124262024, 3029761107, 3291758314, 2612583549, 3355172713, 362942448, 248459462, 2222527968, 1925982664, 2631301166, 1262189521, 226088039, 2849780648, 2805486304, 2581737832, 406650757, 224771094, 473050877, 2861073282, 1177333586, 1195887748, 3205865658, 3981515884, 1495908592, 890423830, 4280763039, 1708017188, 1556199839, 3051436994, 4290430325, 3047722658, 1193214054, 3263548534, 1824668365, 2333213903, 1500567758, 2429568275, 3901062589, 780901430, 228326101, 3298827044, 3570587946, 3146041401, 3165285812, 812454990, 2521604500, 4128991942, 1431036658, 4060958761, 4289922731, 2833055361, 1645735614, 3842849689, 3088667489, 1539039441, 2314580780, 3321131120, 1295407449, 1247741922, 2936902276, 2085732133, 440330534, 3471918764, 1984688613, 1758126521, 623795489, 38246938, 2569814853, 4124302992, 3381578593, 199712781, 788592718, 3787393467, 2375176042, 4078032940, 3864064409, 1121457068, 2768231207, 4205044777, 3040453207, 3035500701, 2153130830, 2261395991, 3631768017, 558478941, 1841127954, 1838247596, 1889043009, 2458332291, 1704176636, 2952328722, 3931705808, 3286577937, 2635546584, 209620643, 3730497571, 3317612621, 4283846489, 2567330060, 2580927851, 3595910294, 2717947119, 1965619605, 2095885474, 3895322494, 4262737388, 4257252477, 3753504167, 3706030097, 4073731807, 4246079516, 3694690433, 2956718792, 2787524926, 2057493759, 2432648321, 1670377932, 3738118240, 
268007571, 1226383639, 2265947531, 546423550, 3714148572, 3227502849, 1456292025, 700447088, 3595276225, 2182228156, 1750303776, 2465274622, 487920799, 2281419410, 3218848152, 589513676, 1590712122, 806901846, 554829846, 4027827004, 264546041, 2203415452, 392621842, 668295181, 4020885283, 3014278596, 868936046, 4081090136, 3009608296, 610125091, 786805489, 2146584157, 3190772421, 3656610301, 4131685015, 2890286078, 924636406, 3977178192, 1604737618, 3177987188, 813451937, 278711269, 2395778024, 3447780751, 3796670865, 209427474, 2310652538, 3077617736, 1273051275, 3870502954, 4285119290, 295029690, 2467175780, 1541438206, 2157281121, 509815739, 1837880662, 3032874876, 1076931011, 976604410, 2204497135, 3912872007, 3680009164, 1150588396, 2089893420, 3487583545, 2066756963, 3971180299, 941849673, 1600070401, 1889657301, 876741978, 1998278289, 2124460302, 2527915487, 898829212, 4155633644, 4086204903, 4146342832, 3906983984, 266427868, 2243151658, 4186047366, 4206321231, 1081396041, 2787898421, 3011799335, 2648547794, 259266601, 1276097206, 2542850531, 3884190364, 2464687074, 3700170994, 3880458120, 4020818439, 3347169468, 635866943, 1382397878, 2498577749, 4251933010, 357668623, 1803925873, 458025859, 565900135, 647894247, 1375602189, 567327240, 938364900, 2716736848, 598521711, 4195079908, 4042078288, 3387113901, 1262738960, 569111571, 349810831, 2739785506, 350877943, 1992470297, 3599241753, 116111636, 3317798840, 2178582817, 923226694, 1565876634, 1095485013, 2593260457, 493505836, 154934883, 2796377603, 2237502737, 992830252, 1772448605, 53276274, 1901168650, 2805664170, 3990402151, 4049019961, 1527647650, 2011756585, 621096797, 3301347535, 2828111613, 3373295489, 623804902, 1718140294, 3064230079, 3629862234, 3884101050, 1511722245, 1392271911, 3702893171, 215009199, 2856931888, 3608513081, 859197175, 2223677970, 710725954, 3657575918, 1766671708, 3289354759, 1071692366, 3328583210, 2051946520, 376480735, 3255563394, 2200558450, 2179804668, 3403417571, 
3423355234, 3861875075, 1988563217, 116083684, 1651028406, 1277960000, 3878066999, 3105361287, 3808780011, 1536208523, 292508985, 3254760519, 1703587220, 3254851341, 178482149, 3005070562, 4106124487, 1606765545, 3594807506, 2918119713, 2598410793, 2672742562, 577762857, 470576260, 1214765257, 2747899218, 261490071, 3960261364, 1486974463, 769208405, 2325714958, 864722997, 3171328104, 3901582070, 3077330605, 276770855, 911115335, 3284655996, 1955036182, 2819197206, 374955616, 219624411, 3477959756, 1304959984, 3142312834, 577404337, 881131924, 3200126998, 756129640, 4287032445, 2810867275, 3313719500, 563463307, 4067384148, 2599734578, 4260565499, 1542386583, 770883433, 1128194920, 1152352000, 917258226, 899873017, 3623881158, 4228698419, 2272543693, 1280710394, 2402132667, 1013714787, 3937372529, 136380315, 1310203870, 2897477675, 3230633142, 3240657636, 1422780059, 1225179326, 2335259439, 3169004856, 2678176048, 2457356679, 209622535, 1833202792, 2203025881, 3442020838, 3250497399, 1655528001, 4218099962, 2316510540, 4204040173, 3863385053, 2318392825, 2881102997, 3129134603, 3957220178, 209324267, 1374650759, 3828596874, 1439728508, 674002413, 4038605536, 3083418135, 3758180973, 3618728695, 1982861048, 3034195683, 2954870739, 1325261867, 3713272507, 1483762542, 3559258366, 1782230780, 4972340, 1819778062, 77665677, 3033862932, 3610176323], 80, 0, 0.0], 'specified_by': {'@id': '_:random_state'}, '@type': 'mls:HyperParameterSetting'}

or just the int seed would obviously be fine.
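For reference, a small sketch of round-tripping such a state tuple through JSON and restoring a generator from it. It assumes NumPy's legacy `RandomState.get_state()` tuple layout (name, 624 key words, position, has_gauss, cached_gaussian); the helper names are hypothetical and this is not how to_mls does it internally:

```python
import json

import numpy as np


def state_to_jsonable(state):
    """Convert a RandomState state tuple into plain JSON-friendly types."""
    name, keys, pos, has_gauss, cached_gaussian = state
    return [name, [int(k) for k in keys], int(pos), int(has_gauss),
            float(cached_gaussian)]


def state_from_jsonable(data):
    """Rebuild a state tuple that RandomState.set_state() accepts."""
    name, keys, pos, has_gauss, cached_gaussian = data
    return (name, np.array(keys, dtype=np.uint32), pos, has_gauss,
            cached_gaussian)


def restore_rng(payload):
    """Create a generator positioned exactly where the saved one was."""
    rng = np.random.RandomState()
    rng.set_state(state_from_jsonable(json.loads(payload)))
    return rng
```

Usage would be along the lines of `payload = json.dumps(state_to_jsonable(rng.get_state()))` at export time and `restore_rng(payload)` when replaying.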

the more aggressive approaches would be on our end: either enforce something through warnings, or, instead of only calling export afterwards, do something both before and after calling fit, predict, etc. on models, such as capturing the numpy singleton RNG's state before and after and saving both to the mls. having the state afterward would detect the case where other things were consuming the singleton RNG in parallel, but of course it would still not make it possible to deterministically reproduce the run.

i don't think there is any action to take now, so i will close, but feel free to reopen.
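A rough sketch of the before/after idea, assuming the model draws from NumPy's global (singleton) RNG via `np.random.*`. The names `record_global_rng`, `states_equal` and `replay_is_consistent` are hypothetical; nothing like this exists in to_mls today:

```python
from contextlib import contextmanager

import numpy as np


@contextmanager
def record_global_rng(record):
    """Store the global RNG state before and after the wrapped block."""
    record["before"] = np.random.get_state()
    try:
        yield record
    finally:
        record["after"] = np.random.get_state()


def states_equal(a, b):
    """Compare two legacy state tuples, including the key array."""
    return a[0] == b[0] and np.array_equal(a[1], b[1]) and a[2:] == b[2:]


def replay_is_consistent(record, run):
    """Replay `run` from the saved start state and compare end states.

    A mismatch suggests something else consumed the global RNG during
    the original run.
    """
    saved = np.random.get_state()
    try:
        np.random.set_state(record["before"])
        run()
        return states_equal(np.random.get_state(), record["after"])
    finally:
        # Leave the caller's global RNG as we found it.
        np.random.set_state(saved)
```

Wrapping a call such as `model.fit(X, y)` in `with record_global_rng(rec):` would give both states to save. As noted above, a mismatch on replay only flags the interference; it does not recover the original run.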