Open IsaacGreenMachine opened 1 month ago
Here's the output, by the way (please ignore the numbers used as cluster names). The scores are all infinity.
here's the topic info after fully training with partial_fit on my dataset:
topic_model.get_topic(1, full=True)
{'Main': [('1054375', inf),
('982395', inf),
('1013164', inf),
('1031508', inf),
('1031576', inf),
('957591', inf),
('1043766', inf),
('1054256', inf),
('1054349', inf),
('1054355', inf)]}
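A quick way to see how many of a topic's scores are affected is to filter the list for non-finite values. This is only a minimal sketch: the helper name `non_finite_terms` is hypothetical (it is not part of BERTopic), and the sample data is just the first two entries copied from the output above.

```python
import math


def non_finite_terms(topic_words):
    """Return the (term, score) pairs whose score is inf or NaN."""
    return [(term, score) for term, score in topic_words
            if not math.isfinite(score)]


# Sample shaped like the 'Main' list above (first two entries only).
sample = {"Main": [("1054375", math.inf), ("982395", math.inf)]}

bad = non_finite_terms(sample["Main"])
print(f"{len(bad)} of {len(sample['Main'])} scores are non-finite")
```

With a trained model, the same helper could be applied to `topic_model.get_topic(1, full=True)["Main"]`, assuming the return value has the dict shape shown above.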
topic_model.get_topic_info()
Topic Count Name Representation Representative_Docs
0 0 109104 0_30470_32665_14_29 [30470, 32665, 14, 29, 31, 10, 40, 44, 53, 45] NaN
1 1 52972 1_1054375_982395_1013164_1031508 [1054375, 982395, 1013164, 1031508, 1031576, 9... NaN
2 2 129841 2_369045_369865_371828_371766 [369045, 369865, 371828, 371766, 365563, 36557... NaN
3 3 66873 3_27_13_16_38 [27, 13, 16, 38, 44, 3350, 46, 3574, 3831, 4222] NaN
4 4 41935 4_12_22_31_14 [12, 22, 31, 14, 10, 43, 39, 42, 37, 38] NaN
5 5 67877 5_220939_220545_222224_220700 [220939, 220545, 222224, 220700, 220669, 21894... NaN
6 6 48487 6_1767968_1593953_1683623_1593883 [1767968, 1593953, 1683623, 1593883, 1683534, ... NaN
7 7 35557 7_552517_545624_543607_552708 [552517, 545624, 543607, 552708, 543675, 55253... NaN
8 8 75309 8_14_15_13_16 [14, 15, 13, 16, 42, 38, 29, 53, 10, 3675] NaN
9 9 77294 9_218852_220508_218868_218942 [218852, 220508, 218868, 218942, 220498, 15, 3... NaN
10 10 60717 10_599079_574468_571526_588418 [599079, 574468, 571526, 588418, 598261, 59877... NaN
11 11 46438 11_4756774_4756616_4747992_4756619 [4756774, 4756616, 4747992, 4756619, 4756573, ... NaN
12 12 91285 12_31_42_3350_40 [31, 42, 3350, 40, 14, 3389, 3381, 48, 3727, 44] NaN
13 13 67976 13_387158_391193_392310_384981 [387158, 391193, 392310, 384981, 392368, 39232... NaN
14 14 60478 14_19_13_38_41 [19, 13, 38, 41, 10, 44, 48, 52, 3727, 4204] NaN
15 15 60942 15_240524_241155_240871_241228 [240524, 241155, 240871, 241228, 241243, 24117... NaN
16 16 87910 16_218382_10_14_28 [218382, 10, 14, 28, 27, 29, 31, 37, 43, 39] NaN
17 17 22748 17_849849_815686_839704_826626 [849849, 815686, 839704, 826626, 795510, 84975... NaN
18 18 58772 18_517017_498821_515931_476445 [517017, 498821, 515931, 476445, 516308, 51588... NaN
19 19 56241 19_220610_220836_39_13 [220610, 220836, 39, 13, 44, 48, 38, 14, 10, 51] NaN
20 20 110401 20_28305_16_10_12 [28305, 16, 10, 12, 27, 37, 29, 22, 41, 44] NaN
21 21 121051 21_15_17_12_27 [15, 17, 12, 27, 10, 18, 14, 37, 43, 42] NaN
22 22 53491 22_32963_27_37_3381 [32963, 27, 37, 3381, 38, 3532, 3574, 3404, 39... NaN
23 23 58245 23_30470_10_37_38 [30470, 10, 37, 38, 44, 29, 48, 3727, 4254, 4331] NaN
24 24 39084 24_444630_437352_436133_440889 [444630, 437352, 436133, 440889, 436174, 43615... NaN
25 25 13123 25_2883254_2842667_2866698_2866738 [2883254, 2842667, 2866698, 2866738, 2866690, ... NaN
26 26 60774 26_239877_240079_240123_240453 [239877, 240079, 240123, 240453, 240659, 24070... NaN
27 27 27280 27_1160426_1115483_1147040_1110211 [1160426, 1115483, 1147040, 1110211, 1110213, ... NaN
28 28 70162 28_368293_364473_365277_364744 [368293, 364473, 365277, 364744, 365791, 36457... NaN
29 29 78156 29_224484_222741_222735_224365 [224484, 222741, 222735, 224365, 222872, 22404... NaN
30 30 39348 30_29260_31_18_12 [29260, 31, 18, 12, 43, 10, 42, 44, 3389, 53] NaN
31 31 59289 31_33739_212113_212106_12 [33739, 212113, 212106, 12, 28, 43, 31, 14, 33... NaN
32 32 79654 32_27230_27_14_10 [27230, 27, 14, 10, 22, 29, 38, 37, 41, 3350] NaN
33 33 76397 33_222872_223204_14_10 [222872, 223204, 14, 10, 16, 22, 37, 47, 44, 29] NaN
34 34 70535 34_30267_17_10_43 [30267, 17, 10, 43, 27, 47, 31, 37, 3397, 29] NaN
35 35 40708 35_33454_13_16_43 [33454, 13, 16, 43, 29, 22, 41, 46, 52, 3574] NaN
36 36 33484 36_30470_28_16_13 [30470, 28, 16, 13, 3389, 44, 10, 48, 37, 4222] NaN
37 37 72767 37_222875_222950_223145_223221 [222875, 222950, 223145, 223221, 224365, 22330... NaN
38 38 36732 38_253412_251896_253272_253500 [253412, 251896, 253272, 253500, 253488, 25324... NaN
39 39 42591 39_250962_250920_248635_251452 [250962, 250920, 248635, 251452, 248993, 24901... NaN
40 40 17007 40_4714314_4699011_4714064_4714090 [4714314, 4699011, 4714064, 4714090, 4714056, ... NaN
41 41 68058 41_27321_27363_27393_27407 [27321, 27363, 27393, 27407, 14, 17, 18, 29, 2... NaN
42 42 94062 42_27321_27411_27552_27259 [27321, 27411, 27552, 27259, 16, 14, 31, 27, 2... NaN
43 43 79166 43_15_10_29_28 [15, 10, 29, 28, 3350, 37, 27, 3397, 53, 3381] NaN
44 44 79278 44_879543_839601_849938_850763 [879543, 839601, 849938, 850763, 849858, 87072... NaN
45 45 54102 45_224506_223904_224478_224046 [224506, 223904, 224478, 224046, 222820, 23759... NaN
46 46 65796 46_14_27_17_29 [14, 27, 17, 29, 37, 38, 48, 3555, 52, 3582] NaN
47 47 75014 47_38_52_29_3515 [38, 52, 29, 3515, 3695, 3636, 10, 3831, 3989,... NaN
48 48 50618 48_251880_249120_251382_251243 [251880, 249120, 251382, 251243, 250920, 25040... NaN
49 49 42643 49_239463_239393_238194_239129 [239463, 239393, 238194, 239129, 239184, 23920... NaN
Have you searched existing issues? 🔎
Describe the bug
Running partial_fit starts throwing an error after ~100 iterations.
Reproduction
I'm on an M3 MacBook Pro with Python 3.12.4, scikit-learn 1.5.1, bertopic 0.16.3, numpy 1.26.4, and scipy 1.14.0.
Here is a slightly modified version of the partial_fit example from the docs:
BERTopic Version
0.16.3