LUH-DBS / Matelda

Apache License 2.0

Log shows errors during get_train_test_sets #17

Closed: MarcSpeckmann closed this issue 8 months ago

MarcSpeckmann commented 2 years ago

The log file shows errors in the get_train_test_sets method that do not cause the program to terminate. The log file is attached below.

2022-09-07 15:14:25,176 INFO end-to-end-eds - run_experiments: Labels loaded.
2022-09-07 15:14:25,443 INFO end-to-end-eds - run_experiments: Table grouping output loaded.
2022-09-07 15:14:25,772 INFO end-to-end-eds - run_experiments: number of column clusters: 16
2022-09-07 15:14:36,624 INFO end-to-end-eds - run_experiments: Cell features loaded.
2022-09-07 15:14:36,961 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 0, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:14:48,718 ERROR ed_twolevel_rahas_features - get_train_test_sets: (0, 11, 1223, 'null', 'og')
2022-09-07 15:14:48,718 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 1, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:16:14,711 INFO end-to-end-eds - run_experiments: Labels loaded.
2022-09-07 15:16:15,009 INFO end-to-end-eds - run_experiments: Table grouping output loaded.
2022-09-07 15:16:15,333 INFO end-to-end-eds - run_experiments: number of column clusters: 16
2022-09-07 15:16:26,195 INFO end-to-end-eds - run_experiments: Cell features loaded.
2022-09-07 15:16:26,524 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 0, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:16:38,512 ERROR ed_twolevel_rahas_features - get_train_test_sets: (0, 11, 1223, 'null', 'og')
2022-09-07 15:16:38,512 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 1, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:18:07,875 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_test: 1601223
2022-09-07 15:18:07,876 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_tmp: 1400000
2022-09-07 15:18:07,876 INFO ed_twolevel_rahas_features - sampling_labeling: sampling_labeling
2022-09-07 15:18:19,202 ERROR ed_twolevel_rahas_features - get_train_test_sets: Unable to allocate 7.13 TiB for an array with shape (979999300000,) and data type float64
2022-09-07 15:18:19,205 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 2, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:18:31,397 ERROR ed_twolevel_rahas_features - get_train_test_sets: (0, 7, 0, '1907', 'og')
2022-09-07 15:18:31,397 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 3, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:18:32,042 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_test: 1805975
2022-09-07 15:18:32,042 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_tmp: 4752
2022-09-07 15:18:32,043 INFO ed_twolevel_rahas_features - sampling_labeling: sampling_labeling
2022-09-07 15:18:32,814 INFO ed_twolevel_rahas_features - sampling_labeling: labeling
2022-09-07 15:18:32,814 INFO ed_twolevel_rahas_features - label_propagation: Label propagation
2022-09-07 15:18:32,815 INFO ed_twolevel_rahas_features - label_propagation: Length of X_train: 4752
2022-09-07 15:18:32,815 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 4, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:18:32,907 ERROR ed_twolevel_rahas_features - get_train_test_sets: (1, 3, 808, 'Thu  20:00', 'og')
2022-09-07 15:18:32,907 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 5, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:18:34,114 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_test: 1828953
2022-09-07 15:18:34,115 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_tmp: 22170
2022-09-07 15:18:34,115 INFO ed_twolevel_rahas_features - sampling_labeling: sampling_labeling
2022-09-07 15:19:01,780 INFO ed_twolevel_rahas_features - sampling_labeling: labeling
2022-09-07 15:19:01,780 INFO ed_twolevel_rahas_features - label_propagation: Label propagation
2022-09-07 15:19:01,784 INFO ed_twolevel_rahas_features - label_propagation: Length of X_train: 26922
2022-09-07 15:19:01,784 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 6, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:19:09,756 ERROR ed_twolevel_rahas_features - get_train_test_sets: (2, 16, 3319, "In modern times, the true meaning of Christmas is getting lost. Children don't write letters anymore...                See full synopsis\xa0»", 'og')
2022-09-07 15:19:09,756 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 7, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:19:10,294 ERROR ed_twolevel_rahas_features - get_train_test_sets: (5, 5, 101, '1612-4782     1612-4790', 'og')
2022-09-07 15:19:10,294 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 8, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:19:11,198 ERROR ed_twolevel_rahas_features - get_train_test_sets: (3, 16, 2, 'surgery patients who were given the  right kind  of antibiotic to help prevent infection', 'og')
2022-09-07 15:19:11,199 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 9, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:19:11,672 ERROR ed_twolevel_rahas_features - get_train_test_sets: (4, 2, 954, 'Yeti  Imperial Stout', 'og')
2022-09-07 15:19:11,672 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 10, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:19:12,421 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_test: 1948489
2022-09-07 15:19:12,421 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_tmp: 12050
2022-09-07 15:19:12,421 INFO ed_twolevel_rahas_features - sampling_labeling: sampling_labeling
2022-09-07 15:19:17,853 INFO ed_twolevel_rahas_features - sampling_labeling: labeling
2022-09-07 15:19:17,853 INFO ed_twolevel_rahas_features - label_propagation: Label propagation
2022-09-07 15:19:17,856 INFO ed_twolevel_rahas_features - label_propagation: Length of X_train: 38972
2022-09-07 15:19:17,856 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 11, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:19:18,029 ERROR ed_twolevel_rahas_features - get_train_test_sets: (4, 6, 0, 'null', 'og')
2022-09-07 15:19:18,029 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 12, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:19:18,070 ERROR ed_twolevel_rahas_features - get_train_test_sets: (5, 6, 0, '64.0', 'og')
2022-09-07 15:19:18,070 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 13, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:19:18,153 ERROR ed_twolevel_rahas_features - get_train_test_sets: (5, 9, 3, '714-9 ST  - [Noninvasive prenatal diagnosis of trisomy 21, 18 and 13 using cell-free fetal DNA]-', 'og')
2022-09-07 15:19:18,153 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 14, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:19:18,198 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_test: 1951914
2022-09-07 15:19:18,198 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_tmp: 12
2022-09-07 15:19:18,198 INFO ed_twolevel_rahas_features - sampling_labeling: sampling_labeling
2022-09-07 15:19:18,198 INFO ed_twolevel_rahas_features - sampling_labeling: labeling
2022-09-07 15:19:18,199 INFO ed_twolevel_rahas_features - label_propagation: Label propagation
2022-09-07 15:19:18,199 INFO ed_twolevel_rahas_features - label_propagation: Length of X_train: 38984
2022-09-07 15:19:18,199 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster -1, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:19:19,152 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_test: 1969076
2022-09-07 15:19:19,152 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_tmp: 17162
2022-09-07 15:19:19,152 INFO ed_twolevel_rahas_features - sampling_labeling: sampling_labeling
2022-09-07 15:19:32,751 INFO ed_twolevel_rahas_features - sampling_labeling: labeling
2022-09-07 15:19:32,752 INFO ed_twolevel_rahas_features - label_propagation: Label propagation
2022-09-07 15:19:32,755 INFO ed_twolevel_rahas_features - label_propagation: Length of X_train: 56146
2022-09-07 15:19:33,776 INFO ed_twolevel_rahas_features - get_train_test_sets: Number of Labeled Cells: 13
2022-09-07 15:19:33,821 INFO ed_twolevel_rahas_features - classify: Classification
2022-09-07 15:21:13,979 INFO end-to-end-eds - run_experiments: Labels loaded.
2022-09-07 15:21:14,285 INFO end-to-end-eds - run_experiments: Table grouping output loaded.
2022-09-07 15:21:14,610 INFO end-to-end-eds - run_experiments: number of column clusters: 16
2022-09-07 15:21:25,607 INFO end-to-end-eds - run_experiments: Cell features loaded.
2022-09-07 15:21:25,938 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 0, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:21:37,597 ERROR ed_twolevel_rahas_features - get_train_test_sets: (0, 11, 1223, 'null', 'og')
2022-09-07 15:21:37,597 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 1, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:23:04,861 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_test: 1601223
2022-09-07 15:23:04,862 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_tmp: 1400000
2022-09-07 15:23:04,862 INFO ed_twolevel_rahas_features - sampling_labeling: sampling_labeling
2022-09-07 15:23:16,217 ERROR ed_twolevel_rahas_features - get_train_test_sets: Unable to allocate 7.13 TiB for an array with shape (979999300000,) and data type float64
2022-09-07 15:23:16,220 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 2, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:23:28,100 ERROR ed_twolevel_rahas_features - get_train_test_sets: (0, 7, 0, '1907', 'og')
2022-09-07 15:23:28,101 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 3, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:23:28,745 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_test: 1805975
2022-09-07 15:23:28,745 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_tmp: 4752
2022-09-07 15:23:28,746 INFO ed_twolevel_rahas_features - sampling_labeling: sampling_labeling
2022-09-07 15:23:29,519 INFO ed_twolevel_rahas_features - sampling_labeling: labeling
2022-09-07 15:23:29,519 INFO ed_twolevel_rahas_features - label_propagation: Label propagation
2022-09-07 15:23:29,520 INFO ed_twolevel_rahas_features - label_propagation: Length of X_train: 4752
2022-09-07 15:23:29,520 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 4, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:23:29,612 ERROR ed_twolevel_rahas_features - get_train_test_sets: (1, 3, 808, 'Thu  20:00', 'og')
2022-09-07 15:23:29,613 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 5, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:23:30,793 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_test: 1828953
2022-09-07 15:23:30,793 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_tmp: 22170
2022-09-07 15:23:30,793 INFO ed_twolevel_rahas_features - sampling_labeling: sampling_labeling
2022-09-07 15:23:58,291 INFO ed_twolevel_rahas_features - sampling_labeling: labeling
2022-09-07 15:23:58,291 INFO ed_twolevel_rahas_features - label_propagation: Label propagation
2022-09-07 15:23:58,295 INFO ed_twolevel_rahas_features - label_propagation: Length of X_train: 26922
2022-09-07 15:23:58,296 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 6, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:24:06,178 ERROR ed_twolevel_rahas_features - get_train_test_sets: (2, 16, 3319, "In modern times, the true meaning of Christmas is getting lost. Children don't write letters anymore...                See full synopsis\xa0»", 'og')
2022-09-07 15:24:06,178 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 7, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:24:06,713 ERROR ed_twolevel_rahas_features - get_train_test_sets: (5, 5, 101, '1612-4782     1612-4790', 'og')
2022-09-07 15:24:06,713 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 8, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:24:07,604 ERROR ed_twolevel_rahas_features - get_train_test_sets: (3, 16, 2, 'surgery patients who were given the  right kind  of antibiotic to help prevent infection', 'og')
2022-09-07 15:24:07,604 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 9, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:24:08,069 ERROR ed_twolevel_rahas_features - get_train_test_sets: (4, 2, 954, 'Yeti  Imperial Stout', 'og')
2022-09-07 15:24:08,069 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 10, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:24:08,802 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_test: 1948489
2022-09-07 15:24:08,802 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_tmp: 12050
2022-09-07 15:24:08,802 INFO ed_twolevel_rahas_features - sampling_labeling: sampling_labeling
2022-09-07 15:24:14,081 INFO ed_twolevel_rahas_features - sampling_labeling: labeling
2022-09-07 15:24:14,081 INFO ed_twolevel_rahas_features - label_propagation: Label propagation
2022-09-07 15:24:14,084 INFO ed_twolevel_rahas_features - label_propagation: Length of X_train: 38972
2022-09-07 15:24:14,084 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 11, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:24:14,256 ERROR ed_twolevel_rahas_features - get_train_test_sets: (4, 6, 0, 'null', 'og')
2022-09-07 15:24:14,256 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 12, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:24:14,299 ERROR ed_twolevel_rahas_features - get_train_test_sets: (5, 6, 0, '64.0', 'og')
2022-09-07 15:24:14,299 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 13, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:24:14,382 ERROR ed_twolevel_rahas_features - get_train_test_sets: (5, 9, 3, '714-9 ST  - [Noninvasive prenatal diagnosis of trisomy 21, 18 and 13 using cell-free fetal DNA]-', 'og')
2022-09-07 15:24:14,382 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster 14, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:24:14,427 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_test: 1951914
2022-09-07 15:24:14,427 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_tmp: 12
2022-09-07 15:24:14,427 INFO ed_twolevel_rahas_features - sampling_labeling: sampling_labeling
2022-09-07 15:24:14,428 INFO ed_twolevel_rahas_features - sampling_labeling: labeling
2022-09-07 15:24:14,428 INFO ed_twolevel_rahas_features - label_propagation: Label propagation
2022-09-07 15:24:14,428 INFO ed_twolevel_rahas_features - label_propagation: Length of X_train: 38984
2022-09-07 15:24:14,428 INFO ed_twolevel_rahas_features - get_train_test_sets: Processing cluster -1, from <_io.BufferedReader name='outputs/raha-datasets/Issue-Test/column_groups/col_df_labels_cluster_-1.pickle'>
2022-09-07 15:24:15,360 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_test: 1969076
2022-09-07 15:24:15,360 INFO ed_twolevel_rahas_features - get_train_test_sets: Length of X_tmp: 17162
2022-09-07 15:24:15,360 INFO ed_twolevel_rahas_features - sampling_labeling: sampling_labeling
2022-09-07 15:24:28,854 INFO ed_twolevel_rahas_features - sampling_labeling: labeling
2022-09-07 15:24:28,855 INFO ed_twolevel_rahas_features - label_propagation: Label propagation
2022-09-07 15:24:28,858 INFO ed_twolevel_rahas_features - label_propagation: Length of X_train: 56146
2022-09-07 15:24:29,891 INFO ed_twolevel_rahas_features - get_train_test_sets: Number of Labeled Cells: 13
2022-09-07 15:24:29,936 INFO ed_twolevel_rahas_features - classify: Classification
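Most ERROR lines above carry nothing but a tuple, e.g. (0, 11, 1223, 'null', 'og'). That looks like the str() of a KeyError raised for a missing tuple key and logged with logging.error(e), which drops the exception type and the traceback. This is an assumption, since the logging code of get_train_test_sets is not shown in this issue. A minimal sketch (the cell_features dict and its key layout are hypothetical) contrasting that pattern with logger.exception(...):

```python
import io
import logging

# Hypothetical lookup table keyed by 5-tuples; the key layout is only
# inferred from the tuples that appear in the log above.
cell_features = {(0, 11, 1222, "foo", "og"): [0.1, 0.2]}


def build_logger(stream):
    """Logger that writes 'LEVEL funcName: message' lines to the given stream."""
    logger = logging.getLogger("sketch")
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(funcName)s: %(message)s"))
    logger.handlers = [handler]
    logger.propagate = False
    return logger


def lookup_bare(logger, key):
    try:
        return cell_features[key]
    except KeyError as e:
        # Logs only str(e), which for a KeyError is the repr of the missing key.
        logger.error(e)
        return None


def lookup_with_traceback(logger, key):
    try:
        return cell_features[key]
    except KeyError:
        # Logs the message plus the exception type and full traceback.
        logger.exception("Missing cell features for key %r", key)
        return None


if __name__ == "__main__":
    buf = io.StringIO()
    log = build_logger(buf)
    lookup_bare(log, (0, 11, 1223, "null", "og"))
    lookup_with_traceback(log, (0, 11, 1223, "null", "og"))
    print(buf.getvalue())
```

With logger.exception, each of these non-fatal errors would at least record which exception was raised and on which line, which makes them much easier to trace back.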
MarcSpeckmann commented 2 years ago

I did another run with the current version, and the log now shows only the following error. The other errors no longer occur.

2022-09-12 15:05:38,650 ERROR ed_twolevel_rahas_features - get_train_test_sets: Unable to allocate 7.13 TiB for an array with shape (979999300000,) and data type float64
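The remaining error lines up with cluster 1, whose X_tmp has 1,400,000 rows: 1,400,000 × 1,399,999 / 2 = 979,999,300,000, exactly the array length in the message, and at 8 bytes per float64 that is about 7.13 TiB. That is the length of a condensed pairwise-distance vector such as the one scipy.spatial.distance.pdist returns; whether sampling_labeling actually computes all pairwise distances this way is an assumption, since its source is not shown here. A minimal sketch of the arithmetic, plus a hypothetical helper that finds how many rows fit a given memory budget:

```python
import math

FLOAT64_BYTES = 8  # size of one float64 element
TIB = 2 ** 40      # NumPy reports allocation sizes in binary units (KiB, MiB, GiB, TiB)


def condensed_pdist_length(n: int) -> int:
    """Number of unordered pairs among n rows (length of a condensed distance vector)."""
    return n * (n - 1) // 2


def condensed_pdist_bytes(n: int) -> int:
    """Memory needed to store that vector as float64."""
    return condensed_pdist_length(n) * FLOAT64_BYTES


def max_rows_for_budget(max_bytes: int) -> int:
    """Hypothetical helper: largest n whose condensed float64 distance vector fits in max_bytes."""
    max_pairs = max_bytes // FLOAT64_BYTES
    # Largest n with n * (n - 1) / 2 <= max_pairs, i.e. n <= (1 + sqrt(1 + 8 * max_pairs)) / 2.
    return (1 + math.isqrt(1 + 8 * max_pairs)) // 2


if __name__ == "__main__":
    n = 1_400_000  # "Length of X_tmp" logged for cluster 1
    print(condensed_pdist_length(n))                  # 979999300000
    print(round(condensed_pdist_bytes(n) / TIB, 2))   # 7.13
    print(max_rows_for_budget(16 * 2 ** 30))          # 65536 rows fit in 16 GiB
```

Because the memory grows quadratically with the number of rows, only a few tens of thousands of rows fit in typical RAM, so any all-pairs step over cluster 1 would need X_tmp to be subsampled first (or a method that never materializes all pairs).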