Open jim-gyas opened 2 weeks ago
@devatwiai and @ta4tsering, could you please review my card? It involves extracting bounding box details for each line on the synthetic pages, saving this data, and using it to crop the lines from each synthetic page while generating the corresponding transcript for each line.
Hi @devatwiai, after evaluating various methods, I've found that morphological operations with contour detection is the most effective approach for extracting bounding boxes from clean, synthetic images. It is also fast.
Highlights:
1) Efficiency: This method is ideal for noise-free synthetic images, providing quick and accurate results.
2) Tight Bounding Boxes: Using approxPolyDP with an adjustable epsilon parameter allows for precise control over bounding box tightness.
3) Flexibility: It adapts well to different text sizes and configurations, ensuring robust detection.
However, I’ve also come across models like EAST (Efficient and Accurate Scene Text detector). While I’m not certain if EAST will provide tighter bounding boxes compared to the morphological operations method, it’s worth considering if the text detection conditions become more complex. Given that we are currently working with noise-free synthetic images, Morphological Operations seems to be the best fit. Nonetheless, exploring the EAST model might be beneficial if you anticipate dealing with more varied or challenging text detection scenarios.
{"id": "page_76_1123x265_count_1_font30_\u7389\u7fc5+zhuca+weizang.png", "image": "https://s3.amazonaws.com/monlam.ai.ocr/line_segmentations/Images/page_76_1123x265_count_1_font30_\u7389\u7fc5+zhuca+weizang.png", "height": 57, "width": 1022, "center": [549.0, 198.5], "points": [[38, 170], [38, 227], [1060, 227], [1060, 170]]}
{"id": "page_76_1123x265_count_1_font30_\u7389\u7fc5+zhuca+weizang.png", "image": "https://s3.amazonaws.com/monlam.ai.ocr/line_segmentations/Images/page_76_1123x265_count_1_font30_\u7389\u7fc5+zhuca+weizang.png", "height": 57, "width": 1052, "center": [556.0, 139.5], "points": [[30, 111], [30, 168], [1082, 168], [1082, 111]]}
{"id": "page_76_1123x265_count_1_font30_\u7389\u7fc5+zhuca+weizang.png", "image": "https://s3.amazonaws.com/monlam.ai.ocr/line_segmentations/Images/page_76_1123x265_count_1_font30_\u7389\u7fc5+zhuca+weizang.png", "height": 57, "width": 1039, "center": [558.5, 81.5], "points": [[39, 53], [39, 110], [1078, 110], [1078, 53]]}
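The fields in the records above can be derived from the corner points alone. The sketch below shows one way to build such a record. The helper name `line_record` is hypothetical, and the field meanings (width and height as the extent of the points, center as the midpoint of that extent) are inferred from the sample output rather than taken from the scripts.

```python
import json

BASE_URL = "https://s3.amazonaws.com/monlam.ai.ocr/line_segmentations/Images/"


def line_record(image_name, points, base_url=BASE_URL):
    """Build one JSONL record from a list of [x, y] points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)
    return {
        "id": image_name,
        "image": base_url + image_name,
        "height": bottom - top,
        "width": right - left,
        "center": [(left + right) / 2, (top + bottom) / 2],
        "points": [list(p) for p in points],
    }


if __name__ == "__main__":
    name = "page_76_1123x265_count_1_font30_\u7389\u7fc5+zhuca+weizang.png"
    corners = [[38, 170], [38, 227], [1060, 227], [1060, 170]]
    record = line_record(name, corners)
    # One record per line of the output file (JSONL).
    print(json.dumps(record))
    # height 57, width 1022, center [549.0, 198.5], as in the first record above
```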
@devatwiai, the tight bounding boxes are working fine on the noisy page image.
{"id": "page_25_1000x128_count_1_font30_\u7389\u7fc5+zhuca+weizang.png", "image": "https://s3.amazonaws.com/monlam.ai.ocr/line_segmentations/Images/page_25_1000x128_count_1_font30_\u7389\u7fc5+zhuca+weizang.png", "height": 45.0, "width": 125.0, "center": [451.5, 85.5], "points": [[394, 66], [396, 100], [389, 103], [395, 103], [396, 108], [414, 106], [426, 90], [420, 89], [422, 83], [431, 82], [441, 105], [449, 105], [446, 74], [454, 103], [462, 103], [462, 93], [475, 100], [478, 106], [486, 106], [504, 96], [505, 108], [513, 108], [514, 64], [450, 64], [445, 69], [440, 64], [412, 63], [408, 68]]}
{"id": "page_25_1000x128_count_1_font30_\u7389\u7fc5+zhuca+weizang.png", "image": "https://s3.amazonaws.com/monlam.ai.ocr/line_segmentations/Images/page_25_1000x128_count_1_font30_\u7389\u7fc5+zhuca+weizang.png", "height": 49.0, "width": 36.0, "center": [334.0, 83.5], "points": [[351, 64], [344, 64], [343, 59], [335, 59], [331, 63], [317, 64], [316, 108], [324, 108], [327, 84], [332, 92], [335, 108], [350, 108], [352, 96]]}
{"id": "page_25_1000x128_count_1_font30_\u7389\u7fc5+zhuca+weizang.png", "image": "https://s3.amazonaws.com/monlam.ai.ocr/line_segmentations/Images/page_25_1000x128_count_1_font30_\u7389\u7fc5+zhuca+weizang.png", "height": 26.455835342407227, "width": 49.47584915161133, "center": [64.3785400390625, 83.61317443847656], "points": [[60, 59], [56, 63], [51, 79], [57, 92], [60, 108], [76, 108], [78, 96], [77, 64], [69, 64]]}
{"id": "page_1_1123x265_count_1_font30_\u7389\u7fc5+zhuca+weizang.png_bbox_0", "image": "https://s3.amazonaws.com/monlam.ai.ocr/line_segmentations/Images/page_1_1123x265_count_1_font30_\u7389\u7fc5+zhuca+weizang.png_bbox_0", "height": 57, "width": 993, "center": [548.5, 198.5], "points": [[52, 170], [52, 227], [1045, 227], [1045, 170]], "page_number": 1}
{"id": "page_1_1123x265_count_1_font30_\u7389\u7fc5+zhuca+weizang.png_bbox_1", "image": "https://s3.amazonaws.com/monlam.ai.ocr/line_segmentations/Images/page_1_1123x265_count_1_font30_\u7389\u7fc5+zhuca+weizang.png_bbox_1", "height": 57, "width": 1023, "center": [555.5, 139.5], "points": [[44, 111], [44, 168], [1067, 168], [1067, 111]], "page_number": 1}
{"id": "page_1_1123x265_count_1_font30_\u7389\u7fc5+zhuca+weizang.png_bbox_2", "image": "https://s3.amazonaws.com/monlam.ai.ocr/line_segmentations/Images/page_1_1123x265_count_1_font30_\u7389\u7fc5+zhuca+weizang.png_bbox_2", "height": 51, "width": 23, "center": [64.5, 84.5], "points": [[53, 59], [53, 110], [76, 110], [76, 59]], "page_number": 1}
{"id": "page_1_1123x265_count_1_font30_\u7389\u7fc5+zhuca+weizang.png_bbox_3", "image": "https://s3.amazonaws.com/monlam.ai.ocr/line_segmentations/Images/page_1_1123x265_count_1_font30_\u7389\u7fc5+zhuca+weizang.png_bbox_3", "height": 53, "width": 238, "center": [231.0, 83.5], "points": [[112, 57], [112, 110], [350, 110], [350, 57]], "page_number": 1}
{"id": "page_1_1123x265_count_1_font30_\u7389\u7fc5+zhuca+weizang.png_bbox_4", "image": "https://s3.amazonaws.com/monlam.ai.ocr/line_segmentations/Images/page_1_1123x265_count_1_font30_\u7389\u7fc5+zhuca+weizang.png_bbox_4", "height": 56, "width": 669, "center": [728.5, 81.0], "points": [[394, 53], [394, 109], [1063, 109], [1063, 53]], "page_number": 1}
{"id": "with_bounding_box_aug_page_24_1123x265_count_9_font30_\u7389\u7fc5+zhuca+weizang_aug_Def.png_polygon_bbox_0", "image": "https://s3.amazonaws.com/monlam.ai.ocr/line_segmentations/Images/with_bounding_box_aug_page_24_1123x265_count_9_font30_\u7389\u7fc5+zhuca+weizang_aug_Def.png_polygon_bbox_0", "points": [[1035, 167], [1034, 168], [1032, 168], [1031, 169], [1030, 169], [1028, 171], [1021, 171], [1020, 172], [1007, 172], [1006, 173], [993, 173], [992, 174], [983, 174], [982, 175], [976, 175], [973, 178], [979, 178], [982, 181], [981, 182], [976, 182], [975, 183], [969, 183], [968, 184], [962, 184], [961, 185], [955, 185], [954, 186], [948, 186], [947, 187], [941, 187], [940, 188], [939, 187], [936, 187], [935, 186], [932, 186], [931, 185], [929, 185], [927, 183], [927, 182], [926, 181], [926, 179], [925, 178], [926, 179], [926, 181], [927, 182], [927, 183], [928, 184], [928, 185], [931, 185], [932, 186], [934, 186], [935, 187], [938, 187], [939, 188], [941, 188], [942, 187], [948, 187], [949, 186], [956, 186], [957, 185], [964, 185], [965, 184], [970, 184], [971, 183], [977, 183], [978, 182], [981, 182], [982, 181], [979, 178], [978, 178], [977, 177], [977, 176], [978, 175], [984, 175], [985, 174], [995, 174], [996, 173], [997, 174], [998, 173], [1009, 173], [1010, 172], [1023, 172], [1024, 171], [1028, 171], [1030, 169], [1031, 169], [1032, 168], [1034, 168], [1035, 167], [1036, 167], [1037, 168], [1040, 168], [1041, 169], [1044, 169], [1045, 170], [1048, 170], [1049, 171], [1052, 171], [1053, 172], [1056, 172], [1057, 173], [1060, 173], [1061, 174], [1064, 174], [1065, 175], [1067, 175], [1069, 177], [1069, 179], [1072, 182], [1072, 184], [1073, 185], [1073, 188], [1074, 189], [1076, 189], [1077, 190], [1080, 190], [1081, 191], [1081, 197], [1082, 198], [1082, 199], [1083, 200], [1083, 206], [1082, 207], [1078, 207], [1077, 206], [1076, 207], [1076, 216], [1075, 217], [1075, 222], [1074, 223], [1074, 226], [1073, 227], [1070, 227], [1069, 226], [1066, 226], 
[1065, 225], [1062, 225], [1061, 224], [1059, 224], [1058, 223], [1054, 223], [1053, 222], [1050, 222], [1049, 221], [1046, 221], [1045, 220], [1042, 220], [1041, 219], [1038, 219], [1037, 218], [1021, 218], [1020, 219], [1007, 219], [1006, 220], [993, 220], [992, 221], [983, 221], [982, 222], [976, 222], [975, 223], [969, 223], [968, 224], [967, 224], [965, 226], [962, 226], [961, 227], [955, 227], [954, 228], [948, 228], [947, 229], [937, 229], [935, 227], [932, 227], [931, 226], [929, 226], [928, 225], [925, 225], [924, 224], [922, 224], [921, 223], [918, 223], [917, 222], [915, 222], [914, 221], [911, 221], [910, 220], [908, 220], [907, 219], [904, 219], [903, 218], [901, 218], [900, 217], [897, 217], [896, 216], [889, 216], [888, 217], [885, 217], [885, 218], [884, 219], [882, 219], [881, 220], [878, 220], [877, 221], [874, 221], [873, 222], [871, 222], [870, 223], [867, 223], [866, 224], [863, 224], [862, 225], [860, 225], [859, 226], [855, 226], [853, 224], [852, 224], [851, 225], [849, 225], [848, 226], [840, 226], [839, 225], [832, 225], [831, 224], [824, 224], [823, 223], [815, 223], [814, 222], [807, 222], [806, 221], [799, 221], [798, 220], [790, 220], [789, 219], [778, 219], [777, 218], [768, 218], [767, 219], [766, 218], [765, 219], [756, 219], [755, 218], [750, 218], [749, 219], [746, 219], [745, 220], [743, 220], [742, 221], [739, 221], [738, 222], [736, 222], [735, 221], [734, 222], [733, 222], [732, 221], [731, 221], [730, 222], [729, 222], [727, 220], [727, 219], [728, 218], [729, 218], [734, 213], [735, 213], [735, 212], [739, 208], [739, 207], [740, 206], [740, 205], [741, 204], [741, 202], [742, 201], [742, 199], [741, 198], [739, 198], [738, 199], [735, 199], [734, 200], [731, 200], [730, 201], [727, 201], [726, 202], [724, 202], [723, 203], [720, 203], [719, 204], [716, 204], [715, 205], [712, 205], [711, 206], [708, 206], [707, 207], [702, 207], [701, 206], [699, 206], [698, 205], [695, 205], [694, 204], [692, 204], [691, 203], [688, 203], 
[687, 202], [684, 202], [684, 204], [685, 205], [685, 210], [686, 211], [686, 218], [687, 219], [687, 222], [687, 219], [686, 218], [686, 211], [685, 210], [685, 205], [684, 204], [684, 203], [685, 202], [687, 202], [688, 203], [690, 203], [691, 204], [693, 204], [694, 205], [697, 205], [698, 206], [700, 206], [701, 207], [708, 207], [709, 206], [712, 206], [713, 205], [716, 205], [717, 204], [720, 204], [721, 203], [723, 203], [724, 202], [727, 202], [728, 201], [731, 201], [732, 200], [735, 200], [736, 199], [739, 199], [740, 198], [741, 198], [742, 199], [742, 201], [741, 202], [741, 204], [740, 205], [740, 206], [739, 207], [739, 208], [732, 215], [731, 215], [731, 216], [729, 218], [728, 218], [726, 220], [725, 220], [723, 222], [722, 222], [721, 223], [720, 223], [719, 224], [720, 224], [721, 223], [723, 223], [724, 222], [727, 222], [728, 221], [729, 222], [731, 222], [732, 221], [733, 222], [739, 222], [740, 221], [743, 221], [744, 220], [746, 220], [747, 219], [750, 219], [751, 218], [753, 218], [754, 219], [764, 219], [765, 220], [766, 219], [767, 219], [768, 218], [774, 218], [775, 219], [785, 219], [786, 220], [795, 220], [796, 221], [804, 221], [805, 222], [812, 222], [813, 223], [821, 223], [822, 224], [829, 224], [830, 225], [837, 225], [838, 226], [848, 226], [849, 225], [852, 225], [853, 224], [855, 226], [855, 227], [856, 227], [857, 226], [859, 226], [860, 225], [863, 225], [864, 224], [867, 224], [868, 223], [870, 223], [871, 222], [874, 222], [875, 221], [878, 221], [879, 220], [881, 220], [882, 219], [884, 219], [886, 217], [889, 217], [890, 216], [895, 216], [896, 217], [899, 217], [900, 218], [902, 218], [903, 219], [906, 219], [907, 220], [909, 220], [910, 221], [913, 221], [914, 222], [916, 222], [917, 223], [920, 223], [921, 224], [924, 224], [925, 225], [927, 225], [928, 226], [931, 226], [932, 227], [934, 227], [935, 228], [936, 228], [937, 229], [938, 229], [939, 230], [941, 230], [942, 229], [948, 229], [949, 228], [955, 228], [956, 
227], [963, 227], [964, 226], [966, 226], [966, 225], [967, 224], [970, 224], [971, 223], [977, 223], [978, 222], [984, 222], [985, 221], [995, 221], [996, 220], [1009, 220], [1010, 219], [1023, 219], [1024, 218], [1036, 218], [1037, 219], [1040, 219], [1041, 220], [1044, 220], [1045, 221], [1048, 221], [1049, 222], [1052, 222], [1053, 223], [1056, 223], [1057, 224], [1060, 224], [1061, 225], [1064, 225], [1065, 226], [1068, 226], [1069, 227], [1072, 227], [1073, 228], [1074, 228], [1074, 223], [1075, 222], [1075, 217], [1076, 216], [1076, 208], [1077, 207], [1083, 207], [1084, 206], [1084, 205], [1083, 204], [1083, 199], [1081, 197], [1081, 191], [1082, 190], [1078, 190], [1077, 189], [1074, 189], [1073, 188], [1073, 185], [1072, 184], [1072, 182], [1069, 179], [1069, 177], [1067, 175], [1066, 175], [1065, 174], [1062, 174], [1061, 173], [1058, 173], [1057, 172], [1054, 172], [1053, 171], [1050, 171], [1049, 170], [1046, 170], [1045, 169], [1042, 169], [1041, 168], [1038, 168], [1037, 167]], "page_number": 1}
ཊཚེ།༡༦།།༠།༢༥།༥ལིཀ༠༠༥༧༠༥༠༠༦/༠༠།༧༠༥༠༡༩༥གི༡༧༠་ི་ཞ༠༠༠༡ ྣརིྲགྣགིྣཾ།
་༡ི༣།༩༡༠༠༠ཧི༢ཞི༣།།།༥༠།༥༠ཞ༥།༡༢༩།༥:༩།༥༩ཀ"ེེ༩༠༩༥༠༥།༩།།ག༠༥༦༧༠༥།༥ཀེ༠།༥༠༥།༩།༥།།ག༥༦ན༥།ཉེན༥༢༧༧༢༤ུ༠༥༥༥༧ེད།༨༠༥༡༣༠༥༩༥༥ཞ་༥༦ེ༢༠༠༧༥། ཤའིཡི9།༠༡༣༡༥༥
༄༅༅། །རྒྱ་གར་སྐད་དུ། བི་ན་ཡ་བསྟུ། བོད་སྐད་དུ། འདུལ་བ་གཞི། བམ་པོ་དང་པོ། དཀོན་མཆོག་གསུམ་ལ་ཕྱག་འཚལ་ལོ། །གང་གིས་འཆིང་རྣམས་ཡང་དག་རབ་བཅད་ཅིང་། །མུ་སྟེགས་ཚོགས་རྣམས་ཐམས་ཅད་རབ་བཅོམ་སྟེ། །སྡེ་དང་བཅས་པའི་བདུད་རྣམས་ངེས་བཅོམ་ནས། །བྱང་ཆུབ་འདི་བརྙེས་དེ་ལ་ཕྱག་འཚལ་ལོ། །ཁྱིམ་
དོན་ཆེ་ཆུང་སྤངས་ཏེ་དང་པོར་རབ་འབྱུང་དཀའ། །རབ་བྱུང་ཐོབ་ནས་ཡུལ་སྤྱད་དག་གིས་དགའ་ཐོབ་དཀའ། །མངོན་དགའ་ཇི་བཞིན་དོན་བསྐྱེད་ཡང་དག་བྱེད་པ་དཀའ། །ངུར་སྨྲིག་གོས་འཆང་མཁས་པ་ཚུལ་ལས་ཉམས་པ་དཀའ། །གཞི་རྣམས་ཀྱི་སྤྱི་སྡོམ་ལ། རབ་འབྱུང་གསོ་སྦྱོང་གཞི་དང་ནི། །དགག་དབྱེ་དབྱར་དང་ཀོ་ལྤགས་གཞི། །སྨན་དང་གོས་དང་སྲ་བརྐྱང་དང་
། །ཀཽ་ཤཱམ་བཱི་དང་ལས་ཀྱི་གཞི། །དམར་སེར་ཅན་དང་གང་ཟག་དང་། །སྤོ་དང་གསོ་སྦྱོང་བཞག་པ་དང་། །གནས་མལ་དང་ནི་རྩོད་པ་དང་། །དགེ་འདུན་དབྱེན་རྣམས་བསྡུས་པ་ཡིན། །རབ་ཏུ་འབྱུང་བའི་གཞིའི་སྤྱི་སྡོམ་ལ། ཤཱ་རིའི་བུ་དང་མུ་སྟེགས་ཅན། །དགེ་ཚུལ་གཉིས་དང་བྱ་རོག་སྐྲོད། །དགྲ་བཅོམ་བསད་དང་ལག་རྡུམ་གྱི། །སྡེ་ཚན་ཡང་དག་བསྡུས་པ་ཡིན། །
སྡོམ་ལ། ཤཱ་རིའི་བུ་དང་རབ་འབྱུང་དང་། །བསྙེན་པར་རྫོགས་པར་གནང་བ་དང་། །ཉེ་སྡེས་ཚོགས་ནི་བསྡུས་པ་དང་། །ལྔ་པའི་སྡེ་ཚན་བསྡུས་པ་ཡིན། །བྱང་ཆུབ་སེམས་དཔའ་དགའ་ལྡན་གྱི་གནས་ན་བཞུགས་པ་ན།ཡུལ་ཨང་ག་དག་ན་ཨང་གའི་རྒྱལ་པོ་ཞེས་བྱ་བས་རྒྱལ་སྲིད་འབྱོར་པ། རྒྱས་པ་བདེ་བ་ལོ་ལེགས་པ་སྐྱེ་བོ་དང་མི་མང་པོས་གང་བ་
བྱེད་དུ་བཅུག་གོ། །ཡུལ་མ་ག་དྷཱ་དག་ན་ཡང་རྒྱལ་པོ་པད་མ་ཆེན་པོ་ཞེས་བྱ་བས། རྒྱལ་སྲིད་འབྱོར་པ་རྒྱས་པ་བདེ་བ་ལོ་ལེགས་པ་སྐྱེ་བོ་དང་མི་མང་པོས་གང་བ་བྱེད་དུ་བཅུག་གོ། །རེས་འགའ་ནི་ཨང་གའི་རྒྱལ་པོ་དཔུང་དང་མཐུ་ཆེ་བ་ཡིན་ལ། རེས་འགའ་ནི་རྒྱལ་པོ་པད་མ་ཆེན་པོ་དཔུང་དང་མཐུ་ཆེ་བ་ཡིན་ནོ། །གང་གི་ཚེ་ཨང་གའི་
Description:
This project focuses on enhancing the existing package that generates synthetic page images for the Tibetan Pecha and Modern Tibetan Book formats. The main tasks involve modifying the scripts (`pecha_format_page.py` and `modern_book_format_page.py`) to extract polygonal bounding box information tightly around each line of text on the synthetic pages. These tight bounding boxes are crucial for creating training data for line segmentation models. Additionally, the project will apply an offset to these bounding boxes so that the extracted individual lines look more realistic for OCR model training. The extracted lines, along with their transcripts, will be organized and saved in a structured format to develop and improve OCR models effectively.
Completion Criteria:
The scripts (`pecha_format_page.py` and `modern_book_format_page.py`) accurately extract polygonal bounding box information for each line on synthetic page images.
Augmentations the method is applicable to:
1) BadPhotoCopy 2) Brightness 3) Contrast 4) Blur 5) Background 6) Grid Distort 7) InkBleed 8) RandomShadow
Augmentations the current method is not applicable to:
1) Rotation 2) Deform 3) Scribble 4) SunFlare 5) DirtySpot 6) Low Ink
Implementation:
Subtasks:
- [x] Modify Synthetic Page Generation Scripts
- [ ] Modify Line Extraction Script (`extraction_line.py`)
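As a rough illustration of the line extraction subtask, the sketch below pads a polygon's bounding box by an offset, clamps it to the page, and pairs the resulting crop boxes with transcript lines. The function names, the offset value, and the assumption of exactly one box per transcript line are all hypothetical. The real `extraction_line.py` may work differently, and pages where a row is split into several boxes would need extra handling.

```python
def padded_crop_box(points, offset, page_width, page_height):
    """Axis-aligned box around a polygon, padded by `offset` pixels.

    The box is clamped so that it never extends past the page edges.
    Returns (left, top, right, bottom).
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    left = max(min(xs) - offset, 0)
    top = max(min(ys) - offset, 0)
    right = min(max(xs) + offset, page_width)
    bottom = min(max(ys) + offset, page_height)
    return left, top, right, bottom


def pair_boxes_with_transcripts(boxes, transcript_lines):
    """Pair crop boxes with transcript lines in top-to-bottom order.

    Assumes one box per transcript line (an assumption of this sketch).
    """
    if len(boxes) != len(transcript_lines):
        raise ValueError("number of boxes and transcript lines differ")
    ordered = sorted(boxes, key=lambda box: box[1])  # sort by top edge
    return list(zip(ordered, transcript_lines))


if __name__ == "__main__":
    # Points from a 1123x265 sample page; the 5 px offset is arbitrary.
    polygon = [[38, 170], [38, 227], [1060, 227], [1060, 170]]
    print(padded_crop_box(polygon, 5, 1123, 265))  # (33, 165, 1065, 232)
```

The returned tuple is in the (left, top, right, bottom) order that `PIL.Image.crop` accepts, so it could be passed directly to that call if Pillow is used for cropping.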
Initial estimation: start date 29/08/24, end date 3/09/24
Updated estimation: start date 4/09/24, end date 5/09/24