Open jim-gyas opened 3 weeks ago
@ta4tsering and @kaldan007, Transkribus has several data collections available. Based on my review, here are my findings:
The collections marked with a check (✓) are confirmed and will be taken for parsing. The collections marked with a circle (○) are under consideration due to doubts about the language suitability. The collections marked with a cross (✗) cannot be accessed, as they are returning a 404 error. Please see the attached screenshot for a visual reference of the collections.
@jim-gyas you can ignore all the uncertain collections except derge-kangyur.
`<?xml version="1.0" ?>`
{"id": "Correction-7_IMG_4305.jpg", "image": "https://s3.amazonaws.com/monlam.ai.ocr/line_segmentations/Images/Correction-7_IMG_4305.jpg", "spans": [{"id": "b6b09c07-5c1a-495f-86d0-7d2b7b8b7284", "height": 5, "width": 106, "center": [754.0, 952.5], "points": [[701, 950], [701, 955], [807, 955], [807, 950]]}, {"id": "d4b3469e-76ab-4d44-a4be-5b3bfa007b36", "height": -120, "width": 217, "center": [791.5, 1107.0], "points": [[683, 1167], [683, 1047], [900, 1047], [900, 1167]]}, {"id": "1f8387f9-f213-42c9-bae3-3c77cc318128", "height": -185, "width": 295, "center": [844.5, 1416.5], "points": [[697, 1509], [697, 1324], [992, 1324], [992, 1509]]}, {"id": "6a1fcfa2-6f9c-4ff7-b4d1-750c6094e421", "height": 9, "width": 97, "center": [1317.5, 3026.5], "points": [[1269, 3022], [1269, 3031], [1366, 3031], [1366, 3022]]}]}
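In the record above, some spans carry a negative `height` (for example `-120` and `-185`) because their `points` start at the bottom-left corner instead of the top-left. A minimal sketch of how such spans could be normalized before training, assuming the boxes are axis-aligned and that `height`, `width`, and `center` are meant to be derived from `points`; the function name `normalize_span` is hypothetical and not part of any existing pipeline:

```python
def normalize_span(span):
    """Recompute box geometry from the corner points (assumes axis-aligned boxes)."""
    xs = [p[0] for p in span["points"]]
    ys = [p[1] for p in span["points"]]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    return {
        "id": span["id"],
        "height": y_max - y_min,  # always non-negative
        "width": x_max - x_min,
        "center": [(x_min + x_max) / 2, (y_min + y_max) / 2],
        # Same corner order as the first span in the sample record:
        # top-left, bottom-left, bottom-right, top-right.
        "points": [[x_min, y_min], [x_min, y_max], [x_max, y_max], [x_max, y_min]],
    }


# Second span from the sample record, which has height -120.
sample = {
    "id": "d4b3469e-76ab-4d44-a4be-5b3bfa007b36",
    "height": -120,
    "width": 217,
    "center": [791.5, 1107.0],
    "points": [[683, 1167], [683, 1047], [900, 1047], [900, 1167]],
}
fixed = normalize_span(sample)
```

The center stays the same after normalization; only the sign of `height` and the corner order change.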
Currently working on the Transkribus data; will be done tomorrow.
Description: The training data we currently have for line segmentation is too limited, so we need to gather more line segmentation training data from our existing data.
Resources:
Related Card: Google Books data creation
Completion Criteria: All the training data for line segmentation in a single format. Possible formats:
XML format
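As a rough illustration of what a single XML format could look like, here is a sketch that turns one JSON record with `id` and `spans` fields into XML using Python's standard library. The element and attribute names (`Page`, `TextLine`, `Coords`, `imageFilename`, `points`) are assumptions chosen for illustration; this is not claimed to be schema-valid output for Transkribus or any other tool, and `record_to_xml` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def record_to_xml(record):
    """Serialize one line-segmentation record to an XML string (illustrative layout)."""
    page = ET.Element("Page", {"imageFilename": record["id"]})
    for span in record["spans"]:
        line = ET.SubElement(page, "TextLine", {"id": span["id"]})
        # Encode the polygon as space-separated "x,y" pairs.
        coords = " ".join(f"{x},{y}" for x, y in span["points"])
        ET.SubElement(line, "Coords", {"points": coords})
    return ET.tostring(page, encoding="unicode")


# Record trimmed to the first span of the sample shown in this thread.
record = {
    "id": "Correction-7_IMG_4305.jpg",
    "spans": [
        {
            "id": "b6b09c07-5c1a-495f-86d0-7d2b7b8b7284",
            "points": [[701, 950], [701, 955], [807, 955], [807, 950]],
        }
    ],
}
xml_text = record_to_xml(record)
```

Keeping the converter a pure function of one record makes it easy to apply line by line to a JSONL file, whatever final schema is agreed on.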
Implementation plan
Sub Task