Open 35C4n0r opened 4 months ago
cc: @GautamR-Samagra
hi @GautamR-Samagra I'd like to work on this problem. Please assign me this.
Greetings of the day, Samagra-Development/ai-tools. I have started working on the NER improvement issue. I have already written code to detect phone numbers, emails, times, rates and units, and to resolve dates given as "next monday, agle somvar". If possible, could I be assigned to this issue and given access to the crop, seeds and pests datasets so I can proceed further?
On Fri, 15 Mar 2024, 09:00 Gautam, @.***> wrote:
Assigned #295 https://github.com/Samagra-Development/ai-tools/issues/295 to @basedsaksham https://github.com/basedsaksham.
You are probably referring to the wrong ticket here.
I got this result after extracting rows and columns using DETR. Next I will work on recognizing the text with OCR using pytesseract. Kindly let me know if this example output is satisfactory.
Hey @35C4n0r, can you please explain which pytesseract settings and configs can be used to achieve the best output?
Description
We have observed that the current implementation using Table Transformer is not achieving satisfactory performance in accurately detecting rows and columns within tables, particularly in the context of parsing Hindi tables from PDFs. To address this, we propose a new approach that integrates Detection Transformer (DETR) models with Pytesseract for improved detection of text objects within tables.
The objective is to develop a method where DETR models are used in conjunction with Pytesseract's OCR capabilities to enhance the accuracy of text detection and bounding box identification within table cells. This approach aims to provide a more robust solution for parsing tables by leveraging the strengths of both DETR models for object detection and Pytesseract for optical character recognition.
Proposed Workflow
Input
The input to the system will be PDFs or images containing tables, alongside the specification of the DETR model to be used and the language setting for Pytesseract.
DETR Model Processing
Use the specified DETR model to detect text objects within the tables. DETR models, known for their efficiency in object detection tasks, will help identify text blocks or cells within the complex structure of tables.
Pytesseract OCR
Apply Pytesseract with the specified language setting to the detected text objects to recognize the text within each cell.
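A minimal sketch of the detect-then-OCR flow described above, not a definitive implementation. The names `read_cells`, `tesseract_ocr` and `CellText` are hypothetical, and the detector is passed in as a plain callable that returns pixel boxes, so any DETR wrapper can be plugged in. The `hin` language code and the `--psm 6` page segmentation mode are assumptions to tune, and the image is assumed to be a `PIL.Image`.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class CellText:
    box: Box
    text: str


def tesseract_ocr(crop, lang: str) -> str:
    """OCR one cropped region with Pytesseract (imported lazily)."""
    import pytesseract  # needs the Tesseract binary and the language data installed

    # Assumption: --psm 6 (a single uniform block of text) suits a table cell.
    return pytesseract.image_to_string(crop, lang=lang, config="--psm 6").strip()


def read_cells(
    image,
    detect: Callable[[object], List[Box]],
    lang: str = "hin",
    ocr: Callable[[object, str], str] = tesseract_ocr,
) -> List[CellText]:
    """Run the detector, then OCR every detected box in detection order."""
    results = []
    for box in detect(image):
        crop = image.crop(box)  # PIL.Image.crop takes (left, upper, right, lower)
        results.append(CellText(box=box, text=ocr(crop, lang)))
    return results
```

Keeping the detector and the OCR step as separate callables lets either one be swapped or stubbed out, which makes it easier to compare DETR checkpoints or Pytesseract configs on the same pages.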
Output Mapping
The output will be a structured mapping of each word detected to its corresponding location within the table (e.g., row1/column1/cell1/table1). This includes combining words that belong to the same cell or object for a comprehensive representation of the table's content.
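One possible way to build this mapping is sketched below, assuming the rows and columns have already been detected and reduced to vertical and horizontal pixel spans, and that each word comes with its bounding box. The function names and the `table1/row1/column1` key format are illustrative assumptions, not a fixed schema. Each word is assigned to the row and column that contain its box center, and words sharing a cell are joined left to right.

```python
from collections import defaultdict


def center(box):
    """Return the (x, y) center of a (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return ((left + right) / 2, (top + bottom) / 2)


def index_of(spans, value):
    """Return the 1-based index of the (start, end) span containing value, or None."""
    for i, (start, end) in enumerate(spans, start=1):
        if start <= value < end:
            return i
    return None


def map_words_to_cells(words, row_spans, col_spans, table_id=1):
    """Group (text, box) words by the table cell that contains their center."""
    cells = defaultdict(list)
    for text, box in words:
        cx, cy = center(box)
        row = index_of(row_spans, cy)  # rows are vertical (top, bottom) spans
        col = index_of(col_spans, cx)  # columns are horizontal (left, right) spans
        if row is None or col is None:
            continue  # the word lies outside every detected row or column
        cells[f"table{table_id}/row{row}/column{col}"].append((cx, text))
    # Join the words of each cell left to right (single-line cells assumed).
    return {
        key: " ".join(text for _, text in sorted(items, key=lambda item: item[0]))
        for key, items in cells.items()
    }
```

Sorting by horizontal center alone only orders single-line cells correctly; multi-line cells would need to be sorted by line first.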
Expected Outcome: