Closed: chenmo7760 closed this issue 2 weeks ago

Hello, and thank you for your code. However, I'm not sure whether this is the complete code. Where should I start? How should I label the dataset? Several scripts ran, but there was no output. Thank you very much.
Hello, thank you for the question. Yes, the algorithm is complete. You should start from here: https://github.com/SFStefenon/Digital_ED/tree/main/YOLO. Take a look at the topic Create a Custom Dataset, where you can find an explanation of how the dataset is labeled. You are probably not getting an output because of how you are loading the dataset. Regards.
PS: If you want to give object detection a try, I recommend using Eyad Elyan's dataset: https://github.com/heyad/Eng_Diagrams
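For reference, YOLO-format annotations store one object per line in a .txt file next to each image: a class id followed by the normalized box center and size (all values in [0, 1]). The class ids below are purely hypothetical:

```
3 0.512 0.304 0.046 0.058
3 0.518 0.471 0.046 0.058
17 0.250 0.800 0.120 0.040
```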
Thank you for the reply, Dr. I previously used YOLOv8 to recognize meter readings. Here, following your README, I first ran Sliding Window Compute.py, and then, when I executed yolov8-optuna-sd2.py, I noticed that there is no my_rfi.yaml file. Could this be provided? Can we see the annotated classes of the dataset in it? Additionally, in this project, lines do not need to be annotated for YOLOv8 to recognize them, right?
Yes, an example of the my_rfi.yaml file is here: https://github.com/SFStefenon/Digital_ED/blob/main/YOLO/my_rfi.yaml
This file depends on your annotation classes. You can find more information about that (in the section Organize Your Dataset) here: https://github.com/SFStefenon/Digital_ED/tree/main/YOLO
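For illustration, a minimal sketch of such a file in the standard Ultralytics dataset-config layout; the paths and class names here are placeholders, not the project's actual classes:

```yaml
# Hypothetical my_rfi.yaml sketch (standard Ultralytics dataset-config layout).
# Paths and class names are placeholders, not the project's actual classes.
path: ../datasets/rfi   # dataset root
train: images/train     # training images, relative to path
val: images/val         # validation images, relative to path
names:
  0: resistor
  1: capacitor
  2: transformer
```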
An example of the annotated classes was attached as a screenshot (my dataset is not publicly available).
Lines don't need to be annotated; they are detected by the PHT, as explained here: https://github.com/SFStefenon/Digital_ED/tree/main/PHT-DBSCAN
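As a rough illustration of the underlying technique (not the repo's PHT-DBSCAN code), OpenCV's probabilistic Hough transform detects line segments like this; the thresholds and lengths are illustrative:

```python
# Rough sketch of line detection with the Probabilistic Hough Transform (OpenCV).
# Not the repo's PHT-DBSCAN code; thresholds and lengths are illustrative.
import math
import cv2

img = cv2.imread("diagram.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 50, 150)  # edge map fed to the Hough step
lines = cv2.HoughLinesP(edges, rho=1, theta=math.pi / 180,
                        threshold=80, minLineLength=30, maxLineGap=5)
if lines is not None:
    for x1, y1, x2, y2 in lines.reshape(-1, 4):
        print(f"segment ({x1},{y1}) -> ({x2},{y2})")
```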
Regards.
In the YOLO folder, there are only Change_All_the_Classes.py, Change_the_Classes_to_0.py, and Check Class.py. Is the inference script for the trained model located here?
These files are used only if you want to change the classes; if not, they can be disregarded.
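If you just want to run a trained model, a minimal sketch using the Ultralytics YOLOv8 API could look like this (not a script from this repo; the weight and image paths are placeholders):

```python
# Minimal inference sketch using the Ultralytics YOLOv8 API.
# Not a script from this repo; the weight and image paths are placeholders.
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # weights from your training run
results = model.predict("diagram_tile.png", imgsz=640, conf=0.25)
for r in results:
    for box in r.boxes:
        cls_id = int(box.cls)  # predicted class index
        print(r.names[cls_id], box.xyxy[0].tolist(), float(box.conf))
```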
Hello Dr. SFStefenon, I have carefully read the README file, but I could not find information on the process for the subsequent recognition of annotations on the diagrams, such as a transformer named "XX-1". If such information is missing from the generated diagrams, it would pose a certain difficulty for users. If this has been considered in the project, could you please let me know? Additionally, may I ask whether there are any statistics or estimates regarding the final recognition accuracy of your entire solution? Thank you.
Hello, the recognition of the annotations is performed by YOLO. We can't provide our dataset for your analysis, but we encourage you to try with your own data, following the steps that we presented. The explanation of how to compute the YOLO is here: https://github.com/SFStefenon/Digital_ED/tree/main/YOLO
Regarding the statistics, in that explanation you will find:

```python
for runs in range(0, 10):
```

This is how you can initialize the model with different seeds and then perform a statistical analysis.
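Expanded into a runnable sketch (assuming the Ultralytics API; the hyperparameters and file names are placeholders), the idea is:

```python
# Sketch: repeated training with different seeds for a statistical analysis.
# Assumes the Ultralytics YOLOv8 API; hyperparameters and paths are placeholders.
import statistics
from ultralytics import YOLO

maps = []
for run in range(0, 10):
    model = YOLO("yolov8s.pt")  # fresh model for each run
    model.train(data="my_rfi.yaml", epochs=100, imgsz=640, seed=run)
    metrics = model.val()  # evaluate on the validation split
    maps.append(metrics.box.map50)  # mAP@0.5 for this seed

print(f"mAP@0.5: mean={statistics.mean(maps):.4f}, stdev={statistics.stdev(maps):.4f}")
```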
Thank you.
Hello Dr. SFStefenon, after reading your article, I have a few questions. Before training the YOLO model, do the images that are input into the model need to be captured using a sliding window? If we take screenshots at fixed sizes, wouldn't this cause the annotation information to be separated? Or am I misunderstanding something?
Hello, the sliding window is used because the images that we considered were too large. Using this approach we can build a sub-dataset with images of 640 by 640 pixels. If you already have these sizes, you don't need to use the sliding window method. The annotation is done only after the sliding window, therefore we don't have concerns about that. Thank you.
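As a rough sketch of the idea (not the repo's Sliding Window Compute.py; the tile size and stride are illustrative), using some overlap so symbols near tile borders still appear whole in at least one crop:

```python
# Rough sketch of the sliding-window idea (not the repo's Sliding Window Compute.py).
# Crops a large diagram into 640x640 tiles; the overlap (stride < tile) is illustrative,
# and remainders at the right/bottom edges are ignored for simplicity.
import os
from PIL import Image

def sliding_window(path, out_dir="tiles", tile=640, stride=512):
    os.makedirs(out_dir, exist_ok=True)
    img = Image.open(path)
    w, h = img.size
    for y in range(0, max(h - tile, 0) + 1, stride):
        for x in range(0, max(w - tile, 0) + 1, stride):
            img.crop((x, y, x + tile, y + tile)).save(f"{out_dir}/{x}_{y}.png")

sliding_window("diagram.png")
```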
Hello Dr. SFStefenon, may I ask how the recognized text from "5.6.2. Text Recognition" is processed, and how the recognized results are placed back into the original positions of the graphic elements? Could you share the code for this part? Thank you.
Hello, the Text Recognition presented in section 5.6.2 is a comparison to other approaches. Your concern is right, since Tesseract does not provide the position of the text and LLaVA didn't work properly for this. To solve the issue we used YOLO for text recognition, which provides the class of each letter of the label together with its position. Thank you.
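Conceptually, with one class per character, the detections can be turned back into positioned text; a sketch under that assumption (not the authors' code, and the weights path is hypothetical):

```python
# Sketch: turn per-letter YOLO detections into positioned text.
# Assumes one class per character (a-z, A-Z) as described above; not the authors'
# code, and "letters_best.pt" is a hypothetical weights file.
from ultralytics import YOLO

model = YOLO("letters_best.pt")
r = model.predict("diagram_tile.png", imgsz=640)[0]
letters = []
for box in r.boxes:
    x1, y1, x2, y2 = box.xyxy[0].tolist()
    letters.append((r.names[int(box.cls)], (x1 + x2) / 2, (y1 + y2) / 2))

# Reading order: sort by row (y, bucketed), then by column (x); 20 px is illustrative.
letters.sort(key=lambda t: (round(t[2] / 20), t[1]))
print("".join(ch for ch, _, _ in letters))
```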
Should I first use the sliding window segmentation and then proceed with labeling? My target image size is 2600 x 1200 pixels, and I'm not sure whether I need to segment it.
Exactly, to build the dataset you must do this, considering that you have large images. You don't need to segment, though; just create a sub-dataset and annotate the images.
You mean training by labeling the text as a single category and then using inference for recognition? So, do we need to determine the position of the text using rules similar to those for processing line segments, or can we directly use the positions recognized by YOLO? Can we also further structure the relationships between the recognized graphical elements and the text? Thank you very much!
I mean training considering the labeled text, but not as a single category: each letter is a class, from a to z and A to Z. The text detection is straightforward, identified directly by YOLO. Yes, you can create relationships; this depends on your dataset. If there are possible relations between the text and the symbols, or letter to letter, that is where you define your rules to create the graph.
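If you want a simple rule to start with, relating each recognized label to its nearest detected symbol is one way to seed the graph; a sketch with hypothetical boxes (the distance rule is illustrative):

```python
# Sketch: relate a recognized text label to the nearest detected symbol,
# one simple rule for seeding the graph. Boxes and the distance rule are illustrative.
import math

def center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def nearest_symbol(label_box, symbol_boxes):
    return min(symbol_boxes, key=lambda b: math.dist(center(label_box), center(b)))

symbols = [(100, 100, 160, 160), (400, 90, 460, 150)]  # hypothetical symbol boxes
label = (170, 110, 230, 130)                           # hypothetical "XX-1" text box
print(nearest_symbol(label, symbols))                  # -> (100, 100, 160, 160)
```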
Thank you, I understand. Could you please attach an image generated by your model (including the lines and graphical elements) together with the original image that was recognized? I would like to compare them, thank you very much!