WongKinYiu / yolov7

Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
GNU General Public License v3.0

Combining YOLOv7 model with OCR #1421

Open ghost opened 1 year ago

ghost commented 1 year ago

Hi, everyone.

Is there a way of combining a trained YOLOv7 model with OCR to recognize text automatically, in one flow? And how?

pauliustumas commented 1 year ago

Hi,

one approach to detecting individual letters in an image would be to train an object detection model on a custom dataset of labeled images, where each letter is treated as a separate class. Once the model is trained, it can detect the presence and location of each letter in an input image. The output of this step is a sequence of detected letters. To match that sequence of letters to actual words, a dictionary lookup can be performed to identify possible word matches.

Hope it helps.
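A minimal sketch of the dictionary-lookup step described above, assuming each detection comes back as a (letter, x-coordinate) pair; the function name and word list are illustrative, not part of YOLOv7:

```python
# Sketch: match a sequence of per-letter detections to dictionary words.
# Assumes each detection is (letter, x_left); names and word list are made up.

def letters_to_word(detections, dictionary):
    # Order letters left-to-right by x coordinate, then join them.
    ordered = sorted(detections, key=lambda d: d[1])
    candidate = "".join(letter for letter, _ in ordered)
    # Exact lookup; a fuzzy match (e.g. edit distance) could be substituted.
    return candidate if candidate in dictionary else None

dets = [("A", 120), ("C", 10), ("T", 230)]       # detector output, any order
print(letters_to_word(dets, {"CAT", "DOG"}))     # CAT
```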

StefanCiobanu1989 commented 1 year ago

Text from where? Everything, or car number plates? If it's something like plates, you can train a model to detect plates. Then, for each detection, crop the image using the bounding-box coordinates as a reference, so you only keep the plate rather than the whole frame. After that, run whatever OCR algorithm you use on that cropped image or bitmap to get your text.
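The cropping step above can be sketched in a few lines, assuming the frame is a NumPy array (as OpenCV returns) and the box is in pixel coordinates; the function name is illustrative:

```python
import numpy as np

def crop_detection(frame, box):
    """Crop a detected region from a frame; box is (x1, y1, x2, y2) in pixels."""
    x1, y1, x2, y2 = map(int, box)
    return frame[y1:y2, x1:x2]

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in for a video frame
plate = crop_detection(frame, (100, 200, 300, 260))
print(plate.shape)                                 # (60, 200, 3)
```

Note the crop stays in memory as an array, so it can go straight to an OCR call without saving a screenshot to disk.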

faizan1234567 commented 1 year ago

@aezakmi99 you can use YOLOv7 for localizing objects, so yes: if you train it on a digits or alphabet dataset, it will detect them. You can then crop those digits for further processing.

ghost commented 1 year ago

@pauliustumas @faizan1234567 @StefanCiobanu1989 I want to detect text from car number plates. I made a YOLOv7 model with high precision, and it works fine. Now I was wondering how to combine OCR with the plate detection, so that when I run the detection command, it also shows the text from the plates. I want it all to be one process, not first taking screenshots or something. I just don't know how to do it.

faizan1234567 commented 1 year ago

Here is a pipeline you can try: digits detector -> digits classifier -> label-text-from-prediction

faizan1234567 commented 1 year ago

Maybe you can train a detector on a large digits dataset to detect and label digits. As far as YOLOv7 goes, you can just use it to detect an object, for instance a number plate. You will have to read the literature to form your solution. I hope it helps.
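The detector -> classifier -> text pipeline above can be sketched as follows. Everything here is a stand-in: `detect_digits` and `classify_digit` are hypothetical stubs for trained models, not anything in the YOLOv7 repo:

```python
# Sketch of the detector -> classifier -> text pipeline.
# detect_digits / classify_digit are stubs standing in for trained models.

def detect_digits(plate_img):
    # A real detector would return (crop, x_left) pairs; stubbed here.
    return [("crop0", 150), ("crop1", 20), ("crop2", 90)]

def classify_digit(crop):
    # A real classifier would map a character crop to a character; stubbed here.
    return {"crop0": "3", "crop1": "X", "crop2": "7"}[crop]

def read_plate(plate_img):
    # Sort detections left-to-right so the characters come out in reading order.
    dets = sorted(detect_digits(plate_img), key=lambda d: d[1])
    return "".join(classify_digit(crop) for crop, _ in dets)

print(read_plate(None))   # X73
```

The sort by x-coordinate is the key step: per-character detectors return boxes in arbitrary order, not reading order.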

StefanCiobanu1989 commented 1 year ago

> I want to detect text from car number plates. I made a YOLOv7 model with high precision, and it works fine. Now I was wondering how to combine OCR with the plate detection, so that when I run the detection command, it also shows the text from the plates. I want it all to be one process, not first taking screenshots or something. I just don't know how to do it.

Uhm, it can all be done in the same process by passing along the inferred image, frame, or image section, even if you are doing it in real time. Doing it with the detector alone might get tricky, because the detections won't come back in the order the characters are written on the plate. For instance, with a plate like XSD234, the detector will tell you it found all of the letters and numbers, but not the order in which to put them, so you would have to find a way around that (though I might be wrong). If instead you use an OCR library and pass it the image/frame, letting it run its own image-to-text conversion, it would be much easier. Give something like https://pypi.org/project/pytesseract/ a look.
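A small sketch of handing a cropped plate to pytesseract in the same process, as the link above suggests. This assumes the Tesseract binary is installed on the machine; the function names here are illustrative:

```python
# Sketch: in-process OCR on a cropped plate with pytesseract.
# Requires the Tesseract binary; names are illustrative, not YOLOv7 code.

def plate_ocr_config(chars="ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"):
    # --psm 7 tells Tesseract to treat the crop as a single text line,
    # which matches a number plate; the whitelist suppresses stray symbols.
    return f"--psm 7 -c tessedit_char_whitelist={chars}"

def ocr_plate(plate_crop):
    import pytesseract  # imported lazily so the sketch runs without OCR installed
    return pytesseract.image_to_string(plate_crop, config=plate_ocr_config()).strip()

print(plate_ocr_config())
```

Because Tesseract reads the whole line at once, this sidesteps the character-ordering problem a per-letter detector has.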

ghost commented 1 year ago

I made a YOLOv7 model for detecting one class only. I also added OCR, so when it detects the object, it crops the detection from the bounding box and sends that to OCR for text recognition. But when plotting the results, the detected text is too large and goes beyond the image. I tried editing the plot_one_box function, but with no success.

Here is how my function looks now:

import random

import cv2

def plot_one_box(x, img, color=None, label=None, text=None, line_thickness=3):
    tl = line_thickness or round(0.002 * (img.shape[0] + img.shape[1]) / 2) + 1  # line/font thickness
    color = color or [random.randint(0, 255) for _ in range(3)]
    c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))
    cv2.rectangle(img, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
    if label:
        tf = max(tl - 1, 1)  # font thickness
        t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
        c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
        cv2.rectangle(img, c1, c2, color, -1, cv2.LINE_AA)  # filled
        cv2.putText(img, label, (c1[0], c1[1] - 2), 0, tl / 3, [225, 255, 255], thickness=tf, lineType=cv2.LINE_AA)
    if text:
        t_size = cv2.getTextSize(text, 0, fontScale=tl / 3, thickness=1)[0]
        c3 = c1[0], c1[1] - t_size[1] - 3
        cv2.rectangle(img, c1, c3, color, -1, cv2.LINE_AA)  # filled
        cv2.putText(img, text, (c1[0], c1[1] + t_size[1] + 2), 0, tl / 3, [255, 255, 255], thickness=1, lineType=cv2.LINE_AA)

Anyway, I want to print the detected label above the bounding box as it does now, followed by the detected text, but when the text reaches the width of the bounding box, it should wrap onto a new line with a new filled rectangle. Can somebody help me?
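One way to get the wrapping described above is to split the text into lines that fit the box width before drawing. Here is a sketch of that splitting step: `measure` abstracts the pixel-width measurement, so the wrapping logic itself needs no OpenCV (the example uses a fake fixed-width measurer; all names are illustrative):

```python
# Sketch: wrap OCR text to a maximum pixel width before drawing it.
# measure(s) must return the rendered pixel width of string s.

def wrap_text(text, max_width, measure):
    """Split text into lines whose measured width fits within max_width."""
    lines, current = [], ""
    for ch in text:
        if measure(current + ch) > max_width and current:
            lines.append(current)   # current line is full; start a new one
            current = ch
        else:
            current += ch
    if current:
        lines.append(current)
    return lines

# Example with a fake measurer: 10 px per character, box 40 px wide.
print(wrap_text("AB123CD", 40, lambda s: 10 * len(s)))   # ['AB12', '3CD']
```

Inside plot_one_box you could then measure with something like `lambda s: cv2.getTextSize(s, 0, tl / 3, 1)[0][0]`, pass the box width `x[2] - x[0]`, and draw each returned line with its own filled rectangle and cv2.putText call, stepping the y coordinate down by the line height each time.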

DinisDimitris commented 6 months ago

https://github.com/DinisDimitris/yolov7OCR