JaidedAI / EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
https://www.jaided.ai
Apache License 2.0
23.96k stars 3.13k forks

Easyocr memory leak #815

Open arpithajanney opened 2 years ago

arpithajanney commented 2 years ago

Every time I call the reader.readtext function, memory increases on both GPU and CPU. I need some input from anyone who has seen this.

rkcosmos commented 2 years ago

Once in a while you can call gc.collect() for CPU memory cleanup and torch.cuda.empty_cache() for GPU memory cleanup.
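A minimal sketch of what this periodic cleanup could look like. The helper names `release_memory` and `read_with_cleanup`, and the `every` interval, are made up for illustration; `read` stands for something like `reader.readtext`. Note that this only frees unreachable Python objects and PyTorch's cached CUDA blocks, so it may not help if the growth comes from somewhere else.

```python
import gc

try:
    import torch  # optional: only needed for the GPU cache cleanup
except ImportError:
    torch = None


def release_memory():
    """Run a full garbage collection and, if CUDA is usable, empty PyTorch's cache."""
    freed = gc.collect()  # number of unreachable objects found
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()  # hand cached, unused GPU blocks back to the driver
    return freed


def read_with_cleanup(read, images, every=50):
    """Call read(image) for each image and clean up after every `every` calls.

    `read` is any callable (for example reader.readtext); `every` is an
    arbitrary interval chosen for this sketch.
    """
    results = []
    for index, image in enumerate(images, start=1):
        results.append(read(image))
        if index % every == 0:
            release_memory()
    return results
```

With EasyOCR this would be used as `read_with_cleanup(reader.readtext, image_paths)`.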

arpithajanney commented 2 years ago

We tried both commands, but neither works. Can a memory leak happen on the GPU as well? Do you suggest any other methods, such as downgrading versions, or any other alternatives?

macksjeremy commented 2 years ago

Check to make sure you're not using different-sized images as input. Since the model is fully convolutional, it tries to make space for as large an input as possible. Find the image where the memory growth occurs and make sure it's not too large.
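One way to rule out oversized inputs is to cap the longest side of every image before it reaches the reader. The sketch below only computes the target size; `bounded_size` and the `max_side` default of 1280 are hypothetical choices, not EasyOCR settings.

```python
def bounded_size(width, height, max_side=1280):
    """Return (width, height) scaled down so the longest side is at most max_side.

    Images that already fit are returned unchanged; the aspect ratio is kept.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Never let a dimension collapse to zero on extreme aspect ratios.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The result can then be passed to whatever resize call you already use, for example Pillow's `image.resize(bounded_size(*image.size))`.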

arpithajanney commented 2 years ago

The image is not too large. I tried uploading the same image multiple times, and memory still increases in that case. I also checked the memory footprint: the reader.readtext() function takes up more memory on each call.

BBO-repo commented 2 years ago

I observed the same behavior, i.e. memory increases on CPU when calling the reader.readtext function. Adding gc.collect() does not help. I have not checked with GPU so far.

BasilDavid commented 2 years ago

Any advice on this issue? I already tried gc.collect() with no luck. CPU only, never tried GPU.

rokopi-byte commented 1 year ago

Hi, any news on this? That's a very big issue. I observe a memory leak on CPU. Even with gc.collect() before every call to reader.readtext, you go OOM after a while. Restarting the script is not an option if you are deploying an API. There must be another way to free memory, @rkcosmos.

idengaurav commented 1 year ago

I'm observing this issue on CPU as well. Memory keeps increasing until it reaches the limit in k8s and the service gets killed.

Manhal-Munir-Al-khayat commented 1 year ago

I'm facing the same problem. Any updates on this? I tried to find the memory leak by using memory_profiler in my Python program:

`

    Line #    Mem usage    Increment  Occurrences   Line Contents
    =============================================================
       308    962.8 MiB    962.8 MiB           1       @profile
       309                                             def detect(self, img, min_size = 20, text_threshold = 0.7, low_text = 0.4,\
       310                                                        link_threshold = 0.4,canvas_size = 2560, mag_ratio = 1.,\
       311                                                        slope_ths = 0.1, ycenter_ths = 0.5, height_ths = 0.5,\
       312                                                        width_ths = 0.5, add_margin = 0.1, reformat=True, optimal_num_chars=None,
       313                                                        threshold = 0.2, bbox_min_score = 0.2, bbox_min_size = 3, max_candidates = 0,
       314                                                        ):
       315                                         
       316    962.8 MiB      0.0 MiB           1           if reformat:
       317                                                     img, img_cv_grey = reformat_input(img)
       318                                         
       319   2265.1 MiB   1302.3 MiB           2           text_box_list = self.get_textbox(self.detector, 
       320    962.8 MiB      0.0 MiB           1                                       img, 
       321    962.8 MiB      0.0 MiB           1                                       canvas_size = canvas_size, 
       322    962.8 MiB      0.0 MiB           1                                       mag_ratio = mag_ratio,
       323    962.8 MiB      0.0 MiB           1                                       text_threshold = text_threshold, 
       324    962.8 MiB      0.0 MiB           1                                       link_threshold = link_threshold, 
       325    962.8 MiB      0.0 MiB           1                                       low_text = low_text,
       326    962.8 MiB      0.0 MiB           1                                       poly = False, 
       327    962.8 MiB      0.0 MiB           1                                       device = self.device, 
       328    962.8 MiB      0.0 MiB           1                                       optimal_num_chars = optimal_num_chars,
       329    962.8 MiB      0.0 MiB           1                                       threshold = threshold, 
       330    962.8 MiB      0.0 MiB           1                                       bbox_min_score = bbox_min_score, 
       331    962.8 MiB      0.0 MiB           1                                       bbox_min_size = bbox_min_size, 
       332    962.8 MiB      0.0 MiB           1                                       max_candidates = max_candidates,
       333                                                                             )
       334                                         
       335   2265.1 MiB      0.0 MiB           1           horizontal_list_agg, free_list_agg = [], []
       336   2265.1 MiB      0.0 MiB           2           for text_box in text_box_list:
       337   2265.1 MiB      0.0 MiB           2               horizontal_list, free_list = group_text_box(text_box, slope_ths,
       338   2265.1 MiB      0.0 MiB           1                                                           ycenter_ths, height_ths,
       339   2265.1 MiB      0.0 MiB           1                                                           width_ths, add_margin,
       340   2265.1 MiB      0.0 MiB           1                                                           (optimal_num_chars is None))
       341   2265.1 MiB      0.0 MiB           1               if min_size:
       342   2265.1 MiB      0.0 MiB          21                   horizontal_list = [i for i in horizontal_list if max(
       343   2265.1 MiB      0.0 MiB          12                       i[1] - i[0], i[3] - i[2]) > min_size]
       344   2265.1 MiB      0.0 MiB           3                   free_list = [i for i in free_list if max(
       345                                                             diff([c[0] for c in i]), diff([c[1] for c in i])) > min_size]
       346   2265.1 MiB      0.0 MiB           1               horizontal_list_agg.append(horizontal_list)
       347   2265.1 MiB      0.0 MiB           1               free_list_agg.append(free_list)
       348                                         
       349   2265.1 MiB      0.0 MiB           1           return horizontal_list_agg, free_list_agg

`

The same jump shows up in the caller when calling .readtext(): `

  Line #    Mem usage    Increment  Occurrences   Line Contents
  =============================================================
      20    949.0 MiB    949.0 MiB           1   @profile
      21                                         def get_image(sqs_body: SQSSuggestionBase):
      22    950.1 MiB      1.1 MiB           1       response = requests.get(sqs_body['previewUrl'])
      23    950.1 MiB      0.0 MiB           1       if response:
      24    950.1 MiB      0.0 MiB           1           if response.status_code == 200:
      25    950.2 MiB      0.2 MiB           1               img = read_image(response.content)
      26   2273.4 MiB   1323.2 MiB           1               textInImage = tiiModel.readtext(img, detail=1, paragraph=False, blocklist='=-+_()/!"$%&?`^§[]') #0,45,315, , text_threshold=0.60, width_ths=0.7,threshold=0.60,link_threshold=0.60) # , allowlist = '0123456789' , batch_size=5, decoder='greedy' ,min_size=50
      27                                                     #imgH, imgW, channels = img.shape
      29   2273.4 MiB      0.0 MiB           1               imgW, imgH = img.size
      ...

`

And it keeps increasing every time the method is called.

ash2703 commented 1 year ago

Is there any update on this? Even after extensive profiling, it's hard to pinpoint the exact line where the issue is happening. I'm seeing this in v1.6.2.

Since tools like tracemalloc and objgraph do not show any leaks, it's possible this is happening on the C++ side, in code called through the PyTorch or NumPy wrappers.
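For anyone who wants to repeat the Python-side check, here is a rough sketch using tracemalloc snapshots. The helper name `python_heap_growth` and its defaults are made up for illustration. tracemalloc only sees allocations made through Python's allocator, so a small number here while the process keeps growing would be consistent with a native-side leak, though it does not prove one.

```python
import tracemalloc


def python_heap_growth(call, repeats=5, top=5):
    """Return (net bytes of Python-level growth, top differences) over repeated calls.

    `call` is a zero-argument callable, e.g. lambda: reader.readtext(img).
    """
    tracemalloc.start()
    try:
        call()  # warm-up so one-time caches are not counted as growth
        before = tracemalloc.take_snapshot()
        for _ in range(repeats):
            call()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Per-source-line differences between the two snapshots.
    stats = after.compare_to(before, "lineno")
    total = sum(stat.size_diff for stat in stats)
    return total, stats[:top]
```

Printing the returned statistics shows which source lines account for the growth, if any are visible to Python at all.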

AndyWatterman commented 1 year ago

We can confirm this problem. In our case, each call to "readtext" leaks 33 MB.
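A per-call figure like this can be estimated by sampling the process's resident memory around a batch of calls. The sketch below is one possible way to do it, not how the number above was obtained: `rss_growth_per_call` is a hypothetical helper, and the default reader assumes the third-party psutil package is installed.

```python
def _psutil_rss():
    """Resident set size of the current process in bytes (assumes psutil is installed)."""
    import psutil  # third-party; imported lazily so the helper below works without it

    return psutil.Process().memory_info().rss


def rss_growth_per_call(call, repeats=20, read_rss=_psutil_rss):
    """Average growth in bytes per invocation of `call`, measured after a warm-up.

    `read_rss` is any zero-argument callable returning current memory in bytes.
    """
    call()  # warm-up: model loading and first-use caches are not a leak
    start = read_rss()
    for _ in range(repeats):
        call()
    return (read_rss() - start) / repeats
```

For example, `rss_growth_per_call(lambda: reader.readtext(img)) / 2**20` gives the growth in MiB per call.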

rokopi-byte commented 1 year ago

In the end, the only way I found to solve the problem was to run everything in a separate process with multiprocessing. It's not a very clean solution and it adds a little overhead, but no other solution was available.
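A rough sketch of how such a process-based workaround might look (not necessarily the exact setup described above). `run_isolated` and `ocr_once` are hypothetical names. The idea is that whatever the worker leaks goes back to the OS when the child process exits.

```python
import multiprocessing as mp


def _child(conn, func, args, kwargs):
    """Run func in the child process and send its result, or the error, to the parent."""
    try:
        conn.send(("ok", func(*args, **kwargs)))
    except Exception as exc:
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_isolated(func, *args, start_method=None, **kwargs):
    """Run func(*args, **kwargs) in a short-lived child process and return its result.

    start_method=None uses the platform default ("fork", "spawn" or "forkserver").
    With "spawn", func must be importable and the caller needs the usual
    `if __name__ == "__main__":` guard.
    """
    ctx = mp.get_context(start_method)
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child, args=(child_conn, func, args, kwargs))
    proc.start()
    child_conn.close()  # parent keeps only the read end
    try:
        status, payload = parent_conn.recv()
    except EOFError:
        status, payload = "error", "worker exited without a result"
    finally:
        proc.join()
        parent_conn.close()
    if status == "error":
        raise RuntimeError(payload)
    return payload


def ocr_once(image_path, languages=("en",)):
    """Hypothetical worker: build a Reader and read one image inside the child."""
    import easyocr  # imported in the child so the parent never loads the model

    reader = easyocr.Reader(list(languages), gpu=False)
    return reader.readtext(image_path)
```

Calling `run_isolated(ocr_once, "page.png")` reloads the model on every call, which is slow. A long-lived worker that is restarted after a fixed number of requests would keep the same isolation with less overhead.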

davebelle85 commented 6 months ago

For those like me who are only using a CPU and facing this issue, make sure you add the gpu=False flag. The memory leak was fixed for me after adding it.

reader = easyocr.Reader(['en'], gpu=False)

daniellovera commented 2 months ago

This fix worked for me and might work for others.

https://github.com/JaidedAI/EasyOCR/pull/1278