MaartenGr / Concept

Concept Modeling: Topic Modeling on Images and Text
https://maartengr.github.io/Concept/
MIT License

OSError: [Errno 24] Too many open files: 'photos/icnZ2R8PcDs.jpg' #11

Open dbl001 opened 2 years ago

dbl001 commented 2 years ago

What do you recommend setting max_open_files to?

images = [Image.open("photos/"+filepath) for filepath in tqdm(img_names[:5000])]
image_names = img_names[:5000]
image_embeddings = img_embeddings[:5000]

54%|███████████████████▍                | 2693/5000 [00:00<00:00, 13545.87it/s]
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 images = [Image.open("photos/"+filepath) for filepath in tqdm(img_names[:5000])]
      2 image_names = img_names[:5000]
      3 image_embeddings = img_embeddings[:5000]

Input In [4], in <listcomp>(.0)
----> 1 images = [Image.open("photos/"+filepath) for filepath in tqdm(img_names[:5000])]
      2 image_names = img_names[:5000]
      3 image_embeddings = img_embeddings[:5000]

File ~/tensorflow-metal/lib/python3.8/site-packages/PIL/Image.py:2968, in open(fp, mode, formats)
   2965     filename = fp
   2967 if filename:
-> 2968     fp = builtins.open(filename, "rb")
   2969     exclusive_fp = True
   2971 try:

OSError: [Errno 24] Too many open files: 'photos/icnZ2R8PcDs.jpg'

% ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         0
-v: address space (kbytes)          unlimited
-l: locked-in-memory size (kbytes)  unlimited
-u: processes                       11136
-n: file descriptors                8192
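
The `-n` value above is the per-process limit on open file descriptors. As a minimal sketch (Unix-only, since it relies on the standard-library `resource` module; the helper name `open_file_limits` is made up for illustration), the same limit can be read from inside the notebook's Python process:

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    # RLIMIT_NOFILE is the limit that `ulimit -n` reports for the shell.
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = open_file_limits()
print(f"soft limit: {soft}, hard limit: {hard}")
```

The soft limit seen by the Python process may differ from the shell's value, so it is worth checking both before deciding whether raising the limit is the right fix.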
MaartenGr commented 2 years ago

What do you recommend setting max_open_files to?

I am not entirely sure what you are referring to. What do you mean by max_open_files?

Looking at the code you posted, it seems you are taking a subset of 5000 images and trying to open them all at once. In Concept, you only need to pass the paths to the images, not the images themselves, since holding all of those images open in memory at once will indeed cause many issues. I would highly recommend batching them instead, as Concept does:

https://github.com/MaartenGr/Concept/blob/d270607d6ea4d789a42d54880ab4a0c977bb69ce/concept/_model.py#L197-L231
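
For anyone who does need the image objects themselves, here is a rough sketch of the batching idea. It is not the code behind the link above; the helper names `batched` and `load_batch` are hypothetical. It assumes Pillow is installed and relies on `Image.open` being usable as a context manager, with `copy()` loading the pixel data so that each file can be closed right away.

```python
from typing import Iterator, List, Sequence


def batched(paths: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


def load_batch(paths: Sequence[str]) -> list:
    """Open each image, copy its pixel data into memory, and close the file."""
    from PIL import Image  # imported lazily so `batched` has no Pillow dependency

    images = []
    for path in paths:
        # The context manager closes the underlying file on exit;
        # copy() loads the data first, so the copy stays usable afterwards.
        with Image.open(path) as img:
            images.append(img.copy())
    return images
```

With this, only one batch of images is held in memory at a time and no file handles stay open between batches, for example `for batch in batched(img_names[:5000], 128): images = load_batch(batch)`, where the batch size of 128 is an arbitrary choice and each name would need the `photos/` prefix used in the original snippet.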

dbl001 commented 2 years ago

Agreed.

I was running the code example in this notebook:

https://github.com/MaartenGr/Concept/blob/main/notebooks/Concept.ipynb

(Replying by email to Maarten Grootendorst's comment above, written on May 9, 2022, at 10:42 PM.)

MaartenGr commented 2 years ago

Ah okay! I'll make sure to update the notebook in the next release.