I ran into an os.error (Too many open files) while parsing files on a debian machine :
File "/home/blablah/.cache/pypoetry/virtualenvs/adoc-H6ni72Z6-py3.10/lib/python3.10/site-packages/newspaper/article.py", line 214, in parse
File "/home/blablah/.cache/pypoetry/virtualenvs/adoc-H6ni72Z6-py3.10/lib/python3.10/site-packages/newspaper/article.py", line 495, in set_meta_language
File "/home/blablah/.cache/pypoetry/virtualenvs/adoc-H6ni72Z6-py3.10/lib/python3.10/site-packages/newspaper/utils.py", line 346, in get_available_languages
OSError: [Errno 24] Too many open files: '/home/blablah/.cache/pypoetry/virtualenvs/adoc-H6ni72Z6-py3.10/lib/python3.10/site-packages/newspaper/resources/text'
Using help on os.listdir returns : "[...] On some platforms, path may also be specified as an open file descriptor; the file descriptor must refer to a directory. [...]"
The documentation also states that "[t]he scandir() function returns directory entries along with file attribute information, giving better performance for many common use cases.".
I ran a benchmark with timeit between the old code and the new one ; they are pretty much the same (ie. 67.3 µs ± 15.6 µs and 64.2 µs ± 12.8 µs resp.).
I'm not totally sure why os is trying to open files while using the listdir command, but switching to scandir should fix that behaviour (according to the documentation...).
to fix the exception.
Well, I should have tested this beforehand.
My issue (ie OSError: [Errno 24] Too many open files) is still happening and scandir doesn't change a thing.
I'll close my PR for now.
I ran into an os.error (Too many open files) while parsing files on a debian machine :
Using
help
on os.listdir returns : "[...] On some platforms, path may also be specified as an open file descriptor; the file descriptor must refer to a directory. [...]"The documentation also states that "[t]he scandir() function returns directory entries along with file attribute information, giving better performance for many common use cases.".
I ran a benchmark with timeit between the old code and the new one ; they are pretty much the same (ie. 67.3 µs ± 15.6 µs and 64.2 µs ± 12.8 µs resp.).
I'm not totally sure why os is trying to open files while using the listdir command, but switching to scandir should fix that behaviour (according to the documentation...). to fix the exception.