datadreamer-dev / DataDreamer

DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models.   🤖💤
https://datadreamer.dev
MIT License
724 stars, 39 forks

Too many open files in system #5

Closed by preemware 4 months ago

preemware commented 4 months ago

When using the DataDreamer library with Cohere, token counting fails with an OSError because the maximum number of open files has been exceeded. The end of the traceback:

    yield llm.format_prompt(
  File "/usr/local/lib/python3.10/dist-packages/datadreamer/llms/llm.py", line 231, in format_prompt
    required_token_count = self.final_count_tokens(construct_final_prompt([]))
  File "/usr/local/lib/python3.10/dist-packages/datadreamer/llms/llm.py", line 92, in final_count_tokens
    return self.count_tokens(value)
  File "/usr/local/lib/python3.10/dist-packages/ring/func/base.py", line 816, in __call__
    return self.run(self._rope.config.default_action, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ring/func/base.py", line 671, in run
    return attr(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/ring/func/base.py", line 697, in impl_f
    return attr(self, *fargs, pargs=pargs)
  File "/usr/local/lib/python3.10/dist-packages/ring/func/sync.py", line 54, in get_or_update
    result = self.execute(wire, pargs=pargs)
  File "/usr/local/lib/python3.10/dist-packages/ring/func/base.py", line 380, in execute
    return wire.__func__(*pargs.args, **pargs.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/datadreamer/llms/_litellm.py", line 146, in count_tokens
    return token_counter(
  File "/usr/local/lib/python3.10/dist-packages/litellm/utils.py", line 2851, in token_counter
    tokenizer_json = _select_tokenizer(model=model)
  File "/usr/local/lib/python3.10/dist-packages/litellm/utils.py", line 2582, in _select_tokenizer
    tokenizer = Tokenizer.from_pretrained("Cohere/command-nightly")
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1201, in hf_hub_download
    os.makedirs(storage_folder, exist_ok=True)
  File "/usr/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
OSError: [Errno 23] Too many open files in system: '/root/.cache/huggingface/hub/models--Cohere--command-nightly'

AjayP13 commented 4 months ago

Hi, unfortunately this looks like something to do with your system. First check your ulimit. If that is fine, check whether your folder is on some kind of mounted file system, such as an S3 bucket, a GCS bucket, or NFS, since something like that might cause issues. Worst case, I would recommend trying on another machine.
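
To make the ulimit check concrete, here is a minimal sketch that assumes a Unix-like machine with Python's standard `resource` module. The function name `describe_open_file_limits` is made up for illustration and is not part of DataDreamer. It reports the per-process limit (what `ulimit -n` shows) and, on Linux only, the system-wide figures from `/proc/sys/fs/file-nr`. The second part may matter here because errno 23 is `ENFILE`, the system-wide file table, whereas the per-process limit normally surfaces as errno 24 (`EMFILE`).

```python
import errno
import resource


def describe_open_file_limits():
    """Return per-process and (on Linux) system-wide open-file figures."""
    # Per-process limit: the same values `ulimit -n` (soft) and `ulimit -Hn` (hard) report.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    report = {"process_soft_limit": soft, "process_hard_limit": hard}

    # Linux-only: /proc/sys/fs/file-nr holds three numbers:
    # allocated handles, allocated-but-unused handles, system-wide maximum.
    try:
        with open("/proc/sys/fs/file-nr") as f:
            allocated, _unused, maximum = (int(x) for x in f.read().split())
        report["system_allocated"] = allocated
        report["system_maximum"] = maximum
    except (OSError, ValueError):
        # Not Linux, or /proc is not readable in this environment.
        pass
    return report


if __name__ == "__main__":
    # errno 23 (ENFILE) is the system-wide table; errno 24 (EMFILE) is per-process.
    print("ENFILE =", errno.ENFILE, "EMFILE =", errno.EMFILE)
    for key, value in describe_open_file_limits().items():
        print(f"{key}: {value}")
```

If `system_allocated` is close to `system_maximum`, some other process on the machine is probably holding most of the handles, and raising your own ulimit is unlikely to help.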

This post from someone with a similar issue may also help: https://discuss.huggingface.co/t/too-many-open-files-when-loading-common-voice/14182
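
For the mounted-file-system possibility, a rough sketch that assumes Linux, where `/proc/mounts` lists one mount per line as `device mount_point fs_type ...`. The helper `find_mount` is hypothetical, written for this example only. It returns the mount point and file system type that a path lives on, so you can see whether the cache folder from the traceback sits on something like NFS or a FUSE mount. It does not decode escaped characters (such as `\040` for spaces) in mount points.

```python
import os


def find_mount(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing path."""
    path = os.path.realpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        # Compare with a trailing slash so "/data" does not match "/database".
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or (path + "/").startswith(prefix):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fs_type)
    return best


if __name__ == "__main__":
    # Default Hugging Face cache location, matching the path in the traceback.
    cache_dir = os.path.expanduser("~/.cache/huggingface/hub")
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            print(cache_dir, "->", find_mount(cache_dir, f.read()))
```

A type such as `nfs4` or one starting with `fuse.` would suggest the folder is not on a plain local disk, which is the situation described above.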