UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

RuntimeError: _share_filename_: only available on CPU #3014

Open msciancalepore98 opened 1 month ago

msciancalepore98 commented 1 month ago

Hi,

I am trying to run a pretty simple test with the following args:

args = SentenceTransformerTrainingArguments(
        # Required parameter:
        output_dir=output_dir.as_posix(),
        # Optional training parameters:
        num_train_epochs=num_epochs,
        per_device_train_batch_size=batch_size,
        per_device_eval_batch_size=batch_size,
        learning_rate=1e-3,
        warmup_ratio=0.1,
        dataloader_num_workers=2,
        use_mps_device=False,
        eval_strategy="steps",
        eval_steps=100,
        save_strategy="steps",
        save_steps=100,
        save_total_limit=2,
        logging_steps=100,
        run_name="mpnet-base-all-nli-triplet",  # Will be used in W&B if `wandb` is installed
    )

but I got the following error:

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Traceback (most recent call last):
  File ".../lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File ".../lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File ".../projects/Modules/Training/pretrain-text-encoder.py", line 270, in <module>
    train_model(model, ds_train, ds_val, output_dir, num_epochs=3, batch_size=16)
  File ".../projects/Modules/Training/pretrain-text-encoder.py", line 217, in train_model
    trainer.train()
  File ".../lib/python3.10/site-packages/transformers/trainer.py", line 1938, in train
    return inner_training_loop(
  File ".../lib/python3.10/site-packages/transformers/trainer.py", line 2236, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File ".../lib/python3.10/site-packages/accelerate/data_loader.py", line 547, in __iter__
    dataloader_iter = self.base_dataloader.__iter__()
  File ".../lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 484, in __iter__
    return self._get_iterator()
  File ".../lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 415, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File ".../lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1138, in __init__
    w.start()
  File ".../lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File ".../lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File ".../lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File ".../lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File ".../lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File ".../lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File ".../lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File ".../lib/python3.10/site-packages/torch/multiprocessing/reductions.py", line 607, in reduce_storage
    metadata = storage._share_filename_cpu_()
  File ".../lib/python3.10/site-packages/torch/storage.py", line 437, in wrapper
    return fn(self, *args, **kwargs)
  File ".../lib/python3.10/site-packages/torch/storage.py", line 516, in _share_filename_cpu_
    return super()._share_filename_cpu_(*args, **kwargs)
RuntimeError: _share_filename_: only available on CPU

Of course, if I switch to num_workers=0, everything works. Setting use_mps_device to True or False makes no difference (I am running these tests locally).
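For reference, the working single-process fallback is just the same args with the workers removed (a sketch based on the snippet above):

args = SentenceTransformerTrainingArguments(
    output_dir=output_dir.as_posix(),
    dataloader_num_workers=0,  # no worker processes, so nothing gets pickled
    # ... remaining arguments as in the snippet above
)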

If I try the old .fit instead, I get:

Traceback (most recent call last):
  File ".../autotagging/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File ".../autotagging/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File ".../Modules/Training/pretrain-text-encoder.py", line 227, in <module>
    train_model(model, ds_train, ds_val, output_dir, num_epochs=3, batch_size=16)
  File ".../Modules/Training/pretrain-text-encoder.py", line 175, in train_model
    model.fit(
  File ".../autotagging/lib/python3.10/site-packages/sentence_transformers/fit_mixin.py", line 260, in fit
    for batch in data_loader:
  File ".../autotagging/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 479, in __iter__
    self._iterator = self._get_iterator()
  File ".../autotagging/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 415, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File ".../autotagging/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1138, in __init__
    w.start()
  File ".../autotagging/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File ".../autotagging/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File ".../autotagging/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File ".../autotagging/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File ".../autotagging/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File ".../autotagging/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File ".../autotagging/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'FitMixin.fit.<locals>.identity'

I have torch == 2.5.0 and sentence_transformers == 3.2.1.
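(Side note: the .fit traceback is the standard spawn start-method limitation rather than anything MPS-specific: spawned worker processes pickle their arguments, and locally defined functions cannot be pickled. A minimal, library-independent sketch that reproduces the same error class:)

import pickle

def outer():
    def identity(x):  # a local function, like FitMixin.fit.<locals>.identity
        return x
    # Raises AttributeError: Can't pickle local object 'outer.<locals>.identity'
    pickle.dumps(identity)

outer()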

msciancalepore98 commented 1 month ago

OK, after some more digging I found the following, and it actually fixes it:

multiprocessing_context="fork" if torch.backends.mps.is_available() else None

Unfortunately, this means I have to use the old .fit, since that is the only interface where I construct the DataLoader myself and can set this field!
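A minimal sketch of that .fit setup (assuming the classic train_objectives API; train_examples and train_loss are hypothetical stand-ins for the actual dataset and loss):

import torch
from torch.utils.data import DataLoader

train_dataloader = DataLoader(
    train_examples,  # e.g. a list of InputExample objects (hypothetical)
    shuffle=True,
    batch_size=16,
    num_workers=2,
    # Fork instead of spawn on Apple Silicon so worker startup does not
    # pickle MPS tensors (or fit's local collate helper).
    multiprocessing_context="fork" if torch.backends.mps.is_available() else None,
)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=3)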

tomaarsen commented 1 month ago

Hmm, that looks like a tricky bug. Could you perhaps use the new Trainer with a subclass like this:

import torch
from sentence_transformers import SentenceTransformerTrainer

class CustomTrainer(SentenceTransformerTrainer):
    def get_train_dataloader(self):
        dataloader = super().get_train_dataloader()
        # Use forked workers on Apple Silicon: the default start method
        # pickles tensors through the CPU-only _share_filename_ path.
        if torch.backends.mps.is_available():
            dataloader.multiprocessing_context = "fork"
        return dataloader

# The same override is needed for the eval/test dataloaders; see the sketch below.
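A sketch of the matching eval/test overrides, to be added to the CustomTrainer above (assuming the usual transformers signatures, where get_eval_dataloader takes an optional eval_dataset and get_test_dataloader a required test_dataset):

    def get_eval_dataloader(self, eval_dataset=None):
        dataloader = super().get_eval_dataloader(eval_dataset)
        if torch.backends.mps.is_available():
            dataloader.multiprocessing_context = "fork"
        return dataloader

    def get_test_dataloader(self, test_dataset):
        dataloader = super().get_test_dataloader(test_dataset)
        if torch.backends.mps.is_available():
            dataloader.multiprocessing_context = "fork"
        return dataloader
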
msciancalepore98 commented 1 month ago

Yeah, in the end I fixed it like that. Would it be worth fixing this upstream, or at least updating the docs? Nowadays lots of folks at companies use MacBooks for local dry runs before switching to CUDA :)
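For anyone landing here, a hypothetical end-to-end use of the workaround (CustomTrainer from the comment above; model, args, ds_train, and ds_val follow the earlier snippets, and the loss is an illustrative stand-in):

from sentence_transformers.losses import MultipleNegativesRankingLoss

loss = MultipleNegativesRankingLoss(model)  # stand-in loss for illustration
trainer = CustomTrainer(
    model=model,
    args=args,
    train_dataset=ds_train,
    eval_dataset=ds_val,
    loss=loss,
)
trainer.train()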