BUG: Encountered "Operation not supported" OSError when running the MNIST Torch example #233

Open · hwpang opened this issue 4 months ago

hwpang commented 4 months ago

What are you trying to do?

I am a new SubstraFL user and am currently working through the example at

Issue Description (what is happening?)

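The notebook failed with an OSError at the following cell (clients, ALGO_ORG_ID, strategy, train_data_nodes, my_eval_strategy, aggregation_node and dependencies are defined in earlier cells of the tutorial):

import logging
import pathlib
import substrafl
from substrafl.experiment import execute_experiment
# A round is defined by a local training step followed by an aggregation operation
NUM_ROUNDS = 3
compute_plan = execute_experiment(
    client=clients[ALGO_ORG_ID],
    strategy=strategy,
    train_data_nodes=train_data_nodes,
    evaluation_strategy=my_eval_strategy,
    aggregation_node=aggregation_node,
    num_rounds=NUM_ROUNDS,
    experiment_folder=str(pathlib.Path.cwd() / "tmp" / "experiment_summaries"),
    dependencies=dependencies,
    clean_models=False,
    name="MNIST documentation example",
)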

Expected Behavior (what should happen?)

The tutorial should run to completion without this error.

Reproducible Example

Operating system

Ubuntu 20.04

Python version


Installed Substra versions


Installed versions of dependencies

Logs / Stacktrace

Rounds progress: 100%|██████████| 3/3 [00:00<00:00, 1050.24it/s]
Compute plan progress:  10%|▉         | 2/21 [02:35<24:34, 77.61s/it]
OSError                                   Traceback (most recent call last)
Cell In[14], line 9
      6 # A round is defined by a local training step followed by an aggregation operation
      7 NUM_ROUNDS = 3
----> 9 compute_plan = execute_experiment(
     10     client=clients[ALGO_ORG_ID],
     11     strategy=strategy,
     12     train_data_nodes=train_data_nodes,
     13     evaluation_strategy=my_eval_strategy,
     14     aggregation_node=aggregation_node,
     15     num_rounds=NUM_ROUNDS,
     16     experiment_folder=str(pathlib.Path.cwd() / "tmp" / "experiment_summaries"),
     17     dependencies=dependencies,
     18     clean_models=False,
     19     name="MNIST documentation example",
     20 )

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substrafl/, in execute_experiment(client, strategy, train_data_nodes, experiment_folder, num_rounds, aggregation_node, evaluation_strategy, dependencies, clean_models, name, additional_metadata, task_submission_batch_size)
    485 # save the experiment summary in experiment_folder
    486 _save_experiment_summary(
    487     experiment_folder=experiment_folder,
    488     compute_plan_key=compute_plan_key,
    496     additional_metadata=additional_metadata,
    497 )
--> 498 compute_plan = client.add_compute_plan(
    499     substra.sdk.schemas.ComputePlanSpec(
    500         key=compute_plan_key,
    501         tasks=tasks,
    502         name=name or timestamp,
    503         metadata=cp_metadata,
    504     ),
    505     auto_batching=True,
    506     batch_size=task_submission_batch_size,
    507 )
    508"The compute plan has been registered to Substra, its key is {0}.").format(compute_plan.key))
    509 return compute_plan

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/, in logit.<locals>.wrapper(*args, **kwargs)
     46 error = None
     47 try:
---> 48     return f(*args, **kwargs)
     49 except Exception as e:
     50     error = e.__class__.__name__

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/, in Client.add_compute_plan(self, data, auto_batching, batch_size)
    542 if not is_valid_uuid(spec.key):
    543     raise exceptions.ComputePlanKeyFormatError(
    544         "The compute plan key has to respect the UUID format. You can use the uuid library to generate it. \
    545     Example: compute_plan_key=str(uuid.uuid4())"
    546     )
--> 548 return self._backend.add(spec, spec_options=spec_options)

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/, in Local.add(self, spec, spec_options, key)
    485 else:
    486     if spec.__class__.type_ == schemas.Type.ComputePlan:
--> 487         compute_plan = add_asset(spec, spec_options)
    488         return compute_plan
    489     else:

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/, in Local._add_compute_plan(self, spec, spec_options)
    403 compute_plan = self._db.add(compute_plan)
    405 # go through the tasks sorted by rank
--> 406 compute_plan = self.__execute_compute_plan(spec, compute_plan, visited, tasks, spec_options)
    407 return compute_plan

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/, in Local.__execute_compute_plan(self, spec, compute_plan, visited, tasks, spec_options)
    266         if not task_spec:
    267             continue
--> 269         self.add(
    270             key=task_spec.key,
    271             spec=task_spec,
    272             spec_options=spec_options,
    273         )
    275         progress_bar.update()
    277 return compute_plan

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/, in Local.add(self, spec, spec_options, key)
    489 else:
    490     key = key or spec.compute_key()
--> 491     add_asset(key, spec, spec_options)
    492     return key

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/, in Local._add_task(self, key, spec, spec_options)
    420 task = models.Task(
    421     key=key,
    422     creation_date=self.__now(),
    433     metadata=spec.metadata if spec.metadata else dict(),
    434 )
    436 task = self._db.add(task)
--> 437 self._worker.schedule_task(task)
    438 return task

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/compute/, in Worker.schedule_task(self, task)
    310 elif asset_type == schemas.Type.Dataset:
    311     dataset = self._db.get_with_files(schemas.Type.Dataset, task_input.asset_key)
    312     cmd_line_inputs.append(
--> 313         self._prepare_dataset_input(
    314             dataset=dataset,
    315             task_input=task_input,
    316             input_volume=volumes[VOLUME_INPUTS],
    317             multiple=multiple,
    318         )
    319     )
    320     addable_asset = dataset
    322 if addable_asset:

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/site-packages/substra/sdk/backends/local/compute/, in Worker._prepare_dataset_input(self, dataset, task_input, input_volume, multiple)
    157 def _prepare_dataset_input(
    158     self, dataset: models.Dataset, task_input: models.InputRef, input_volume: str, multiple: bool
    159 ):
    160     path_to_opener = input_volume / Filenames.OPENER.value
--> 161     Path(dataset.opener.storage_address).link_to(path_to_opener)
    162     return TaskResource(
    163         id=task_input.identifier,
    164         value=f"{TPL_VOLUME_INPUTS}/{Filenames.OPENER.value}",
    165         multiple=multiple,
    166     )

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/, in Path.link_to(self, target)
   1211 """
   1212 Make the target path a hard link pointing to this path.
   1220 Use `hardlink_to()` instead.
   1221 """
   1222 warnings.warn("pathlib.Path.link_to() is deprecated and is scheduled "
   1223               "for removal in Python 3.12. "
   1224               "Use pathlib.Path.hardlink_to() instead.",
   1225               DeprecationWarning, stacklevel=2)
-> 1226 self.__class__(target).hardlink_to(self)

File ~/cloudfiles/code/Users/hpang/conda_envs/substrafl_env/lib/python3.11/, in Path.hardlink_to(self, target)
   1206 if not hasattr(os, "link"):
   1207     raise NotImplementedError("os.link() not available on this system")
-> 1208 os.link(target, self)

OSError: [Errno 95] Operation not supported: '/mnt/batch/tasks/shared/LS_root/mounts/clusters/hpang8/code/Users/hpang/Projects/Federated_learning/substrafl/local-worker/yumnknd_/61c0f7fa-5228-4804-9d24-8beac24bfbc2/' -> '/mnt/batch/tasks/shared/LS_root/mounts/clusters/hpang8/code/Users/hpang/Projects/Federated_learning/substrafl/local-worker/d18aa0b7-4aaf-4a4d-9e87-ebead4d168f9/inputs/'
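Errno 95 is `EOPNOTSUPP` on Linux: the kernel refused the `os.link()` call that `Path.hardlink_to()` makes, so the failure depends on the filesystem holding the local-worker folder rather than on the Python version. The directory in the message looks like a mounted cloud file share (an inference from the `/mnt/batch/tasks/shared/LS_root/mounts/...` path, not something the report confirms). One way to check whether a directory supports hard links is simply to try creating one. The sketch below uses a hypothetical helper, `supports_hardlinks`, which is not part of Substra or SubstraFL; the set of errno values it treats as "unsupported" is an assumption, not an exhaustive list.

```python
import errno
import os
import tempfile

# errno values taken to mean "this filesystem refuses hard links" (assumed list).
# Errno 95 in the traceback is EOPNOTSUPP on Linux; link(2) documents EPERM for
# filesystems that do not support hard links (e.g. FAT).
_NO_HARDLINK_ERRNOS = {
    getattr(errno, name)
    for name in ("EOPNOTSUPP", "ENOTSUP", "EPERM")
    if hasattr(errno, name)
}


def supports_hardlinks(directory: str) -> bool:
    """Probe `directory` by trying to hard-link a scratch file inside it."""
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        src = os.path.join(scratch, "probe_src")
        dst = os.path.join(scratch, "probe_dst")
        with open(src, "w") as f:
            f.write("probe")
        try:
            os.link(src, dst)  # the same system call Path.hardlink_to() makes
        except OSError as exc:
            if exc.errno in _NO_HARDLINK_ERRNOS:
                return False
            raise  # any other error is unrelated to hard-link support
        return True
```

For example, calling `supports_hardlinks(str(pathlib.Path.cwd()))` from the notebook would indicate whether hard links work in the working directory, assuming the local-worker folder lives under it as the paths in the error suggest. A `False` result on that directory and `True` on `/tmp` would point to the mount rather than to SubstraFL itself.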

SdgJlbl commented 4 months ago

Thanks a lot for raising this issue. We were aware that path handling had changed in Python 3.12, but I didn't know it could affect earlier Python versions. We will look into it.

KindEmily commented 3 months ago

Hey @hwpang

I'm also currently facing an issue with this tutorial.

I would appreciate any help if you've managed to finish it.

Contact me pls 👋

P.S. I'm also active on the Substra Slack channel. You're very welcome to come say hi and share your progress; I'd be happy to get in touch with anyone I can discuss potential solutions with.

You can find the Slack channel invite in the Substra community URL:

Help me pls 🆘

And if you would like to check on my issue, please take a look at the Run-experiment-console-error-help-request branch URL:


KindEmily commented 3 months ago

@SdgJlbl Kindly asking whether you've had a chance to look into this? 🥺

KindEmily commented 3 months ago

@hwpang I was able to finish the tutorial by using a flat structure instead of modules (putting all the code in a single file, e.g.