Open BSofo opened 4 years ago
I believe you'll need to update libfuse in the base docker image or otherwise install it in the environment you're using
@lostmygithubaccount thanks for jumping in! What's weird is that we've been using MML Spark's Docker image, `microsoft/mmlspark:0.16`, as a base image for a `PythonScriptStep` that reads from and writes to `PipelineData` for about a year now. But now because we're (trying) to use `OutputFileDatasetConfig`, this means our base Docker image requires `libfuse`? I thought that it was installed automatically as part of the run setup... cc: @MayMSFT
I see - so it sounds like `PipelineData` did not require `libfuse`, but the new `OutputFileDatasetConfig` does. Will let @MayMSFT and the data team comment.
@rongduan-zhu
@swanderz the reason we need libfuse is that we mount inside the user's container. We are working on a new architecture to remove this restriction but until then, libfuse is needed in the docker image in order for mount to work.
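For anyone who wants to confirm whether a given base image has what the mount needs, here is a minimal sketch of a check you could run inside the container. The three things it looks for (a `fuse` shared library, the `fusermount` binary, and the `/dev/fuse` device) are my assumptions about what a typical FUSE setup exposes, not an official list of what Azure ML checks, and `check_fuse_support` is a hypothetical helper name.

```python
import ctypes.util
import os
import shutil


def check_fuse_support():
    """Report which common FUSE components are visible in this environment.

    Assumption: a working FUSE 2 setup usually exposes a `fuse` shared
    library, a `fusermount` binary, and the /dev/fuse device. A missing
    entry is a hint, not proof, that mounting will fail.
    """
    return {
        # Shared library lookup; may return None on minimal images
        # even when the library is present but not indexed by the linker.
        "libfuse_library": ctypes.util.find_library("fuse") is not None,
        # Helper binary normally shipped with the fuse package.
        "fusermount_binary": shutil.which("fusermount") is not None,
        # Kernel device node that FUSE file systems talk to.
        "dev_fuse_device": os.path.exists("/dev/fuse"),
    }


if __name__ == "__main__":
    for component, present in check_fuse_support().items():
        print(f"{component}: {'found' if present else 'missing'}")
```

If the library or binary shows up as missing, installing the distribution's fuse package in the Dockerfile is the usual fix, though the exact package name depends on the base image.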
Looking at the stack trace above, it looks like the input is also a dataset with its mode set to mount. One way to avoid this issue is to use download mode for inputs and upload mode for outputs. The downsides are that this won't work if your data doesn't all fit on disk, and that no data will be uploaded if the job doesn't complete successfully.
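The download/upload workaround could look roughly like the sketch below. It is not taken from the pipeline in this issue: the function name `build_download_upload_step`, the input name `raw_input`, the output name `processed`, and the `--input`/`--output` argument names are all hypothetical, and the SDK imports are deferred into the function only so the snippet can be loaded without `azureml-sdk` installed.

```python
def build_download_upload_step(datastore, compute_target, input_path,
                               output_path, script_name, source_directory):
    """Build a PythonScriptStep that avoids mounting (and thus libfuse).

    All arguments are placeholders for objects and paths from your own
    workspace; nothing here is specific to the pipeline in this issue.
    """
    # Deferred imports: requires the azureml-sdk package at call time.
    from azureml.core import Dataset
    from azureml.data import OutputFileDatasetConfig
    from azureml.pipeline.steps import PythonScriptStep

    # Input: download the files to local disk instead of mounting them.
    input_dataset = Dataset.File.from_files(path=(datastore, input_path))
    step_input = input_dataset.as_named_input("raw_input").as_download()

    # Output: write to local disk, then upload when the step finishes.
    step_output = OutputFileDatasetConfig(
        name="processed",
        destination=(datastore, output_path),
    ).as_upload(overwrite=True)

    return PythonScriptStep(
        script_name=script_name,
        source_directory=source_directory,
        arguments=["--input", step_input, "--output", step_output],
        compute_target=compute_target,
    )
```

As noted above, this trades the libfuse dependency for local disk usage: the whole input has to fit on the compute node, and the output is only uploaded if the step succeeds.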
Closing this issue #please-close.
reopen per new policy - is this fixed?
I recently upgraded my azureml-sdk to 1.16.0 and was running my PythonScriptStep roll.py spark dataset, but kept getting errors about libfuse.
4ef30c7f-cf30-48a3-9d0c-a0e6882958ef
0e36abc1-afe3-4e56-b316-d8617d65149c
70_driver_log.txt
Here is my spark-requirement.txt file:
packages in my local environment
And here is a snippet of my pipeline with the spark config and PythonScriptStep: