harishukla93 opened 3 years ago
Spark.NET looks for your custom DLLs using the DOTNET_ASSEMBLY_SEARCH_PATHS environment variable. So, just before spark-submit, you can set that variable to point at the folder containing your DLLs:
set DOTNET_ASSEMBLY_SEARCH_PATHS=absolute_path_to_folder_containing_dlls
You can also copy these DLLs into the Microsoft.Spark.Worker installation folder. (This is what is done in the Databricks environment.)
APP_DEPENDENCIES=/dbfs/apps/dependencies
# Resolve the symlink to the actual Microsoft.Spark.Worker binary.
WORKER_PATH=`readlink $DOTNET_SPARK_WORKER_INSTALLATION_PATH/Microsoft.Spark.Worker`
# Copy the app's dependency DLLs next to the worker binary.
if [ -f $WORKER_PATH ] && [ -d $APP_DEPENDENCIES ]; then
  sudo cp -fR $APP_DEPENDENCIES/. `dirname $WORKER_PATH`
fi
Thanks for the quick reply.
I copied them into DOTNET_WORKER_DIR=/opt/Microsoft.Spark.Worker-1.0.0, but it didn't work.
Following the suggestion above, I also tried adding another path:
export DOTNET_ASSEMBLY_SEARCH_PATHS="/home/ubuntu/Downloads/NewDLLs"
The new path now appears in the error message, but the error itself persists. Something strange is going on; I think I am missing something silly.
Error:
[Warn] [AssemblyLoader] Assembly 'Classes, Version=3.0.142.0, Culture=neutral, PublicKeyToken=910ab64095116ac0' file not found 'Classes[.dll,.ni.dll]' in '/home/ubuntu/Downloads/NewDLLs,/tmp/spark-fa9e5b80-6caa-420f-ad36-1a37f155ba7c/userFiles-3679bf84-7b62-4a29-98fd-218238f3276a,/home/ubuntu/project/mySparkApp/bin/Debug/net5.0,/opt/Microsoft.Spark.Worker-1.0.0/'
[2021-04-13T12:10:24.8533078Z] [incs83-Vostro-3490] [Error] [TaskRunner] [1] ProcessStream() failed with exception: System.IO.FileNotFoundException: Could not load file or assembly 'Classes, Version=3.0.142.0, Culture=neutral, PublicKeyToken=910ab64095116ac0'. The system cannot find the file specified.
I also tried using this Classes.dll (and the others) in a normal C# project with Mono to make sure the DLL is valid. It worked as expected.
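A quick way to reproduce that validity check outside Mono is a small console app that loads the assembly by path and lists its public types. This is a minimal sketch; only the path comes from this thread:

using System;
using System.Reflection;

class LoadCheck
{
    static void Main()
    {
        // Load the DLL by path and confirm it resolves at all.
        var asm = Assembly.LoadFrom("/home/ubuntu/Downloads/NewDLLs/Classes.dll");
        Console.WriteLine(asm.FullName);

        // List its public surface, which also forces type resolution.
        foreach (var type in asm.GetExportedTypes())
            Console.WriteLine(type.FullName);
    }
}

If this succeeds on the driver machine but the worker still fails, the problem is with the worker's probing paths rather than the DLL itself.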
Have you taken a look at https://docs.microsoft.com/en-us/dotnet/spark/how-to-guides/deploy-worker-udf-binaries ? It documents the necessary environment variables to set and the Spark configurations you can use.
Yes, this is what I followed. I am running with master=local, with a worker and a UDF, which is straightforward with this configuration.
The document has some parameter options that apply to yarn mode only, so I don't think I am missing anything from it.
Looks like you are using .NET 5. Can you try recompiling your app using .NET Core 3.1?
Good point, I had issues using .NET 5, regarding System.Runtime for example. I fixed them by downgrading to .NET Core 3.1.
I used this documentation to run my first app: https://dotnet.microsoft.com/learn/data/spark-tutorial/install-dotnet
It directs me to install .NET 5. Anyway, I am installing .NET Core 3.1 now; I have a strong feeling this will resolve the issue.
Tried with .NET Core 3.1, no luck. This is really strange.
@harishukla93 Does the file /home/ubuntu/Downloads/NewDLLs/Classes.dll exist?
Yes, in fact this file is available in all three paths: /home/ubuntu/Downloads/NewDLLs, /home/ubuntu/project/rs-etl-test/bin/Debug/netcoreapp3.1, and /opt/Microsoft.Spark.Worker-1.0.0/
@harishukla93 was Classes.dll recompiled and copied to /home/ubuntu/Downloads/NewDLLs/ after recompiling your main app from .NET 5 to .NET Core 3.1?
It is something with this Classes.dll I have. I learned from the source of this DLL that it was built for .NET 4.0 and x86.
This morning I got a new DLL built with .NET Core 3.1, but still no luck.
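Since the bitness and target framework of the DLL are in question, a diagnostic like the following can confirm what a given DLL was actually built for. This is a hedged sketch: the path is the one from this thread, and everything else is standard reflection and PE-header APIs:

using System;
using System.IO;
using System.Reflection;
using System.Reflection.PortableExecutable;
using System.Runtime.Versioning;

class AssemblyCheck
{
    static void Main()
    {
        const string path = "/home/ubuntu/Downloads/NewDLLs/Classes.dll";

        // Bitness first: an I386 machine type or a Requires32Bit flag means the
        // DLL is x86-only and cannot load into a 64-bit worker process.
        using (var fs = File.OpenRead(path))
        using (var pe = new PEReader(fs))
        {
            Console.WriteLine(pe.PEHeaders.CoffHeader.Machine);
            Console.WriteLine(pe.PEHeaders.CorHeader.Flags);
        }

        // Target framework recorded at compile time, e.g. ".NETFramework,Version=v4.0".
        try
        {
            var asm = Assembly.LoadFrom(path);
            var tf = asm.GetCustomAttribute<TargetFrameworkAttribute>();
            Console.WriteLine(tf?.FrameworkName ?? "(no TargetFrameworkAttribute)");
        }
        catch (BadImageFormatException e)
        {
            Console.WriteLine($"Cannot load into this process: {e.Message}");
        }
    }
}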
@harishukla93 was Classes.dll recompiled and copied to /home/ubuntu/Downloads/NewDLLs/ after recompiling your main app from .NET 5 to .NET Core 3.1?
I created a new app with 3.1, and with the HintPath mentioned in my .csproj it copied the file to bin/Debug/netcoreapp3.1.
So I don't think we even need a separate copy at /home/ubuntu/Downloads/NewDLLs/; I am running spark-submit from bin/Debug/netcoreapp3.1, where I have all the DLLs, and this is also the path where the worker is looking for the DLL.
@suhsteve I have used the sources directly to get rid of Classes.dll. But I am deserializing some data in a UDF using System.Runtime.Serialization.Formatters (BinaryFormatter) and MemoryStream, and it is giving me the error below:
[Warn] [AssemblyLoader] Assembly 'System.Runtime.Serialization.Formatters.resources, Version=4.0.4.0, Culture=en-IN, PublicKeyToken=b03f5f7f11d50a3a' file not found 'System.Runtime.Serialization.Formatters.resources[.dll,.ni.dll]' in '/tmp/spark-024dfc93-f0fc-4c04-8737-ba0dbc8370bf/userFiles-599198e1-61d3-43f7-b810-c6d5376c2d65,/home/incs83/project/rs-etl-test/bin/Debug/netcoreapp3.1,/opt/Microsoft.Spark.Worker-1.0.0/'
[2021-04-20T06:51:51.5112399Z] [incs83-Vostro-3490] [Warn] [AssemblyLoader] Assembly 'System.Runtime.Serialization.Formatters.resources, Version=4.0.4.0, Culture=en, PublicKeyToken=b03f5f7f11d50a3a' file not found 'System.Runtime.Serialization.Formatters.resources[.dll,.ni.dll]' in '/tmp/spark-024dfc93-f0fc-4c04-8737-ba0dbc8370bf/userFiles-599198e1-61d3-43f7-b810-c6d5376c2d65,/home/incs83/project/rs-etl-test/bin/Debug/netcoreapp3.1,/opt/Microsoft.Spark.Worker-1.0.0/'
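For reference, the deserialization pattern described above looks roughly like this in a Microsoft.Spark UDF. This is a minimal sketch assuming the serialized column is Spark BinaryType (which Microsoft.Spark maps to byte[]); MyPayload, the input file, and the column name are made-up stand-ins:

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

class DeserializeDemo
{
    // Made-up payload type standing in for whatever the serialized bytes contain.
    [Serializable]
    class MyPayload
    {
        public string Name;
        public override string ToString() => Name;
    }

    static void Main()
    {
        SparkSession spark = SparkSession.Builder().AppName("deserialize-demo").GetOrCreate();
        DataFrame df = spark.Read().Parquet("input.parquet"); // assumed input with a binary column

        // The UDF body runs inside Microsoft.Spark.Worker, so every assembly it
        // touches (including satellite/resource assemblies like the ones in the
        // warning above) must be resolvable on the worker side too.
        Func<Column, Column> deserialize = Udf<byte[], string>(bytes =>
        {
            using var stream = new MemoryStream(bytes);
            var payload = (MyPayload)new BinaryFormatter().Deserialize(stream);
            return payload.ToString();
        });

        df.Select(deserialize(df["rawBytes"])).Show();
    }
}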
@clegendre I found out that BinaryFormatter is obsolete in .NET 5, so I am using .NET Core 3.1, but I am still facing this issue.
Please help!
I am new to .NET with Spark and facing some issues with passing DLLs. Basically, I have some DLL files (from another C# project) that I want to reuse here in my Spark project's UDF.
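For illustration, a minimal sketch of this pattern with the Microsoft.Spark UDF API; the Classes.SomeHelper.Transform call, the input file, and the column name are hypothetical:

using System;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

class ReuseDllDemo
{
    static void Main()
    {
        SparkSession spark = SparkSession.Builder().AppName("mySparkApp").GetOrCreate();
        DataFrame input = spark.Read().Json("people.json"); // any input source

        // The lambda captures a call into the external Classes.dll;
        // Classes.SomeHelper.Transform is a hypothetical API.
        Func<Column, Column> transform = Udf<string, string>(
            value => Classes.SomeHelper.Transform(value));

        input.Select(transform(input["name"])).Show();
    }
}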
Error:
[Warn] [AssemblyLoader] Assembly 'Classes, Version=3.0.142.0, Culture=neutral, PublicKeyToken=910ab64095116ac0' file not found 'Classes[.dll,.ni.dll]' in '/tmp/spark-e2e6444a-99fc-42c6-ae15-8a5b328e3038/userFiles-aafb5491-4485-46d9-8e17-0849aed7c57a,/home/ubuntu/project/mySparkApp/bin/Debug/net5.0,/opt/Microsoft.Spark.Worker-1.0.0/'
[2021-04-13T11:16:15.1691280Z] [ubuntu-Vostro] [Error] [TaskRunner] [1] ProcessStream() failed with exception: System.IO.FileNotFoundException: Could not load file or assembly 'Classes, Version=3.0.142.0, Culture=neutral, PublicKeyToken=910ab64095116ac0'. The system cannot find the file specified.
Here I have copied the Classes.dll file (an external DLL) into my /home/ubuntu/project/mySparkApp. Initially, I was facing the same error with mySparkApp.dll, and I resolved that by copying it into my current directory, which worked. But in the case of this third-party DLL, it fails to find it. Here is my .csproj file where I reference Classes.dll:
[.csproj snippet not rendered]
Here is the spark-submit command:
spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local bin/Debug/net5.0/microsoft-spark-3-0_2.12-1.0.0.jar dotnet bin/Debug/net5.0/mySparkApp.dll
I have spent a lot of time digging into this, still no luck.