dotnet / spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
https://dot.net/spark
MIT License

Call .net for Apache Spark from WEB API #829

Closed sindujacse closed 3 years ago

sindujacse commented 3 years ago

Can we invoke .NET for Apache Spark from a .NET Core Web API? My requirement is a simple web page with a file upload button to upload a file and submit it. On submit, the application should start a Spark session, read the data from the CSV, load it into a DataFrame, and upload it to MongoDB.

suhsteve commented 3 years ago

Should be possible. It can be as simple as running spark-submit after the file has been submitted through the web page and verifying that the job finished successfully. However, if it's something more complex where you want your web app to use SparkSession, DataFrames, etc., then you'll want to ensure that Spark is running and DotnetBackend is available for Spark .NET to call into (similar to running DotnetRunner in debug mode).

You should try making a POC and see if it meets your requirements.
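
As a rough illustration of the simpler option (run spark-submit, then verify that the job finished), here is a minimal shell sketch. The helper name `run_job`, the log path, and `MyApp.dll` are hypothetical, and it assumes that the exit status of spark-submit reflects the outcome of the job, which may not hold in every deploy mode.

```shell
# Hypothetical helper: run a command (normally "spark-submit ...") and report
# whether it finished successfully, based on its exit status.
# Usage: run_job LOG_FILE COMMAND [ARGS...]
run_job() {
    job_log="$1"
    shift
    if "$@" >"$job_log" 2>&1; then
        echo "job succeeded"
        return 0
    else
        job_status=$?   # exit status of the command that just failed
        echo "job failed with exit code $job_status (see $job_log)"
        return "$job_status"
    fi
}

# Example call (MyApp.dll and the paths are placeholders):
# run_job /tmp/spark_job.log spark-submit \
#     --class org.apache.spark.deploy.dotnet.DotnetRunner \
#     --master local microsoft-spark-2-4_2.11-1.0.0.jar \
#     dotnet MyApp.dll /uploads/data.csv
```

A web API could launch the same command as a child process and apply the same exit-status check before reporting success to the user.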

sindujacse commented 3 years ago

Hi @suhsteve, thanks for the comment. May I know how to run Spark and DotnetBackend so that my web application can use SparkSession and DataFrames?

suhsteve commented 3 years ago

You can try this out by running spark-submit in debug mode. This will start the DotnetBackend and wait for connections from Spark .NET.

> spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner microsoft-spark-2-4_2.11-1.0.0.jar debug
21/03/03 23:41:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/03/03 23:41:59 INFO DotnetRunner: Starting DotnetBackend with .
21/03/03 23:41:59 INFO DotnetBackend: The number of DotnetBackend threads is set to 10.
21/03/03 23:42:00 INFO DotnetRunner: Port number used by DotnetBackend is 5567
***********************************************************************
* .NET Backend running debug mode. Press enter to exit *
***********************************************************************
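
The log above reports the port that DotnetBackend listens on. If a script needs that value, for example to confirm the backend is up before the web app connects, one possible sketch is to extract it from the captured log text. The function name `backend_port` and the log file name are hypothetical, and the pattern assumes the log line keeps the wording shown above.

```shell
# Hypothetical helper: read DotnetRunner log text on stdin and print the port
# from the first "Port number used by DotnetBackend is N" line, if any.
backend_port() {
    sed -n 's/.*Port number used by DotnetBackend is \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example (dotnet_backend.log is a placeholder for wherever the output was saved):
# backend_port < dotnet_backend.log
```

If the line is absent, the function prints nothing, which a caller can treat as "backend not started yet".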

suhsteve commented 3 years ago

Closing for now. Feel free to reopen if this did not solve your issue.

sindujacse commented 3 years ago

Can I write spark-submit with parameters inside a shell script to run the application, and schedule it? I'm not sure whether my organization has an account for deploying to Azure Synapse, Databricks, or HDInsight, so I'm looking to host it on a production Linux server.
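
For illustration only, a parameterized wrapper of the kind described in this question might look like the sketch below. It is not an answer from the maintainers. The function name `submit_job`, the `DRY_RUN` switch, `MyApp.dll`, and every path are hypothetical; the jar name is the one quoted earlier in the thread.

```shell
# Defaults can be overridden from the environment.
SPARK_JAR="${SPARK_JAR:-microsoft-spark-2-4_2.11-1.0.0.jar}"  # jar name from the thread
APP_DLL="${APP_DLL:-MyApp.dll}"                               # hypothetical application
SPARK_MASTER="${SPARK_MASTER:-local}"

# submit_job INPUT_CSV: build the spark-submit command for one input file.
# With DRY_RUN=1 the command is printed instead of executed.
submit_job() {
    input_csv="${1:-}"
    if [ -z "$input_csv" ]; then
        echo "usage: submit_job INPUT_CSV" >&2
        return 2
    fi
    set -- spark-submit \
        --class org.apache.spark.deploy.dotnet.DotnetRunner \
        --master "$SPARK_MASTER" \
        "$SPARK_JAR" dotnet "$APP_DLL" "$input_csv"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# When saved as a script, end the file with:  submit_job "$@"
# A crontab entry could then run it nightly at 02:00 (paths are placeholders):
# 0 2 * * * /opt/jobs/run_spark_job.sh /data/in.csv >> /var/log/spark_job.log 2>&1
```

Keeping the jar, application, and master in variables lets the same script be reused across environments without editing the command itself.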