Falldog / pyconcrete

Protect your Python script: encrypt it as .pye and decrypt it on import
Apache License 2.0
692 stars 149 forks

pyconcrete for submitting spark job #69

Open albertusk95 opened 4 years ago

albertusk95 commented 4 years ago

Hi,

I recently used pyconcrete to obfuscate PySpark code. To run a Spark job on a cluster, we need to use the spark-submit command, so it would look like spark-submit job.py.

The concern is that spark-submit seems to accept only files with a .py extension. Since pyconcrete generates .pye files, I couldn't find a way to run the encrypted files via spark-submit.

Is there a way to run encrypted files generated by pyconcrete with spark-submit?

Thank you.

Falldog commented 4 years ago

pyconcrete needs a binary .so. Does spark-submit package your source code and upload it to the cloud for running? If yes, you need to cross-compile pyconcrete.so first. Then you could run pyconcrete as a library: try building your code as an .egg. Spark seems to allow submitting an .egg, so it might work. Give it a shot.
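
A minimal sketch of what running pyconcrete as a library could look like from a plain .py entry point: import pyconcrete first, which is meant to register its import hook for .pye files, then import the encrypted module and call into it. The helper name run_encrypted_entry and the module and function names passed to it are hypothetical. It is also an assumption, not something verified here, that the hook can find .pye files inside the archive Spark ships to the cluster and that pyconcrete's compiled extension is installed on every node.

```python
import importlib


def run_encrypted_entry(module_name, func_name="main", hook_module="pyconcrete"):
    """Import the hook module first, then import and call the real entry point."""
    # The hook module has to be imported before the encrypted module so that
    # its import hook is already registered when the second import runs.
    importlib.import_module(hook_module)
    # module_name and func_name are hypothetical placeholders for the
    # encrypted module (for example job_main.pye) and its entry function.
    module = importlib.import_module(module_name)
    return getattr(module, func_name)()
```

With something like this, driver.py would contain little more than a call such as run_encrypted_entry("job_main"), so only the thin wrapper stays in plain text.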

albertusk95 commented 4 years ago

I already tried building the code as an .egg along with the driver program, but Spark couldn't find the main class.

It seems that .egg files are only used as dependencies; spark-submit still needs the driver code in a .py file. So it would look like this: spark-submit --py-files path/to/file.egg driver.py.
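
The command shape above can be sketched as a small helper that assembles the argument list: one plain .py driver plus an optional comma-separated --py-files list. The function name build_spark_submit_cmd is hypothetical, and the .py check only mirrors the behaviour described in this thread rather than anything taken from Spark's source.

```python
import shlex


def build_spark_submit_cmd(driver, py_files=()):
    """Return the spark-submit argument list for a Python driver."""
    # Assumption from this thread: the application file must be a .py script.
    if not driver.endswith(".py"):
        raise ValueError("the application file must be a plain .py script")
    cmd = ["spark-submit"]
    if py_files:
        # --py-files takes a single comma-separated list of .zip, .egg or .py files.
        cmd += ["--py-files", ",".join(py_files)]
    cmd.append(driver)
    return cmd


cmd = build_spark_submit_cmd("driver.py", ["path/to/file.egg"])
print(shlex.join(cmd))  # spark-submit --py-files path/to/file.egg driver.py
```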

According to the Spark documentation itself:

For Python applications, simply pass a .py file in the place of <application-jar> instead of a JAR,
and add Python .zip, .egg or .py files to the search path with --py-files.

Falldog commented 2 years ago

Can you provide more information? Maybe it's a spark-submit issue, not a pyconcrete one.