Hydrospheredata / mist

Serverless proxy for Spark cluster
http://hydrosphere.io/mist/
Apache License 2.0

Add support for python3 jobs #431

Closed austinnichols101 closed 6 years ago

austinnichols101 commented 6 years ago

The current release, 1.0 RC 11, supports only Python 2 jobs.

blvp commented 6 years ago

I think it depends only on the system Python version. Can you check the Python version on the Spark slaves and on mist itself? Also, can you provide the code that is causing the error?

blvp commented 6 years ago

The mist docker image runs Python 2.7, and this is not configurable. We also found that our worker execution code is not compatible with Python 3.6. In addition, there is an issue in Spark 2.1+: you cannot run Python 3.6 code within Spark 2.2, because it raises the following error:

cls = _old_namedtuple(*args, **kwargs)
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'

austinnichols101 commented 6 years ago

SPARK-19019 is marked as resolved (Fixed), with Fix Version/s: 1.6.4, 2.0.3, 2.1.1, 2.2.0.
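
For readers unfamiliar with this class of error, here is a minimal sketch of how it can arise. It assumes the cause is a function being copied without its keyword-only defaults, which is my understanding of SPARK-19019 but is not stated in this thread. The names `make_record`, `copy_func_lossy`, `copy_func_fixed`, and `call_fails` are hypothetical and are not taken from the Spark sources.

```python
import types


def make_record(name, *, verbose=False, rename=False, module=None):
    # Hypothetical stand-in for a function with keyword-only defaults,
    # similar in shape to namedtuple() in Python 3.6.
    return (name, verbose, rename, module)


def copy_func_lossy(f):
    # types.FunctionType takes no argument for keyword-only defaults,
    # so the copy ends up with __kwdefaults__ set to None.
    return types.FunctionType(f.__code__, f.__globals__, f.__name__,
                              f.__defaults__, f.__closure__)


def copy_func_fixed(f):
    # Same copy, but the keyword-only defaults are carried over explicitly.
    fn = copy_func_lossy(f)
    fn.__kwdefaults__ = f.__kwdefaults__
    return fn


def call_fails(func):
    # True if calling func without keyword arguments raises TypeError.
    try:
        func("Point")
    except TypeError:
        return True
    return False


print(call_fails(copy_func_lossy(make_record)))
print(call_fails(copy_func_fixed(make_record)))
```

The lossy copy fails with a "missing required keyword-only arguments" TypeError of the same shape as the one quoted above, while the copy that restores `__kwdefaults__` behaves like the original function.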

austinnichols101 commented 6 years ago

I believe there are still issues with Python 3 support. After applying the fix above, I tried running a job on a machine where the system Python is Python 3 and received the error message below. My understanding is that iteritems was removed in Python 3.

Error running job with JobParams(python_version.py,SimpleContext,Map(),execute). Type: java.lang.Exception, message: Error in python code: Traceback (most recent call last):
  File "/usr/share/mist/mist-worker.jar/__main__.py", line 97, in <module>
    result = instance.execute(**to_python_types(parameters))
  File "/usr/share/mist/mist-worker.jar/__main__.py", line 28, in to_python_types
    for key, value in any.iteritems():
  File "/usr/share/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/share/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 323, in get_return_value
    format(target_id, ".", name, value))
py4j.protocol.Py4JError: An error occurred while calling o29.iteritems. Trace:
py4j.Py4JException: Method iteritems([]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
    at py4j.Gateway.invoke(Gateway.java:272)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)

trace io.hydrosphere.mist.worker.runners.python.PythonRunner.run(PythonRunner.scala:57); io.hydrosphere.mist.worker.JobStarting$class.io$hydrosphere$mist$worker$JobStarting$$runJob(WorkerActor.scala:65); io.hydrosphere.mist.worker.JobStarting$$anonfun$1.apply(WorkerActor.scala:38); io.hydrosphere.mist.worker.JobStarting$$anonfun$1.apply(WorkerActor.scala:36); scala.util.Success$$anonfun$map$1.apply(Try.scala:237); scala.util.Try$.apply(Try.scala:192); scala.util.Success.map(Try.scala:237); scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237); scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237); scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32); java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149); java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624); java.lang.Thread.run(Thread.java:748)

from mist.mist_job import MistJob
import sys
import socket

class SimpleContext(MistJob):

    def execute(self):

        # foo = 'bar'
        # print(f'foo = {foo}')

        ret = dict()
        ret['version'] = sys.version_info[0]
        ret['interpreter'] = sys.executable
        ret['hostname'] = socket.gethostname()

        return ret

blvp commented 6 years ago

fixed in https://github.com/Hydrospheredata/mist/blob/v1.0.0-RC12/mist/worker/src/main/resources/__main__.py#L28
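
For context, the iteritems failure in the traceback above is the usual Python 2 to 3 incompatibility: mappings no longer expose `iteritems()`, while `items()` exists in both versions. The following is a minimal sketch of a version-agnostic conversion helper. It is not the actual mist worker code; it only reuses the function name `to_python_types` from the traceback, and its body is an assumption for illustration.

```python
def to_python_types(value):
    # Hypothetical helper: recursively convert mapping-like and
    # sequence values into plain Python dicts and lists.
    if hasattr(value, "items"):
        # items() is available on mappings in both Python 2 and Python 3,
        # unlike iteritems(), which exists only in Python 2.
        return {key: to_python_types(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_python_types(item) for item in value]
    return value


params = {"path": "/tmp/data", "options": {"retries": 3, "tags": ("a", "b")}}
print(to_python_types(params))
```

Calling `items()` costs an extra list copy on Python 2, which is usually negligible for job parameters and avoids a version check.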

austinnichols101 commented 6 years ago

Thanks @blvp - will check now. I made a mistake when refreshing my sources...

austinnichols101 commented 6 years ago

confirmed - I was able to submit a python3 job. Thank you sir!