Hi @swetepete,
PySpark also runs on ARM: we use it in production on Hadoop 3.2 / 3.3 clusters based on Apache Bigtop, sometimes even on a mixed cluster (ARM and AMD64 machines).
The issue with ujson seems to be known; see Stack Overflow or ultrajson#456.
Since ujson is an API-compatible but more performant replacement for the json module, you might work around the issue by falling back to the standard library:
try:
    import ujson as json
except ImportError:
    # fall back to the (slower) standard-library json module
    import json
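Because both modules expose the same loads/dumps interface, the rest of the code can keep calling json.loads and json.dumps unchanged regardless of which module was imported.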
Thank you. That, along with pip-installing psutil, allowed the command to execute successfully.
The linked bug is tagged as "completed" - should I open a new bug with ujson's developers, seeing as there may be some new compatibility issue they don't know about, or is this something PySpark might be able to address?
Thank you
open a new bug with ujson's developers
That's not something I can answer. After a closer look: the issue was fixed in ujson 5.0 and later. First, make sure that the latest ujson version is installed and that the issue is still reproducible.
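For example (a minimal sketch, assuming pip3 points at the same Python that PySpark uses):
pip3 show ujson                                      # check the installed version
pip3 install --upgrade ujson                         # upgrade to 5.0 or later
python3 -c 'import ujson; print(ujson.__version__)'  # confirm the version in use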
is this something PySpark might be able to address?
If you mean "cc-pyspark": yes, we could add the work-around that uses the json module as a fall-back. But that's not a nice fix: it makes the code less readable and less performant.
Closing - a work-around exists and the underlying issue in ujson is resolved.
I am using a 2021 iMac with the Apple M1 chip and macOS Monterey 12.4.
So far, to set up PySpark I have pip3-installed pyspark, cloned this repo and installed from the requirements.txt file, and downloaded Java from its homepage. I'm using Python 3.8.9. I added the path of the pip3 installation of pyspark to SPARK_HOME in my .zshrc and sourced it.
I then executed the following command. I had to execute it from inside the cc-pyspark repo, otherwise the script could not find the program server_count.py. It returns this error message:
There seems to be something wrong with my installation of "ujson": it is built for ARM, but is PySpark designed for x86? Is that correct?
What is the simplest way to fix this issue? Should I try to run PySpark under some kind of x86 emulation like Rosetta? Has PySpark not been designed for the M1 chip?
Is there a chance this is the fault of my Java installation? I took the first one offered; it seemed to say x86, but when I tested running PySpark on its own, it seemed to work fine.
Thanks very much