Open marfago opened 7 years ago
Hi @marfago, interesting, how can it be executed sequentially across executors? o: Can you attach your spark-submit
command and a screenshot of the Spark web UI showing jobs, executors and tasks? Btw, we can discuss it in our Gitter channel :)
Hi @pomadchin, please find the PNGs attached.
Let me elaborate a little bit. I have two physical servers, SN and FN: SN hosts the Spark master, a Spark worker, Accumulo, and the HDFS name node and data node, while FN hosts just a Spark worker. Both nodes are in a Docker network. I have also slightly changed the demo so that it ingests 10 raster images and the mask.
I would expect the ETL to use all available CPUs, but both servers are mostly idle and, in the Spark UI, I can only see one task running at a time on one of the servers.
Any suggestions?
Hi,
I'm not sure whether this is a problem with GeoTrellis, Spark, or my configuration. Starting from geodocker-cluster and with some upgrades to the current chattademo project (basically a port to Scala 2.11), I was able to run the chatta demo. What I noticed is that, even with multiple Spark workers, all the tasks are always executed sequentially and not in parallel. Are there any parameters I can tune to increase parallelism?