Hi @aliebrahiiimi, I cannot find any possible reason in the stack trace for why the SparkContext was shut down.
> after a few hours the process stopped
As a general recommendation, I'd split the input into multiple parts, so that every part finishes in a shorter time span (30-60 minutes). Re-running a failed but small job isn't a big issue.
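A minimal sketch of one way to do this, assuming the job reads its input from a plain-text file of WARC paths (the file name `input.txt`, the chunk count, and the output prefix are placeholders, not taken from the original script):

```
# Split the list of WARC paths into 10 roughly equal parts without breaking lines
# (GNU coreutils split). Each part can then be submitted as a separate, shorter job.
split -n l/10 -d input.txt input.part-

# Example: run the parts one after another; a failed part can simply be re-run.
for part in input.part-*; do
    echo "Submitting job for $part"
done
```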
Hi @sebastian-nagel, thanks for your response. I will implement your suggestion. However, could this have something to do with Spark's memory usage? Do driver-memory or executor-memory need to be configured?
If you run Spark in local mode (without a cluster), defining the driver memory should be sufficient because the executors run in the same JVM instance. Running the job on a cluster also requires configuring the executor memory; see spark-submit.
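For illustration only, a sketch of the two cases; the memory values, master setting, and script name (`cc_downloader.py`) are assumptions, not recommendations for this specific job:

```
# Local mode: executors run inside the driver JVM, so driver memory covers everything.
spark-submit --driver-memory 8g cc_downloader.py input.txt output/

# Cluster mode (e.g. YARN): executor memory must be configured separately.
spark-submit --master yarn \
  --driver-memory 4g \
  --executor-memory 8g \
  --num-executors 4 \
  cc_downloader.py input.txt output/
```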
Closing this issue. @aliebrahiiimi: if there are any more questions, feel free to reopen it or ask for help on the Common Crawl user forum. Thanks!
I am using the following command to download files from Common Crawl, but after a few hours the process stopped and I received the following error. It may depend on the configuration and parameters of spark-submit; could you please assist me?
script:
logs: