Yelp / mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services
http://packages.python.org/mrjob/

Oops, ssh subprocess exited with return code 255, restarting... #1862

Open aaqibjavith opened 5 years ago

aaqibjavith commented 5 years ago

I am trying to find keywords in the CommonCrawl archive. When I run the script on a single wet.gz file, it works fine. But when I run it on the entire set of WET archive files, I get the following error:

Using s3://mrjob-1c535120e37b953d/tmp/ as our temp dir on S3
Copying local files to s3://mrjob-1c535120e37b953d/tmp/email_address.ec2-user.20181007.080547.129600/files/...
Adding our job to existing cluster j-119SGKPP3LAQ2
Creating temp directory /tmp/email_address.ec2-user.20181007.080547.129600
  Connect to resource manager at: http://localhost:40548/cluster
Waiting for Step 1 of 1 (s-KV7OE1GS86IV) to complete...
  RUNNING for 0:00:05
  Oops, ssh subprocess exited with return code 255, restarting...
  Connect to resource manager at: http://localhost:40548/cluster
  RUNNING for 0:02:44
  Oops, ssh subprocess exited with return code 255, restarting...
  Connect to resource manager at: http://localhost:40548/cluster
  RUNNING for 0:05:26
  Oops, ssh subprocess exited with return code 255, restarting...
  Connect to resource manager at: http://localhost:40548/cluster
  RUNNING for 0:08:08
  Oops, ssh subprocess exited with return code 255, restarting...
  Connect to resource manager at: http://localhost:40548/cluster
  RUNNING for 0:10:50
  Oops, ssh subprocess exited with return code 255, restarting...
  Connect to resource manager at: http://localhost:40548/cluster
  RUNNING for 0:13:31
  Oops, ssh subprocess exited with return code 255, restarting...
  Connect to resource manager at: http://localhost:40548/cluster
  RUNNING for 0:16:13
  Oops, ssh subprocess exited with return code 255, restarting...
  Connect to resource manager at: http://localhost:40548/cluster
  RUNNING for 0:18:55
  Oops, ssh subprocess exited with return code 255, restarting...
  Connect to resource manager at: http://localhost:40548/cluster
  RUNNING for 0:21:37
  FAILED
 Cluster j-119SGKPP3LAQ2 is WAITING: Cluster ready after last step failed.

Why is this happening? How do I resolve this issue?

I am running the mrjob script from one of our EC2 instances, not from my Mac.

fizerkhan commented 5 years ago

+1. Because of this, I cannot use mrjob. Could anyone help me fix this issue?