Yelp / mrjob

Run MapReduce jobs on Hadoop or Amazon Web Services
http://packages.python.org/mrjob/

Cluster id 'j-XXXXXX' is not valid? #1855

Closed: aaqibjavith closed this issue 6 years ago

aaqibjavith commented 6 years ago

I am running the sample word count script on EMR against an existing cluster with ID j-XXXXXX, as follows:

 python word_count.py README.rst -r emr --cluster-id j-XXXXXX

It fails with the following error:

Adding our job to existing cluster j-XXXXXX
Traceback (most recent call last):
  File "word_count.py", line 27, in <module>
    MRWordFreqCount.run()
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 436, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/site-packages/mrjob/job.py", line 457, in execute
    super(MRJob, self).execute()
  File "/usr/local/lib/python2.7/site-packages/mrjob/launch.py", line 187, in execute
    self.run_job()
  File "/usr/local/lib/python2.7/site-packages/mrjob/launch.py", line 235, in run_job
    runner.run()
  File "/usr/local/lib/python2.7/site-packages/mrjob/runner.py", line 518, in run
    self._run()
  File "/usr/local/lib/python2.7/site-packages/mrjob/emr.py", line 689, in _run
    self._launch()
  File "/usr/local/lib/python2.7/site-packages/mrjob/emr.py", line 712, in _launch
    self._launch_emr_job()
  File "/usr/local/lib/python2.7/site-packages/mrjob/emr.py", line 1473, in _launch_emr_job
    steps = self._build_steps()
  File "/usr/local/lib/python2.7/site-packages/mrjob/emr.py", line 1299, in _build_steps
    steps.append(self._build_step(n))
  File "/usr/local/lib/python2.7/site-packages/mrjob/emr.py", line 1315, in _build_step
    hadoop_jar_step = method(step_num)
  File "/usr/local/lib/python2.7/site-packages/mrjob/emr.py", line 1324, in _streaming_step_hadoop_jar_step
    jar, step_arg_prefix = self._get_streaming_jar_and_step_arg_prefix()
  File "/usr/local/lib/python2.7/site-packages/mrjob/emr.py", line 1433, in _get_streaming_jar_and_step_arg_prefix
    elif version_gte(self.get_image_version(), '4'):
  File "/usr/local/lib/python2.7/site-packages/mrjob/emr.py", line 2672, in get_image_version
    return self._get_cluster_info('image_version')
  File "/usr/local/lib/python2.7/site-packages/mrjob/emr.py", line 2704, in _get_cluster_info
    self._store_cluster_info()
  File "/usr/local/lib/python2.7/site-packages/mrjob/emr.py", line 2716, in _store_cluster_info
    cluster = self._describe_cluster()
  File "/usr/local/lib/python2.7/site-packages/mrjob/emr.py", line 2660, in _describe_cluster
    ClusterId=self._cluster_id)['Cluster']
  File "/usr/local/lib/python2.7/site-packages/mrjob/retry.py", line 90, in call_and_maybe_retry
    return f(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/botocore/client.py", line 317, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/lib/python2.7/dist-packages/botocore/client.py", line 615, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidRequestException: An error occurred (InvalidRequestException) when calling the DescribeCluster operation: Cluster id 'j-XXXXXX' is not valid

I cross-checked the cluster ID in the EMR dashboard, and it matches. Why am I getting this error, and how do I fix it?
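A likely explanation, offered as a hedge rather than a confirmed diagnosis: EMR API calls such as DescribeCluster are scoped to a single AWS region, so a cluster ID that is valid in one region is reported as not valid when it is looked up in another. The Python sketch below, which is not part of mrjob, asks each candidate region in turn whether it knows the cluster. It assumes boto3 is installed and AWS credentials are configured. The function name find_cluster_region and its make_client parameter are hypothetical; make_client exists only so the logic can be exercised with a fake client.

```python
def find_cluster_region(cluster_id, regions, make_client=None):
    """Return the first region in `regions` whose EMR API knows `cluster_id`.

    Returns None if every region rejects the ID. `make_client(region)` must
    return an object with a boto3-style describe_cluster(ClusterId=...) method.
    By default a real boto3 EMR client is created (needs AWS credentials).
    """
    if make_client is None:
        import boto3  # imported lazily so the function can be tested offline

        def make_client(region):
            return boto3.client("emr", region_name=region)

    for region in regions:
        client = make_client(region)
        try:
            client.describe_cluster(ClusterId=cluster_id)
        except Exception as exc:
            # botocore ClientError subclasses carry the AWS error code in
            # exc.response["Error"]["Code"]; any other failure is re-raised.
            response = getattr(exc, "response", None) or {}
            if response.get("Error", {}).get("Code") == "InvalidRequestException":
                continue  # ID not known in this region; try the next one
            raise
        return region
    return None
```

For example, find_cluster_region("j-XXXXXX", ["us-west-2", "us-east-1", "us-east-2"]) would return the first listed region that recognizes the cluster, or None. Only InvalidRequestException is treated as "not in this region"; other errors, such as missing permissions, are re-raised so they are not mistaken for a region mismatch.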

aaqibjavith commented 6 years ago

I solved this issue by adding the region and subnet options. This is my mrjob.conf file:

runners:
  emr:
    aws_access_key_id:
    aws_secret_access_key:

    region: us-east-2
    subnet: subnet-6019a21c

    ec2_key_pair: emr
    ec2_key_pair_file: /home/ec2-user/.ssh/emr.pem
    ssh_tunnel: true

    instance_type: m4.large
    master_instance_type: m4.large
    num_core_instances: 2

    # There's a newer AMI version but it has issues with the released stable mrjob
    #ami_version: 3.0.4
    interpreter: python2.7
    bootstrap:
    - sudo yum install -y python27 python27-devel gcc-c++
    - sudo wget -S -T 10 -t 5 https://bootstrap.pypa.io/get-pip.py
    - sudo python2.7 get-pip.py
    - sudo /usr/local/bin/pip2.7 install mrjob simplejson warc phonenumbers boto --ignore-installed chardet
    - sudo /usr/local/bin/pip2.7 install https://github.com/commoncrawl/gzipstream/archive/master.zip

Hope it helps.
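The fix above comes down to the region setting. As a rough pre-flight check, the Python sketch below inspects the emr runner options after the config file has been parsed into a dict (for example with PyYAML's yaml.safe_load; that parsing step is an assumption here and is not shown) and compares region with the region the EMR console shows for the cluster. The function name emr_region_problems is hypothetical, and it checks only region, leaving subnet alone.

```python
def emr_region_problems(conf, cluster_region):
    """Return a list of problems with the emr runner's region setting.

    `conf` is the mrjob.conf contents as a plain dict; `cluster_region` is the
    region where the target cluster runs. Both inputs are assumptions of this
    hypothetical helper. An empty list means no region problem was found.
    """
    # Empty YAML keys (like `emr:` with nothing under it) parse to None.
    emr_opts = (conf.get("runners") or {}).get("emr") or {}
    region = emr_opts.get("region")
    if not region:
        # mrjob then falls back to its own default region.
        return ["runners.emr.region is not set; mrjob's default region "
                "may not be %r" % cluster_region]
    if region != cluster_region:
        return ["runners.emr.region is %r but the cluster is in %r"
                % (region, cluster_region)]
    return []
```

With the config from this comment, emr_region_problems(conf, "us-east-2") returns an empty list; with region removed or set to another region, it returns one message explaining the mismatch.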