We started a MapReduce job to group some values from one table into another. The map and reduce phases are very simple, but the source table has about 500 million rows; the pipeline is a plain map/reduce.

The map, shuffle, and shuffle-sort phases all finished (costing over $500), and then the pipeline aborted for no obvious reason. We're hoping to find a way to resume from where it stopped, because we don't want to start over and incur the same cost again.
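For context, the job is roughly this shape. This is only a minimal sketch: group_map, group_reduce, SourceRow, DestinationRow, and the shard count are placeholders, not our real code.

    from google.appengine.ext import db
    from mapreduce import mapreduce_pipeline
    from mapreduce import operation as op

    class DestinationRow(db.Model):
        # Placeholder for the real destination table.
        values = db.StringListProperty()

    def group_map(entity):
        # Emit (key, value) pairs; the shuffle/sort phases group them by key.
        yield (entity.group_key, entity.value)

    def group_reduce(key, values):
        # One destination row per key, carrying its grouped values.
        yield op.db.Put(DestinationRow(key_name=key, values=list(values)))

    job = mapreduce_pipeline.MapreducePipeline(
        'populateshortlinkblocks',      # the job name that shows up in the log
        'main.group_map',
        'main.group_reduce',
        'mapreduce.input_readers.DatastoreInputReader',
        mapper_params={'entity_kind': 'main.SourceRow'},  # the ~500M-row table
        shards=256)
    job.start()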
This is the error that broke the pipeline. It seems to be a bug in starting the merge phase:

E 2016-01-26 12:33:11.656 200 84 B 49.05 s D 12:33:17.613 E 12:33:39.697 W 12:34:00.009 /mapreduce/pipeline/run
0.1.0.2 - - [26/Jan/2016:12:33:11 -0800] "POST /mapreduce/pipeline/run HTTP/1.1" 200 84 http://live.networkedblogshr.appspot.com/mapreduce/pipeline/run "AppEngine-Google; (+http://code.google.com/appengine)" "live.networkedblogshr.appspot.com" ms=49052 cpu_ms=7914 cpm_usd=9.387e-06 instance=00c61b117ccb1bc29dba9a1b1318d55b1028576e app_engine_release=1.9.31 trace_id=-
D 12:33:17.613 Running mapreduce.mapper_pipeline.MapperPipeline(*(u'populateshortlinkblocks-shuffle-merge', u'mapreduce.shuffler._merge_map', u'mapreduce.shuffler._MergingReader'), **{'output_writer_spec': u'mapreduce.output_writers._GoogleCloudStorageRecordOutputWriter', 'params': {u'files': [[u'/networkedblogshr.appspot.com/populateshortlinkblocks-shuffle-sort-0/157260387977788B... (2665890 bytes))#582f986f0a1240328bb363a4cec1b3eb
E 12:33:39.697 Generator mapreduce.mapper_pipeline.MapperPipeline(*(u'populateshortlinkblocks-shuffle-merge', u'mapreduce.shuffler._merge_map', u'mapreduce.shuffler._MergingReader'), **{'output_writer_spec': u'mapreduce.output_writers._GoogleCloudStorageRecordOutputWriter', 'params': {u'files': [[u'/networkedblogshr.appspot.com/populateshortlinkblocks-shuffle-sort-0/157260387977788B... (2665890 bytes))#582f986f0a1240328bb363a4cec1b3eb raised exception. RequestTooLargeError: The request to API call datastore_v3.Put() was too large.
Traceback (most recent call last):
File "/base/data/home/apps/s~networkedblogshr/live.390252708758560814/pipeline/pipeline.py", line 2144, in evaluate
self, pipeline_key, root_pipeline_key, caller_output)
File "/base/data/home/apps/s~networkedblogshr/live.390252708758560814/pipeline/pipeline.py", line 1110, in _run_internal
return self.run(*self.args, **self.kwargs)
File "/base/data/home/apps/s~networkedblogshr/live.390252708758560814/mapreduce/mapper_pipeline.py", line 98, in run
queue_name=self.queue_name,
File "/base/data/home/apps/s~networkedblogshr/live.390252708758560814/mapreduce/control.py", line 125, in start_map
in_xg_transaction=in_xg_transaction)
File "/base/data/home/apps/s~networkedblogshr/live.390252708758560814/mapreduce/handlers.py", line 1761, in _start_map
_txn()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/datastore.py", line 2732, in inner_wrapper
return RunInTransactionOptions(options, func, *args, **kwds)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/datastore.py", line 2630, in RunInTransactionOptions
ok, result = _DoOneTry(function, args, kwargs)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/datastore.py", line 2650, in _DoOneTry
result = function(*args, **kwargs)
File "/base/data/home/apps/s~networkedblogshr/live.390252708758560814/mapreduce/handlers.py", line 1758, in _txn
cls._create_and_save_state(mapreduce_spec, _app)
File "/base/data/home/apps/s~networkedblogshr/live.390252708758560814/mapreduce/handlers.py", line 1785, in _create_and_save_state
state.put(config=config)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 1077, in put
return datastore.Put(self._entity, **kwargs)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/datastore.py", line 605, in Put
return PutAsync(entities, **kwargs).get_result()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
return self.__get_result_hook(self)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1881, in __put_hook
self.check_rpc_success(rpc)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1371, in check_rpc_success
rpc.check_success()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 579, in check_success
self.__rpc.CheckSuccess()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 134, in CheckSuccess
raise self.exception
RequestTooLargeError: The request to API call datastore_v3.Put() was too large.
W 12:34:00.009 Giving up on pipeline ID "582f986f0a1240328bb363a4cec1b3eb" after 3 attempt(s); causing abort all the way to the root pipeline ID "334b2db5ec964b8c98b33bd29a210660"
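As far as we can tell, the failure is deterministic rather than transient: the merge job's serialized args (logged above) are already bigger than the largest request datastore_v3.Put() will accept, so all three attempts were doomed. Back-of-the-envelope check, assuming the documented 1 MB cap on a datastore entity/Put request:

    # The merge job's serialized args vs. datastore's Put limit.
    args_bytes = 2665890         # "(2665890 bytes)" in the log line above
    put_limit_bytes = 1 << 20    # documented 1 MB cap per entity / Put request
    assert args_bytes > put_limit_bytes  # always true -> RequestTooLargeError

In other words, the full 'files' list from the shuffle-sort stage gets embedded in the state entity that _create_and_save_state tries to put (see the traceback), and that entity can never fit.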
Any suggestions on how to hack it to resume from where it stopped? The rough idea we have is sketched below. By the way, the /abort handler also failed with a different error, but I'm guessing that's just a side effect of this one.
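The untested idea: re-launch just the merge stage by hand, using the same specs the shuffler logged above, but with the shuffle-sort file groups split into batches so each job's serialized state stays under the 1 MB limit. Everything here is an assumption: that the sort output is still in GCS, that files_by_shard can be rebuilt into the same list-of-lists the shuffler logged (one inner list of sorted files per merge shard), and that batching the groups doesn't change the merge semantics.

    # Untested sketch of a manual resume of the merge stage.
    import cloudstorage
    from mapreduce import mapper_pipeline

    SORT_PREFIX = ('/networkedblogshr.appspot.com/'
                   'populateshortlinkblocks-shuffle-sort-')

    def list_sort_output():
        # Enumerate the sort-phase output files left behind in GCS.
        return sorted(f.filename for f in cloudstorage.listbucket(SORT_PREFIX))

    def start_merge(files_by_shard, batch_size=32):
        # Same pipeline the shuffler tried to start (see the log above),
        # but launched in batches so each job's state entity stays well
        # under the 1 MB limit that killed the original run.
        for i in range(0, len(files_by_shard), batch_size):
            job = mapper_pipeline.MapperPipeline(
                'populateshortlinkblocks-shuffle-merge-%d' % (i // batch_size),
                'mapreduce.shuffler._merge_map',
                'mapreduce.shuffler._MergingReader',
                output_writer_spec=('mapreduce.output_writers.'
                                    '_GoogleCloudStorageRecordOutputWriter'),
                # The real shuffler passes more reader params than just
                # 'files' (see mapreduce/shuffler.py); those would need to
                # be copied over too.
                params={'files': files_by_shard[i:i + batch_size]})
            job.start()

The downstream reduce stage would then have to be pointed at the merged output by hand as well, which is the part we're least sure about.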