Open OopsOutOfMemory opened 9 years ago
What is the size of your dataset, and can you show us your script?
On Fri, Jun 26, 2015 at 4:59 AM, Sheng, Li notifications@github.com wrote:
Hi, @suvodeep-pyne @mparkhe, when I perform a BLOCKGEN operation, a Java heap space exception is thrown in the final reduce. Increasing the number of REDUCERS does not seem to help.
2015-06-26 16:14:56,215 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: finalMerge called with 4 in-memory map-outputs and 5 on-disk map-outputs
2015-06-26 16:14:56,217 INFO [main] org.apache.hadoop.mapred.Merger: Merging 4 sorted segments
2015-06-26 16:14:56,217 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 142558662 bytes
2015-06-26 16:14:57,234 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merged 4 segments, 142558706 bytes to disk to satisfy reduce memory limit
2015-06-26 16:14:57,235 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 6 files, 789110010 bytes from disk
2015-06-26 16:14:57,236 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2015-06-26 16:14:57,236 INFO [main] org.apache.hadoop.mapred.Merger: Merging 6 sorted segments
2015-06-26 16:14:57,243 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.snappy]
2015-06-26 16:14:57,243 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 3293450894 bytes
2015-06-26 16:14:57,605 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2015-06-26 16:15:44,841 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
    at com.linkedin.cubert.memory.PagedByteArray.ensureCapacity(PagedByteArray.java:192)
    at com.linkedin.cubert.memory.PagedByteArray.write(PagedByteArray.java:141)
    at com.linkedin.cubert.memory.PagedByteArrayOutputStream.write(PagedByteArrayOutputStream.java:67)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at java.io.DataOutputStream.writeUTF(DataOutputStream.java:401)
    at java.io.DataOutputStream.writeUTF(DataOutputStream.java:323)
    at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:580)
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:462)
    at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
    at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:650)
    at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:470)
    at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:40)
    at com.linkedin.cubert.io.DefaultTupleSerializer.serialize(DefaultTupleSerializer.java:41)
    at com.linkedin.cubert.io.DefaultTupleSerializer.serialize(DefaultTupleSerializer.java:28)
    at com.linkedin.cubert.utils.SerializedTupleStore.addToStore(SerializedTupleStore.java:118)
    at com.linkedin.cubert.block.CreateBlockOperator$StoredBlock.<init>(CreateBlockOperator.java:145)
    at com.linkedin.cubert.block.CreateBlockOperator.createBlock(CreateBlockOperator.java:536)
    at com.linkedin.cubert.block.CreateBlockOperator.next(CreateBlockOperator.java:488)
    at com.linkedin.cubert.plan.physical.PhaseExecutor.prepareOperatorChain(PhaseExecutor.java:261)
    at com.linkedin.cubert.plan.physical.PhaseExecutor.<init>(PhaseExecutor.java:111)
    at com.linkedin.cubert.plan.physical.CubertReducer.run(CubertReducer.java:68)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
— Reply to this email directly or view it on GitHub https://github.com/linkedin/Cubert/issues/8.
Hi @suvodeep-pyne, sorry for the late reply.
Here is my script:
JOB "Cube GENBLOCK SESSION_ID"
REDUCERS 500;
MAP {
data = LOAD "/hive_warehouse/temp_bigdata.db/temp_visitorcube_pv" USING TEXT("schema":
"STRING log_date, STRING app_name, STRING app_platform, STRING buyer_type,INT is_app_new_user,INT is_new,STRING vmark_name,
STRING user_class,STRING cookie_class,STRING area_prov,STRING area_city,STRING city_level,STRING vip_province,STRING warehouse,
STRING app_version, STRING channel_id,STRING lvl1_channel,STRING lvl2_channel,STRING delivery_channel,
STRING package_name,STRING os,STRING manufacturer,STRING phone_model,STRING network_provider,
LONG pv,STRING mid, STRING session_id, LONG stay_time, LONG bounce");
}
BLOCKGEN data BY SIZE 64000000 PARTITIONED ON session_id;
STORE data INTO "/user/victorsheng/cubert/temp_session_id" USING RUBIX("overwrite":"true");
END
The size of the table is almost 10.8 GB, and its schema is:
CREATE TABLE `temp_bigdata.temp_visitorcube_pv`(
`log_date` string,
`app_name` string,
`app_platform` string,
`buyer_type` string,
`is_app_new_user` int,
`is_new` int,
`vmark_name` string,
`user_class` string,
`cookie_class` string,
`area_prov` string,
`area_city` string,
`city_level` string,
`vip_province` string,
`warehouse` string,
`app_version` string,
`channel_id` string,
`lvl1_channel` string,
`lvl2_channel` string,
`delivery_channel` string,
`package_name` string,
`os` string,
`manufacturer` string,
`phone_model` string,
`network_provider` string,
`pv` bigint,
`mid` string,
`session_id` string,
`stay_time` bigint,
`bounce` bigint)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs:///hive_warehouse/temp_bigdata.db/temp_visitorcube_pv'
TBLPROPERTIES (
'numFiles'='21',
'transient_lastDdlTime'='1434533424',
'COLUMN_STATS_ACCURATE'='true',
'totalSize'='11620963474',
'numRows'='41252410',
'rawDataSize'='11579711064')
Sample data from the session_id column looks like this:
60c9d53dc7c9bc839205edc50816c9e0573671f6_shop_iphone_5.4_20150615_0003
60c9d53dc7c9bc839205edc50816c9e0573671f6_shop_iphone_5.4_20150615_0002
60c9d53dc7c9bc839205edc50816c9e0573671f6_shop_iphone_5.4_20150615_0001
f6d0ee676dfd2e3d62fb78611184aa3bb951052f_shop_iphone_5.4_20150615_0001
-99
f6d0ee676dfd2e3d62fb78611184aa3bb951052f_shop_iphone_5.4_20150615_0001
-99
b6a387f2-9d11-3981-b135-b6250ebfb487_shop_android_5.3.2_20150615_0001
741bdf67-e155-3a47-a3d3-a80009d6f6bf_shop_android_5.3.2_20150615_0001
-99
741bdf67-e155-3a47-a3d3-a80009d6f6bf_shop_android_5.3.2_20150615_0001
-99
-99
-99
-99
-99
-99
-99
6c45afb7aa7df606678ca150f18c53ebf16b3070_shop_iphone_5.4_20150615_0001
6c45afb7aa7df606678ca150f18c53ebf16b3070_shop_iphone_5.4_20150615_0005
6c45afb7aa7df606678ca150f18c53ebf16b3070_shop_iphone_5.4_20150615_0004
6c45afb7aa7df606678ca150f18c53ebf16b3070_shop_iphone_5.4_20150615_0002
6c45afb7aa7df606678ca150f18c53ebf16b3070_shop_iphone_5.4_20150615_0003
6c45afb7aa7df606678ca150f18c53ebf16b3070_shop_iphone_5.4_20150615_0001
6c45afb7aa7df606678ca150f18c53ebf16b3070_shop_iphone_5.4_20150615_0002
6c45afb7aa7df606678ca150f18c53ebf16b3070_shop_iphone_5.4_20150615_0007
6c45afb7aa7df606678ca150f18c53ebf16b3070_shop_iphone_5.4_20150615_0006
-99
38233855e8a6fee9822f94222869453fd43a5585_shop_iphone_5.4_20150615_0001
38233855e8a6fee9822f94222869453fd43a5585_shop_iphone_5.4_20150615_0003
38233855e8a6fee9822f94222869453fd43a5585_shop_iphone_5.4_20150615_0001
38233855e8a6fee9822f94222869453fd43a5585_shop_iphone_5.4_20150615_0002
6a762e053344c261d1d89d1fa959d158c5515222_shop_iphone_5.4_20150615_0002
-99
6a762e053344c261d1d89d1fa959d158c5515222_shop_iphone_5.4_20150615_0002
6a762e053344c261d1d89d1fa959d158c5515222_shop_iphone_5.4_20150615_0001
17361e9f-1025-34ed-b8e4-20a04084f812_shop_android_5.4.3_20150615_0002
-99
17361e9f-1025-34ed-b8e4-20a04084f812_shop_android_5.4.3_20150615_0002
17361e9f-1025-34ed-b8e4-20a04084f812_shop_android_5.4.3_20150615_0001
17361e9f-1025-34ed-b8e4-20a04084f812_shop_android_5.4.3_20150615_0001
269e4e88-b4b7-394f-ae03-ab356a84e572_shop_android_5.4.3_20150615_0001
-99
269e4e88-b4b7-394f-ae03-ab356a84e572_shop_android_5.4.3_20150615_0001
BTW, this exception is thrown on the reduce side, and it is always the last of the reducers that fails.
e.g.: 500 reducers in total, 499 successful.
I've noticed from the log that the last merge pass in the reducer involves about 3 GB:
2015-06-26 16:14:57,243 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 3293450894 bytes
I suspect my container size may be too small: it is about 1500 MB per container.
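I'm considering giving the reduce containers more memory, e.g. something like the following Hadoop settings (just a sketch: the values are illustrative, and how to pass them depends on how the Cubert job is launched):
# illustrative values: raise the reduce container size and its JVM heap together
mapreduce.reduce.memory.mb=3072
mapreduce.reduce.java.opts=-Xmx2560m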
PS: Is this the issue where a single key has a very large value list on the reduce side?
hive> select session_id,count(1) from temp_visitorcube_pv group by session_id having count(1) > 196;
Total MapReduce CPU Time Spent: 13 minutes 41 seconds 400 msec
OK
e313d21f-4feb-33ed-82ae-e7aec7dbf19b_android_5.3.2_20150615_0105 310
814010d9-dc6f-33a7-bf23-6cd6b86ce635_android_5.3.2_20150615_0001 416
-99 11390639
Time taken: 130.83 seconds, Fetched: 3 row(s)
How can I avoid this? How does Cubert handle it?
Hi @suvodeep-pyne, I'm sure this is a data skew issue.
I need to compute count_distinct(session_id), but it always runs out of memory since there are 11390639 records with the same session_id, -99.
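One workaround I can think of (just a sketch, and it assumes -99 is simply a "no session" placeholder that should count as at most one distinct value) is to keep the hot key out of the distinct aggregation, e.g. in Hive:
-- exclude the skewed sentinel key and add it back as one distinct value
SELECT COUNT(DISTINCT session_id) + 1 AS distinct_sessions
FROM temp_bigdata.temp_visitorcube_pv
WHERE session_id != '-99';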
Can you try dictionary-encoding your session_id? Once you convert the values to integers, I think you should be able to handle 11M ints in one reducer.
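Roughly, the idea (shown here as a Hive pre-processing sketch with illustrative table names, not Cubert's own dictionary syntax) is to map every session_id string to a small integer before running BLOCKGEN:
-- build an integer dictionary for session_id (table names are illustrative)
CREATE TABLE temp_bigdata.session_id_dict AS
SELECT session_id, row_number() OVER (ORDER BY session_id) AS session_code
FROM (SELECT DISTINCT session_id FROM temp_bigdata.temp_visitorcube_pv) t;

-- replace the string session_id with its integer code in the BLOCKGEN input
CREATE TABLE temp_bigdata.temp_visitorcube_pv_encoded AS
SELECT p.*, d.session_code
FROM temp_bigdata.temp_visitorcube_pv p
JOIN temp_bigdata.session_id_dict d ON p.session_id = d.session_id;
A distinct count over the integer codes is then far cheaper to hold in a single reducer than the long session_id strings.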