apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] auto_clean stopped running during ingest #8209

Closed: ehurheap closed this issue 1 year ago

ehurheap commented 1 year ago

Describe the problem you faced

We are running spark streaming ingestion with the following cleaner configs:

(hoodie.clean.automatic -> true)
(hoodie.clean.max.commits -> 30)
(hoodie.cleaner.hours.retained -> 24)
(hoodie.cleaner.parallelism -> 256)
(hoodie.cleaner.policy -> KEEP_LATEST_BY_HOURS)
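
For context, here is a minimal sketch (not our actual production job: the table name, checkpoint path, and "rate" source are stand-ins) of how cleaner settings like these are attached to the structured-streaming write into Hudi, using the record key, precombine, and partition fields from the write configs listed further below:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("hudi-streaming-ingest").getOrCreate()

// Stand-in source with the same column names as the real change stream.
val df = spark.readStream.format("rate").load()
  .selectExpr("value AS env_id", "value AS user_id", "value AS event_id",
              "timestamp AS time", "'2022-w47' AS week")

val cleanerOpts = Map(
  "hoodie.clean.automatic"        -> "true",
  "hoodie.clean.max.commits"      -> "30",
  "hoodie.cleaner.hours.retained" -> "24",
  "hoodie.cleaner.parallelism"    -> "256",
  "hoodie.cleaner.policy"         -> "KEEP_LATEST_BY_HOURS"
)

df.writeStream
  .format("hudi")
  .options(cleanerOpts)
  .option("hoodie.table.name", "users_changes")                                 // assumed table name
  .option("hoodie.datasource.write.recordkey.field", "env_id,user_id,event_id")
  .option("hoodie.datasource.write.precombine.field", "time")
  .option("hoodie.datasource.write.partitionpath.field", "env_id,week")
  .option("checkpointLocation", "s3://bucket/chk/users_changes")                // assumed checkpoint path
  .start("s3://path-to-table/users_changes-v1")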

Ingestion commits happen about every 20-30 minutes. However, using the hudi-cli I can see that cleans occur far less frequently, and at some point about 3 weeks ago cleans stopped happening altogether. When the ingestion was restarted, it stalled on "Generating list of file slices to be cleaned", and eventually the executors ran out of memory and the job failed.

To allow ingestion to proceed we redeployed with automatic cleaner disabled.

Questions:

To Reproduce

Steps to reproduce the behavior:

  1. Deploy ingestion with above write configs
  2. Observe cleans with the hudi-cli command cleans show
  3. Redeploy ingestion after cleaner has stopped for some time

Expected behavior

The cleaner table service is invoked immediately after each commit.

Environment Description

Additional context

Additional write configs include:

(hoodie.compact.inline -> false)
(hoodie.compact.schedule.inline -> false)
(hoodie.datasource.compaction.async.enable -> false)
(hoodie.datasource.write.operation -> insert)
(hoodie.datasource.write.partitionpath.field -> env_id,week)
(hoodie.datasource.write.precombine.field -> time)
(hoodie.datasource.write.recordkey.field -> env_id,user_id,event_id)
(hoodie.datasource.write.streaming.ignore.failed.batch -> false)
(hoodie.datasource.write.table.type -> MERGE_ON_READ)
(hoodie.insert.shuffle.parallelism -> 80)
(hoodie.metadata.enable -> false)
(hoodie.write.concurrency.mode -> optimistic_concurrency_control)
(hoodie.write.lock.dynamodb.partition_key -> key1)
(hoodie.write.lock.dynamodb.region -> us-east-1)
(hoodie.write.lock.dynamodb.table -> datalake-locks)
(hoodie.write.lock.provider -> org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider)

hbgstc123 commented 1 year ago

Maybe your commits are archived before their committed data files are cleaned?
You can check the earliest commit in your hoodie active timeline.
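
For example, one way to check that programmatically (a rough sketch only, with a placeholder base path, assuming the Hadoop/S3 configuration is already in place):

import org.apache.hadoop.conf.Configuration
import org.apache.hudi.common.table.HoodieTableMetaClient

// Build a metadata client for the table and look at the first instant still on the
// active (unarchived) timeline; anything older has been archived.
val metaClient = HoodieTableMetaClient.builder()
  .setConf(new Configuration())
  .setBasePath("s3://path-to-table/users_changes-v1")
  .build()

val earliest = metaClient.getActiveTimeline.getCommitsTimeline.firstInstant()
println(s"Earliest instant on the active timeline: $earliest")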

ehurheap commented 1 year ago

Prior to the earliest deltacommit, we have 23 timestamps with cleans in flight, for example:

20221119132417991.clean
20221119132417991.clean.inflight
20221119132417991.clean.requested

Then the earliest deltacommit is:

20221208164706388.deltacommit
20221208164706388.deltacommit.inflight
20221208164706388.deltacommit.requested
20221208164706388.savepoint
20221208164706388.savepoint.inflight

Following that instant there are a few replacecommits:

20221208174357204.replacecommit
20221208174357204.replacecommit.inflight
20221208174357204.replacecommit.requested

etc., followed by more deltacommits.

(This is from looking at the files in the S3 console - trying to use the hudi-cli commands is very slow, as it can take an hour to list commits for this table.)
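
A quicker way to do the same listing than clicking through the console is to read the .hoodie folder directly from a spark-shell. A rough sketch (placeholder base path, S3 access assumed to be configured):

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

val basePath = "s3://path-to-table/users_changes-v1"
val fs = FileSystem.get(new URI(basePath), spark.sparkContext.hadoopConfiguration)

// List completed timeline actions under .hoodie, oldest first.
fs.listStatus(new Path(s"$basePath/.hoodie"))
  .map(_.getPath.getName)
  .filter(n => n.endsWith(".clean") || n.endsWith(".deltacommit") || n.endsWith(".replacecommit"))
  .sorted
  .foreach(println)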

ehurheap commented 1 year ago

I have attempted to run the cleaner as a separate step from the ingestion. The ingestion is now configured with

hoodie.clean.automatic -> false
hoodie.archive.automatic -> false

Using the hudi-cli I submitted this clean command:

cleans run --sparkMaster local[8] --sparkMemory 60G --hoodieConfigs "hoodie.cleaner.policy=KEEP_LATEST_BY_HOURS hoodie.cleaner.hours.retained=1920 hoodie.cleaner.parallelism=400"

After an hour and a half, it failed with

...INFO S3NativeFileSystem: Opening 's3://path-to-table/users_changes-v1/.hoodie/20221208164706388.savepoint' for reading
hudi:users_changes->
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 1614"...
Failed to clean hoodie dataset

I tried tweaking the hudi-cli.sh script to specify -Xmx90G in the java command, but that did not help.

As an alternative I attempted the spark-submit version of the cleans command like so:

spark-submit --deploy-mode cluster \
--conf spark.executor.instances=30 \
--conf spark.executor.cores=2 \
--conf spark.executor.memory=20G \
--conf spark.driver.memory=40G \
--conf spark.app.name=HoodieCleaner_users_changes-v1 \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--class org.apache.hudi.utilities.HoodieCleaner /usr/lib/hudi/hudi-utilities-bundle_2.12-0.11.1-amzn-0.jar \
--target-base-path s3://path-to-table/users_changes-v1 \
--hoodie-conf hoodie.metadata.enable=false \
--hoodie-conf hoodie.cleaner.policy=KEEP_LATEST_BY_HOURS \
--hoodie-conf hoodie.cleaner.hours.retained=2000 \
--hoodie-conf hoodie.cleaner.parallelism=400 \
--hoodie-conf hoodie.clean.allow.multiple=false \
--hoodie-conf hoodie.archive.async=false \
--hoodie-conf hoodie.archive.automatic=false

After about an hour the application attempt died and the driver logs showed this:

23/03/22 00:43:50 ERROR Javalin: Exception occurred while servicing http-request
java.lang.OutOfMemoryError: null
    at java.lang.AbstractStringBuilder.hugeCapacity(AbstractStringBuilder.java:161) ~[?:1.8.0_362]
    at java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:155) ~[?:1.8.0_362]
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:125) ~[?:1.8.0_362]
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:596) ~[?:1.8.0_362]
    at java.lang.StringBuilder.append(StringBuilder.java:195) ~[?:1.8.0_362]
    at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:460) ~[jackson-core-2.13.3.jar:2.13.3]
    at com.fasterxml.jackson.core.io.SegmentedStringWriter.getAndClear(SegmentedStringWriter.java:85) ~[jackson-core-2.13.3.jar:2.13.3]
    at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:3827) ~[jackson-databind-2.13.3.jar:2.13.3]
    at org.apache.hudi.timeline.service.RequestHandler.jsonifyResult(RequestHandler.java:198) ~[__app__.jar:0.11.1-amzn-0]
    at org.apache.hudi.timeline.service.RequestHandler.writeValueAsStringSync(RequestHandler.java:209) ~[__app__.jar:0.11.1-amzn-0]
    at org.apache.hudi.timeline.service.RequestHandler.writeValueAsString(RequestHandler.java:176) ~[__app__.jar:0.11.1-amzn-0]
    at org.apache.hudi.timeline.service.RequestHandler.lambda$registerFileSlicesAPI$18(RequestHandler.java:384) ~[__app__.jar:0.11.1-amzn-0]
    at org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:501) ~[__app__.jar:0.11.1-amzn-0]
    at io.javalin.core.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:23) ~[__app__.jar:0.11.1-amzn-0]
    at io.javalin.http.JavalinServlet$addHandler$protectedHandler$1.handle(JavalinServlet.kt:128) ~[__app__.jar:0.11.1-amzn-0]
    at io.javalin.http.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:45) ~[__app__.jar:0.11.1-amzn-0]
    at io.javalin.http.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:24) ~[__app__.jar:0.11.1-amzn-0]
    at io.javalin.http.JavalinServlet$service$1.invoke(JavalinServlet.kt:136) ~[__app__.jar:0.11.1-amzn-0]
    at io.javalin.http.JavalinServlet$service$2.invoke(JavalinServlet.kt:40) ~[__app__.jar:0.11.1-amzn-0]
    at io.javalin.http.JavalinServlet.service(JavalinServlet.kt:81) ~[__app__.jar:0.11.1-amzn-0]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) ~[javax.servlet-api-3.1.0.jar:3.1.0]
    at io.javalin.websocket.JavalinWsServlet.service(JavalinWsServlet.kt:51) ~[__app__.jar:0.11.1-amzn-0]

I made several attempts, tweaking these parameters for each run:

--hoodie-conf hoodie.cleaner.hours.retained
--conf spark.executor.instances
--conf spark.executor.cores
--conf spark.executor.memory
--conf spark.driver.memory

Each time I got an error; for example, the following two errors occurred in different runs:

# java.lang.OutOfMemoryError: Requested array size exceeds VM limit

and

23/03/21 20:39:21 ERROR RequestHandler: Got runtime exception servicing request partition=env_id%3D2907378054%2Fweek%3D20221121&basepath=s3%3A%2F%2Fpath-to-table%2Fusers_changes-v1&lastinstantts=20230321200037228&timelinehash=c015e055fa5d5d3f14376d8c4aee8b41e5be8cd928f0c72068646cb95f9365c5
java.lang.NegativeArraySizeException: null
    at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:68) ~[?:1.8.0_362]
    at java.lang.StringBuilder.<init>(StringBuilder.java:106) ~[?:1.8.0_362]
    at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:455) ~[jackson-core-2.13.3.jar:2.13.3]
    at com.fasterxml.jackson.core.io.SegmentedStringWriter.getAndClear(SegmentedStringWriter.java:85) ~[jackson-core-2.13.3.jar:2.13.3]
    at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:3827) ~[jackson-databind-2.13.3.jar:2.13.3]
    at org.apache.hudi.timeline.service.RequestHandler.jsonifyResult(RequestHandler.java:198) ~[__app__.jar:0.11.1-amzn-0]
    at org.apache.hudi.timeline.service.RequestHandler.writeValueAsStringSync(RequestHandler.java:209) ~[__app__.jar:0.11.1-amzn-0]
    at org.apache.hudi.timeline.service.RequestHandler.writeValueAsString(RequestHandler.java:176) ~[__app__.jar:0.11.1-amzn-0]
    at org.apache.hudi.timeline.service.RequestHandler.lambda$registerFileSlicesAPI$18(RequestHandler.java:384) ~[__app__.jar:0.11.1-amzn-0]
    at org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:501) ~[__app__.jar:0.11.1-amzn-0]
    at io.javalin.core.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:23) ~[__app__.jar:0.11.1-amzn-0]

Is there some way to reduce the size of the clean operation so that we don't run into these errors?

ehurheap commented 1 year ago

This ticket seems to describe similar symptoms.

nsivabalan commented 1 year ago

Likely you have a very large number of partitions, and hence jsonifying the results can result in an OOM.

Can you try disabling the embedded timeline server for a couple of commits and re-enabling the cleaner? Set hoodie.embed.timeline.server=false.

Hopefully this should let the cleaner get past the issue.
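
For example, on the writer side the change is just an extra option. A rough sketch, reusing the assumed names from the earlier snippet; the same key can also be passed as --hoodie-conf to the standalone HoodieCleaner:

// Sketch only: the workaround merged into the streaming writer options
// (df and cleanerOpts as in the earlier sketch; names/paths are assumptions).
df.writeStream
  .format("hudi")
  .options(cleanerOpts)
  .option("hoodie.clean.automatic", "true")          // re-enable automatic cleaning
  .option("hoodie.embed.timeline.server", "false")   // temporary workaround
  .option("hoodie.table.name", "users_changes")
  // (the record key / precombine / partition options from the earlier sketch would also go here)
  .option("checkpointLocation", "s3://bucket/chk/users_changes")
  .start("s3://path-to-table/users_changes-v1")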

ehurheap commented 1 year ago

Hi @nsivabalan - confirming that the cleaner worked with this config setting - THANKS!

Can you comment on why the timeline server does not have this problem during ingestion, only during cleaning?

nsivabalan commented 1 year ago

Cool. We identified a perf hit with the timeline server when dealing with too many files. We fixed it in https://github.com/apache/hudi/pull/8480 on the latest master, and it could go into 0.13.1.

So I suggested disabling the timeline server to get past the first few cleans, in which the replaced file groups are cleaned up. Once that cleanup is done, the number of file groups to deal with may be small enough that the timeline server can cope with it (even without the above fix).