Status: Open. AnandInguva opened this issue 1 year ago.
cc: @tvalentyn
AsIter view_fn: an iterable may be consumed one element at a time, which could mean more reads from the side input cache on the GCS bucket?
AsList view_fn: a list is fully materialized, so we wouldn't need as many reads from the side input cache on the GCS bucket?
For AsIter with state_cache_size=100 MB,
The state cache was enabled in https://github.com/apache/beam/issues/28770.
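To make the AsIter vs. AsList question concrete, here is a toy model in plain Python with no Beam imports. `PagedBackend`, `lazy_iter` and `materialize` are hypothetical names: the backend stands in for the remote side input storage and only counts page fetches, assuming the data is served in fixed-size pages. This is a sketch of the trade-off under that assumption, not how the Beam Python SDK actually fetches side inputs.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# (elements of one page, continuation token of the next page or None)
Page = Tuple[List[int], Optional[int]]


class PagedBackend:
    """Hypothetical remote store serving a side input in fixed-size pages.

    Stands in for the remote side input storage; it only counts how many
    page fetches (remote reads) were made.
    """

    def __init__(self, elements: List[int], page_size: int) -> None:
        self._elements = elements
        self._page_size = page_size
        self.fetches = 0

    def fetch_page(self, token: int) -> Page:
        self.fetches += 1
        start = token * self._page_size
        page = self._elements[start:start + self._page_size]
        has_more = start + self._page_size < len(self._elements)
        return page, (token + 1 if has_more else None)


def lazy_iter(backend: PagedBackend,
              cache: Optional[Dict[int, Page]] = None) -> Iterator[int]:
    """AsIter-like access: fetch a page only when the consumer reaches it.

    Pages found in `cache` are served locally; missing pages are fetched
    from the backend and, if a cache is given, stored for later passes.
    """
    token: Optional[int] = 0
    while token is not None:
        if cache is not None and token in cache:
            page, next_token = cache[token]
        else:
            page, next_token = backend.fetch_page(token)
            if cache is not None:
                cache[token] = (page, next_token)
        yield from page
        token = next_token


def materialize(backend: PagedBackend) -> List[int]:
    """AsList-like access: read every page once, up front, into a list."""
    return list(lazy_iter(backend))


data = list(range(10))  # 10 elements at 4 per page -> 3 pages

# Two full passes over a lazy iterable with no cache: 2 x 3 = 6 fetches.
uncached = PagedBackend(data, page_size=4)
for _ in range(2):
    sum(lazy_iter(uncached))
print(uncached.fetches)  # 6

# The same two passes sharing a page cache: only 3 fetches.
cached, page_cache = PagedBackend(data, page_size=4), {}
for _ in range(2):
    sum(lazy_iter(cached, page_cache))
print(cached.fetches)  # 3

# Materializing once, then iterating the list twice: 3 fetches.
listed = PagedBackend(data, page_size=4)
as_list = materialize(listed)
for _ in range(2):
    sum(as_list)
print(listed.fetches)  # 3

# Reading only the first element lazily touches a single page.
partial = PagedBackend(data, page_size=4)
next(lazy_iter(partial))
print(partial.fetches)  # 1
```

In this model a lazy iterable reads only what it consumes but, without a cache, re-reads every page on each pass, while a list pays for one full read up front. Whether Beam's real AsIter path behaves this way is the open question raised above.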
What needs to happen?
Initially, for the Python SDK, we will enable the state cache by raising its size from 0 MB to 100 MB. After that, there are some improvements that could be made to the state cache.
For example, the Java implementation of the cache is in: https://github.com/apache/beam/blob/master/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/Caches.java
Most of the caching complexity is within https://github.com/apache/beam/blob/master/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state/StateFetchingIterators.java, with the views over these caches performing view-specific operations (e.g. merging the old view of the data with in-memory updates). Generally, understanding the code in https://github.com/apache/beam/tree/master/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/state should answer most questions.
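As a rough illustration of the two ideas described above, a cache bounded by memory size (0 MB vs. 100 MB) and a view that merges the cached "old view" with in-memory updates, here is a minimal Python sketch. It is loosely inspired by that description and is not a port of `Caches.java` or `StateFetchingIterators.java`. `WeightedLruCache`, `BagView`, the `fetch` callback and the 8-bytes-per-element weigher are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List, Optional

MB = 1024 * 1024


class WeightedLruCache:
    """Hypothetical LRU cache bounded by an estimated size in bytes.

    The budget mirrors a state cache configured in MB (0 MB = nothing is
    cached, 100 MB = the proposed size). Entry sizes come from a
    caller-supplied `weigher`; this is an assumption, and real memory
    accounting is much harder.
    """

    def __init__(self, max_bytes: int, weigher: Callable[[object], int]) -> None:
        self._max_bytes = max_bytes
        self._weigher = weigher
        self._entries: OrderedDict = OrderedDict()  # key -> (value, weight)
        self.used_bytes = 0

    def get(self, key: Hashable) -> Optional[object]:
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key][0]

    def put(self, key: Hashable, value: object) -> None:
        if key in self._entries:  # replacing: release the old entry's weight
            self.used_bytes -= self._entries.pop(key)[1]
        weight = self._weigher(value)
        if weight > self._max_bytes:
            return  # bigger than the whole budget (e.g. any non-empty value at 0 MB)
        self._entries[key] = (value, weight)
        self.used_bytes += weight
        while self.used_bytes > self._max_bytes:
            _, (_, evicted) = self._entries.popitem(last=False)  # drop LRU entry
            self.used_bytes -= evicted


class BagView:
    """Hypothetical view over one state key: the cached "old view" fetched
    remotely, merged with appends that so far exist only in memory."""

    def __init__(self, cache: WeightedLruCache, key: Hashable,
                 fetch: Callable[[Hashable], List[object]]) -> None:
        self._cache = cache
        self._key = key
        self._fetch = fetch  # hypothetical callback standing in for a remote state read
        self._pending: List[object] = []

    def read(self) -> List[object]:
        old = self._cache.get(self._key)
        if old is None:  # cache miss (or caching disabled): remote read
            old = self._fetch(self._key)
            self._cache.put(self._key, old)
        return list(old) + self._pending  # old view + in-memory updates

    def add(self, value: object) -> None:
        self._pending.append(value)

    def commit(self) -> None:
        # A real SDK would also persist the pending appends remotely here;
        # this sketch only folds them into the cached view.
        self._cache.put(self._key, self.read())
        self._pending = []


calls: List[Hashable] = []


def fetch_from_runner(key: Hashable) -> List[object]:
    calls.append(key)  # record each simulated remote read
    return [1, 2, 3]


def weigh(value: list) -> int:
    return 8 * len(value)  # assumption: ~8 bytes per element


view = BagView(WeightedLruCache(100 * MB, weigh), "user-key", fetch_from_runner)
view.add(4)
print(view.read(), view.read(), len(calls))  # [1, 2, 3, 4] [1, 2, 3, 4] 1
```

With a 0 MB budget the same `BagView` falls back to a remote read on every `read()`, which is the behaviour the move to 100 MB is meant to avoid.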
Issue Priority
Priority: 3 (nice-to-have improvement)
Issue Components