ddf-project / DDF

Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data Engine
http://ddf.io
Apache License 2.0

[PE-2161] Improve Cache Behavior #363

Closed dnsang closed 8 years ago

dnsang commented 8 years ago

Description and related tickets, documents

Reviewers:

Note: This branch will continue to be used for future changes to cache behavior, so don't delete it after merging.

PR Progress

Make sure all checkboxes below are checked before merging

lebinh commented 8 years ago

@zkidkid I agree that the calls to cache and materialize should be removed from the copyFrom method, but they shouldn't be put inside applySchema either; they should be made from the PE side. Also, please find where copyFrom is called in PE and add caching there, at least until we finalise the design and start working on the new caching behavior.
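
The separation suggested above can be sketched roughly as follows. This is only an illustration, not DDF code: `LazyFrame`, `copy_from`, and `is_cached` are hypothetical stand-ins written in Python, and the "caller" plays the role that PE has in this discussion. The copy step does no caching or materializing of its own; the caller opts in explicitly.

```python
class LazyFrame:
    """Hypothetical stand-in for a lazily evaluated frame (not the real DDF API)."""

    def __init__(self, loader):
        self._loader = loader   # function that (re)computes the rows on demand
        self._cached = None     # filled only after an explicit cache() call

    def cache(self):
        """Materialize the rows once and keep them in memory."""
        if self._cached is None:
            self._cached = list(self._loader())
        return self

    @property
    def is_cached(self):
        return self._cached is not None

    def rows(self):
        # Use the cached rows if present, otherwise recompute them.
        if self._cached is not None:
            return self._cached
        return list(self._loader())


def copy_from(source):
    """Copy a frame without caching or materializing it (hypothetical helper)."""
    return LazyFrame(source.rows)


# Caller side: caching is an explicit decision made here, not inside copy_from.
source = LazyFrame(lambda: [1, 2, 3])
copy = copy_from(source)
copied_uncached = not copy.is_cached
copy.cache()
```

Keeping the caching decision at the call site means each caller can choose whether the memory cost is worth it, which is the point of moving it out of the copy and schema code paths.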

lebinh commented 8 years ago

@zkidkid @hai-adatao The tests passed but took much longer than before (about 2x slower), likely because the DDF is no longer cached; e.g. an assertion on the number of rows has to load the DDF again.

hai-adatao commented 8 years ago

@lebinh I understand that it will be slower and I'm OK with that. This should go together with the new caching proposal, and the tests also need to change (or some configuration needs to be changed for the tests to behave as before). The problem is that it causes GC in PE, which is really troublesome if we don't understand what's going on.

ubolonton commented 8 years ago

@hai-adatao While waiting for the new caching behavior (and even after it is done), the tests need to be updated to call cache themselves.
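
To illustrate why uncached tests get slower and what calling cache from the test changes, here is a small Python sketch. It assumes a hypothetical `CountingFrame` class that counts how often its data is loaded; none of these names come from DDF or PE.

```python
class CountingFrame:
    """Hypothetical lazy frame that counts how often its data is loaded."""

    def __init__(self, rows):
        self._rows = rows
        self._cached = None
        self.loads = 0

    def _load(self):
        self.loads += 1          # every uncached action pays this cost again
        return list(self._rows)

    def cache(self):
        """Load the data once and reuse it for later actions."""
        if self._cached is None:
            self._cached = self._load()
        return self

    def _data(self):
        return self._cached if self._cached is not None else self._load()

    def num_rows(self):
        return len(self._data())

    def column_sum(self):
        return sum(self._data())


def run_assertions(frame):
    """A test body with two assertions, each of which touches the data."""
    assert frame.num_rows() == 3
    assert frame.column_sum() == 6
    return frame.loads


# Without caching, each assertion reloads the data.
uncached_loads = run_assertions(CountingFrame([1, 2, 3]))

# With the test calling cache() itself, the data is loaded only once.
cached_loads = run_assertions(CountingFrame([1, 2, 3]).cache())
```

In this sketch the uncached run loads the data twice and the cached run loads it once, which mirrors the kind of slowdown reported above for assertions that each trigger a reload.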

hai-adatao commented 8 years ago

Per discussion this morning, this PR will now be closed in favor of https://github.com/ddf-project/DDF/pull/364