NWChemEx / PluginPlay

An inversion-of-control framework for developing modular scientific software.
https://nwchemex.github.io/PluginPlay/
Apache License 2.0

Memory Management #82

Closed ryanmrichard closed 2 years ago

ryanmrichard commented 5 years ago

Raised by @robertjharrison on NWChemEx-Project/LibMathematician#1 (and also at the PNNL meeting). Specifically:

  • how does the SDE automate management of memoized objects?
  • how does a user mark a memoized object for deletion?

ryanmrichard commented 5 years ago

The cache is ultimately a map between hashes and module results. The results are stored in a holder class, ModuleResult. It is straightforward to add to that class a field for a MemoryStrategy enum, with strategies such as NeverSave and NeverDelete.
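
For concreteness, here is a minimal sketch of what adding such a field could look like; the class layout and the Default enumerator are assumptions (only NeverSave and NeverDelete are named in this thread), not the actual SDE source:

```cpp
// Hypothetical sketch of a ModuleResult-like holder carrying a MemoryStrategy
// tag that a cache-eviction pass could consult.
#include <any>
#include <string>
#include <utility>

enum class MemoryStrategy {
    Default,     // hypothetical: the cache may keep or evict as it sees fit
    NeverSave,   // do not memoize this result at all
    NeverDelete  // memoize and never evict
};

class ModuleResult {
public:
    ModuleResult(std::string desc, std::any value,
                 MemoryStrategy strat = MemoryStrategy::Default) :
      m_desc_(std::move(desc)), m_value_(std::move(value)), m_strat_(strat) {}

    MemoryStrategy strategy() const noexcept { return m_strat_; }
    const std::string& description() const noexcept { return m_desc_; }

private:
    std::string m_desc_;      // human-readable description of the result
    std::any m_value_;        // type-erased value, as in the real holder
    MemoryStrategy m_strat_;  // how the cache may treat this entry
};
```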

robertjharrison commented 5 years ago

Also add

wadejong commented 5 years ago

Or, how can a developer control what gets memoized?

ryanmrichard commented 5 years ago

@wadejong could you expand on that a bit? I'm not sure how that's different than what @robertjharrison already asked.

wadejong commented 5 years ago

It seems the same, and I saw the proposed strategies such as NeverSave and NeverDelete. What about wanting to save only parts of the results (e.g., the energy but not the density)?

ryanmrichard commented 5 years ago

@wadejong I added that to the list. FWIW ModuleResult wraps each result individually (e.g., there is one ModuleResult instance for the energy and another for the density), so the proposed tag system should work with that.

@robertjharrison objects are stored in the Cache using shared_ptr instances, so that takes care of lifetime/number of users (so long as distributed objects are designed to behave SPMD-like). Size is tricky. We have a few options:

  • the most rigorous, and least appealing, is to wrap alloc;
  • less rigorous and more costly, we can serialize the object and use the buffer size;
  • we can ask objects to tell us their size (a sketch of this option appears after the next paragraph). The last option is probably the most appealing, but it can't be automated.

Naive iteration is easy: you get back iterators and proceed like normal. The problem is that what you get back are (hash, type-erased value) pairs, which is not particularly useful if you are looking for a result. The philosophy of Pulsar (which also influenced how I set up the SDE) was that there is no reason for a module developer to ever touch the cache. If you want to compute property X, call a module that computes X; don't manually dig through the cache for the result. The latter breaks encapsulation. That said, you are welcome to ask the module whether or not the call will be memoized, as that can have consequences for scheduling, etc. There is, however, one time when iteration is needed and you do care about what each result actually is: logging. One solution is to pick a cached result, say the converged J matrix, and then use the submodule call graph that generated the result as a key. For example, our DF J build would generate a key like "density-fit J, inverse Coulomb metric, libint ERI, libint metric", which is a concatenation of the descriptions of each submodule. Of course, such a key is highly unlikely to line up with, say, MolSSI's schema...
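
For concreteness, a minimal sketch of the third option in the list above (asking the object for its size, and leaving the size unknown otherwise); the memory_size() member name and the detection idiom are assumptions, not the SDE API:

```cpp
// Hypothetical sketch: detect whether a cached object can report its own
// footprint via a memory_size() member; otherwise the size stays unknown
// (a serialization-based estimate could be returned there instead).
#include <cstddef>
#include <optional>
#include <type_traits>
#include <utility>

template<typename T, typename = void>
struct has_memory_size : std::false_type {};

template<typename T>
struct has_memory_size<
    T, std::void_t<decltype(std::declval<const T&>().memory_size())>>
  : std::true_type {};

template<typename T>
std::optional<std::size_t> cached_size(const T& value) {
    if constexpr (has_memory_size<T>::value) {
        return value.memory_size();  // the object tells us its size
    } else {
        return std::nullopt;         // size unknown; caller decides what to do
    }
}
```

And a minimal sketch of the logging key described above, i.e. concatenating the descriptions of the submodules in the call graph; the function name and separator are made up for illustration:

```cpp
// Hypothetical sketch: build a human-readable cache/log key from the
// descriptions of the submodules that produced a result.
#include <string>
#include <vector>

std::string call_graph_key(const std::vector<std::string>& submodule_descriptions) {
    std::string key;
    for (const auto& desc : submodule_descriptions) {
        if (!key.empty()) key += ", ";  // arbitrary separator choice
        key += desc;
    }
    return key;
}

// e.g. call_graph_key({"density-fit J", "inverse Coulomb metric",
//                      "libint ERI", "libint metric"})
```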

robertjharrison commented 5 years ago

The reason we need to be able to inspect the cache is resource management.

We can expect to be routinely running at the limit of available memory --- a simple roofline model applied to the space-time tradeoff implies this.

Thus, we want to be using memory to

For reliable computation we need

But we cannot expect default options for caching results to be universally good (e.g., an SCF caching the Fock matrix, 3c integrals, etc. instead of just returning the energy), and relying on the user to override things correctly makes the code fragile and complex.

If we find we are short of memory, we can either crash or try to get rid of stuff in the cache. But what stuff? If we cannot find out "what" something is (and hence how much computation it might take to recompute), or which module is using it, or what size it is, we have no choice but to just delete everything.

Another topic to add to the design list is: how are cached results of submodules merged with those of the parent module?

ryanmrichard commented 5 years ago

@robertjharrison I'd argue the process for freeing up memory cannot depend on what something is, only on its properties. If we worry about what something is, we end up introducing the coupling that the SDE is designed to avoid. That said, I proposed the importance flags above, which allow us to know what can and can't be deleted. Module developers ought to be able to provide decent defaults. Size is the other obvious property we'd want, but as I mentioned above it's not easy to get at.
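
For illustration, a minimal sketch of an eviction pass driven only by an entry's properties (its strategy flag and, when known, its size); the CacheEntry layout and evict() function are made up, and the enum is repeated from the sketch further up:

```cpp
// Hypothetical sketch: free memory using only per-entry properties, with no
// knowledge of what each result actually is.
#include <cstddef>
#include <map>
#include <optional>
#include <string>

enum class MemoryStrategy { Default, NeverSave, NeverDelete };  // as sketched above

struct CacheEntry {
    MemoryStrategy strategy;
    std::optional<std::size_t> size;  // nullopt when the size is unknown
};

// Evict deletable entries until at least bytes_needed have been reclaimed;
// entries of unknown size are skipped here (a real pass would need a policy).
inline std::size_t evict(std::map<std::string, CacheEntry>& cache,
                         std::size_t bytes_needed) {
    std::size_t freed = 0;
    for (auto it = cache.begin(); it != cache.end() && freed < bytes_needed;) {
        if (it->second.strategy != MemoryStrategy::NeverDelete && it->second.size) {
            freed += *it->second.size;
            it = cache.erase(it);
        } else {
            ++it;
        }
    }
    return freed;
}
```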

I don't think it is feasible to limit an algorithm. We can provide it a runtime object that reports the resources available to it, but there's not really any way to force it to adhere to those values.

I'm not sure what you mean by merging results. The keys are unique for a particular module and input, so you can more or less stick all of them in a giant map without fear of overwriting. In practice we actually have a cache per module so that the SDE knows which entries go with which module, but that's an implementation detail.
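
A minimal sketch of that layout, with hypothetical type names (the real hash and result types differ): one cache per module, each keyed by the input hash, so entries from different modules can never collide.

```cpp
// Hypothetical sketch of the per-module cache layout described above.
#include <any>
#include <map>
#include <string>

using InputHash   = std::string;                    // e.g. a digest of the module's inputs
using ModuleCache = std::map<InputHash, std::any>;  // hash -> type-erased result

// SDE-side bookkeeping: which cache belongs to which module key.
std::map<std::string, ModuleCache> caches_by_module;
```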

As a slight aside, one thing to keep in mind is that the SDE is designed to make sure you can control every aspect of the calculation from the input. Users of this API are assumed to be experts who know what they are doing. While I'm doing my best to make this API as user-friendly as possible, the reality is it's necessarily going to be more complex and verbose than a typical NWX user would want. I don't see that as a problem, because it is always possible to hide complexity. @keipertk has already started designing convenience functions, built on top of the SDE, that will hide many of these details from a typical NWX user. Point being, the issue of complexity is, in my opinion, separate from the issue of exposure.

robertjharrison commented 5 years ago

All algorithms need to respect the limits of available memory ... it is essential. Simple algorithms with limited data might not care (e.g., a direct SCF with fully distributed matrices) but anything working at the limit of what is feasible or seeking the greatest possible speed needs to know how much memory is available in order to tile data and appropriately trade space vs. recomputation.

E.g., should you cache all of the 3c integrals, or just the most expensive ones, or none of them?

E.g., when transforming integrals what should be the batch size for the computation?

We cannot easily enforce memory limits. Instead, regard them as a contract a submodule makes with its invoker. The invoker says "you have 200GB available locally, and 2TB available globally" and the submodule tries to work within those limits --- if it exceeds them, then presumably the submodule can choose among several courses of action, including crossing its fingers.
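
A minimal sketch of such a contract, with made-up names; the budget struct and the batch-size heuristic are illustrations, not an existing SDE or TAMM interface:

```cpp
// Hypothetical sketch: the invoker states the memory a submodule may use; the
// submodule is expected, but not forced, to stay within it.
#include <cstddef>

struct ResourceBudget {
    std::size_t local_bytes;   // e.g. 200 GB available on this node
    std::size_t global_bytes;  // e.g. 2 TB available across the job
};

// A submodule might use the budget to pick a batch size, e.g. for an integral
// transformation: the largest batch whose working set fits locally.
inline std::size_t choose_batch_size(const ResourceBudget& budget,
                                     std::size_t bytes_per_item) {
    std::size_t n = budget.local_bytes / bytes_per_item;
    return n == 0 ? 1 : n;  // always process at least one item
}
```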

ryanmrichard commented 5 years ago

I'd argue both of your examples fall under the jurisdiction of the lazily evaluated tensor class. In practice the biggest objects we need to deal with are tensors, and memory management of the tensors needs to reside with TAMM (and can only reside with TAMM, since it encapsulates the memory details).

robertjharrison commented 5 years ago

Yes, but TAMM in this context will be just another SDE module (or will be invoked by an SDE module), and hence the SDE is still responsible for memory management.

ryanmrichard commented 5 years ago

SDE is responsible for TAMM's memory in the sense that it has a shared_ptr to a tensor and there needs to be a way to release the handle the SDE holds. In this sense it's no different than any other data type. What you were commenting on is not just releasing a handle, but how the object actually works. Right now TAMM has a direct and a core tensor (the former always computes elements on the fly; the latter stores all elements). Your first example falls between these two extremes and your second is a performance consideration for using a lazy tensor; hence, to me, they're TAMM problems.

robertjharrison commented 5 years ago

Well, I would then say resource management is a shared responsibility of both TAMM and SDE.

I guess we need to loop Sriram and others into the conversation.

ryanmrichard commented 2 years ago

Resource management now falls to PZ, and there are designs in place for many of the caching issues brought up. The remainder of this discussion is pretty old and not terribly relevant to the current design.