mikofski commented 8 years ago

observed: large systems take up too much system memory

proposed: don't make separate copies of objects until absolutely necessary.

instead of making copies and tracking all objects as individuals, just reference the same copy of identical objects. for example: when making a module, rather than copying the cell 96 or 128 times, just reference the same cell until there are differences. Ditto for modules - if all the modules in a string are identical, just reference the same module rather than making cheap copies until needed. in other words only create as many objects, the minimum, as needed to describe the characteristics and condition of the system.

issues:

~requires caching of the references somewhere. for example: the cells that reference the cell object could be cached in the modules in a cached_cells field as a dictionary.~ Instead, an easier way is to just keep the lists the same, and put the reference you want in there. see my 1st comment below
requires tracking of cell changes in object (eg cell) characteristics like temperature, shunt and series resistance, etc.

also profiling to demonstrate this is an improvement over current methods is required.

mikofski commented 7 years ago

More details and examples:

caching references

Currently in a module, there is a pvcell member that is a list of the cells in that module. The cell number is identified by cell position and its index in the list. If the module has 96 cells, then the list is 96 items long, and each item is a unique reference created by using copy.copy.

caching options

One option is very simple, instead of creating cheap copies, when creating the module, just use the same reference. This is the simplest and quickest implementation since it requires the least amount of code changes. Change PVmodule:

        # only single pvcell instead of a list of pvcells
        if isinstance(pvcells, PVcell):
            # do NOT use copy, instead all indices point to the same pvcell reference!
            pvcells = [pvcells] * self.numberCells

Another option, which may be helpful, or may actually be worse would be to add a new field to each object that maps cell indices to the references in the list, and make each cell reference unique. This requires a little more book keeping, so I'm starting to think it's a BAD idea, but I'll list it here anyway. Change PVmodule:

        # only single pvcell instead of a list of pvcells
        if isinstance(pvcells, PVcell):
            # do NOT use copy, instead make one cell!
            pvcells = [pvcells]
        cached_cells = [0] * self.numberCells  # map that points to the cell index
        # to get the desired cell use: pvc = pvmod.pvcells[pvmod.cached_cells[idx]]
        # might be able to overload __get_item__ or something, or make a custom get_pvcell() method
        # this method seems BAD to me now

bmeyers commented 7 years ago

I have implemented a solution at the PVmodule level. My solution is:

Keep a cash of unique PVcell objects for each PVmodule instance
Add the ability of PVcell objects to output a dictionary that fully defines the cell
Anytime a cell's irradiance gets changed, check to see if the new cell definition already exists in the list of cashed cells by comparing the dictionaries
If it already exists, put in a pointer to that existing cell
If it doesn't exist, create a copy of the old cell, change the irradiance, add the new cell to the cash list, and set a pointer to that cell

I would like to implement a similar scheme at the string and system level before making a pull request and closing this issue.

bmeyers commented 7 years ago

You can see my progress by looking at the better_memory branch of my fork

mikofski commented 7 years ago

can you open a PR to pvmismatch? you may need to make a fresh clone your entire repo, delete the github version, then fork pvmismatch again and finally push your clone back up to the new fork.

bmeyers commented 7 years ago

Already fixed the fork (my repo points to the main one again). I'll open a PR now and reference this issue. All tests passed, by the way.

mikofski commented 7 years ago

@bmeyers I don't think a lot of this is necessary,see this comment

Did you get my email regarding this? I would like to adhere to a "light" ethic if possible: ie: the smallest changes possible and delegate as much as possible. Is that OK with you? IMO I think we can get the desired memory optimization with very few changes. Then let's see how things work out before make more changes.

mikofski commented 7 years ago

@bmeyers re: why I would want to keep the changes simple and not introduce too many new methods and class attributes. Too many changes at once makes the code difficult to comprehend. PVMismatch has IMO an intuitive structure. What we're doing now is a little fancy, so if we can make it easy, then it will be easier to maintain and make other changes in the future. This ethic had really paid off well for past changes to PVMismatch like the restructure we did for changing pvcell temperature and other cell properties and when I did the Tetris changes. Big picture, small changes.

bmeyers commented 7 years ago

@mikofski the philosophy makes perfect sense to me. I'll review the email you sent more closely before continuing on. And I'll definitely plan on moving the "memory optimizer" code into contrib.

mikofski commented 7 years ago

@bmeyers have a look at this example - only 3 one-line changes resolve issue #35

>>> from pvmismatch import *
>>> pvsys = pvsystem.PVsystem()
>>> pvsys.pvstrs
[<pvmismatch.pvmismatch_lib.pvstring.PVstring at 0x7fb0a384be50>,
 <pvmismatch.pvmismatch_lib.pvstring.PVstring at 0x7fb0a384be50>,
 <pvmismatch.pvmismatch_lib.pvstring.PVstring at 0x7fb0a384be50>,
...]
>>> pvsys.pvstrs
[<pvmismatch.pvmismatch_lib.pvstring.PVstring at 0x7fb0a384be50>,
 <pvmismatch.pvmismatch_lib.pvstring.PVstring at 0x7fb0a384be50>,
 <pvmismatch.pvmismatch_lib.pvstring.PVstring at 0x7fb0a384be50>,
...]
[id(pvc) for pvc in pvsys.pvmods[0][0].pvcells]
[140396627329168,
 140396627329168,
 140396627329168,
 140396627329168,
...]
>>> pvsys.plotSys()

notice that the pvstring and module hashes are the same for all strings, all modules, etc. ditto for cells

bmeyers commented 7 years ago

well sure, and that's exactly what I did for PVmodule. But then the tests fail because setSuns doesn't work correctly. Which is what i was working on the rest of the day.

mikofski commented 7 years ago

see my next 3 commits, that fixes setSuns() or issue #34 - I think that this could be implemented a couple different ways so feel free to experiment around with the best way. I really wanted to figure out a way to use set(list of objects) to get the unique objects, but I ended up using the "cache" idea, but just simpler, turns out that dict.fromkeys(list of objects) does the same thing as set(list of objects) just instead of a frozen set, you get a dictionary of unique objects as keys, and values as None, useful so I could store the new copy as the value, and use the initial None state to know whether to copy or not.

I made some tests, but there not exhaustive, and I bet there are still some edge cases that still need fixin.

also I noticed in pvstring.py that there's already a sneaky way to create a list of pvcells from a single irradiance by passing in a scalar as Ee and a list as cell indices - so my ideas for this in #34 are probably unnecessary.

SunPower / PVMismatch

don't keep copies of identical objects (eg cells) #35

caching references

caching options