jebob closed this issue 3 years ago
What particular data do you think should not be stored? I load some data so users can see what is in the file before choosing exactly which symbols to load.
It also sounds like you would like the ability to "unload" a symbol. I could imagine doing that with either an explicit GdxSymbol.unload method, or maybe some sort of __enter__, __exit__ syntax ...
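For illustration, here is a rough sketch of what the __enter__, __exit__ idea could look like. LazySymbol, its loader argument, and fake_loader are hypothetical stand-ins made up for this example; this is not the gdxpds API, only a sketch of the pattern assuming the symbol can be loaded and unloaded on demand.

```python
class LazySymbol:
    """Hypothetical stand-in for a lazily loaded symbol (not the gdxpds API)."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader  # callable that returns the symbol's records
        self.records = None    # None means "not loaded"

    @property
    def loaded(self):
        return self.records is not None

    def load(self):
        # Only read the data if it is not already in memory.
        if not self.loaded:
            self.records = self._loader()

    def unload(self):
        # Drop the reference so the data can be garbage collected.
        self.records = None

    def __enter__(self):
        self.load()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.unload()
        return False  # do not suppress exceptions


def fake_loader():
    # Placeholder for an expensive read from disk.
    return [("a", 1.0), ("b", 2.0)]


symbol = LazySymbol("demand", fake_loader)
with symbol as s:
    total = sum(value for _, value in s.records)
# After the with block, symbol.loaded is False again.
```

The explicit unload method and the with block are complementary: the with form just guarantees that unload runs even if the body raises.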
Currently, the Translator object stores the values of the loaded symbol.
The ability to unload a symbol would work.
gdxpds.read_gdx.Translator.dataframe does not appear to cache the dataframe separately in the Translator object, but it does return a copy. Is the copy problematic?
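To make the question about the copy concrete, here is a toy sketch of the pattern being discussed: an object that keeps the loaded values and hands out a copy on each access. ToyTranslator, get, and load_rows are hypothetical names invented for this example, and plain lists stand in for dataframes; this is not the gdxpds implementation.

```python
import copy


class ToyTranslator:
    """Hypothetical object that stores loaded values and returns copies."""

    def __init__(self, loader):
        self._loader = loader
        self._stored = {}  # values kept alive for as long as this object lives

    def get(self, name):
        if name not in self._stored:
            self._stored[name] = self._loader(name)
        # The caller receives an independent copy, so while the caller holds
        # the result there are two full copies of the data in memory.
        return copy.deepcopy(self._stored[name])


def load_rows(name):
    # Placeholder for reading a large symbol from disk.
    return [[name, i] for i in range(3)]


translator = ToyTranslator(load_rows)
rows = translator.get("x")
rows[0][1] = 99                 # mutating the caller's copy...
fresh = translator.get("x")     # ...does not affect the stored values
```

The copy protects the stored values from accidental mutation, but for a large symbol it also means the data exists twice until either the caller's copy or the stored values are released.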
I added an unload feature: https://github.com/NREL/gdx-pandas/commit/bee08e1c256ba32e3d37d63e09819dc2f9cc7559. Does that fix the issue?
It does not. I think the problem here is more subtle than I originally described.
I think there is a memory leak, so I will close this issue and put together a minimal example.
In my use case, I am trying to pick out some small and large symbols from a very large (3 GB) GDX file. The file is too large to use to_dataframes(). Reading the GDX with lazy load is super slow, so calling to_dataframe() many times is also slow. In the end I settled on creating a Translator object and reusing it, but I discovered that the results are cached, so my large symbols have duplicated dataframes when I do anything with them.
Perhaps we should not store the symbol state by default?