cyclus / cymetric

http://fuelcycle.org/user/cymetric/index.html

Memory Issues on Materials metric #177

Open abachma2 opened 3 years ago

abachma2 commented 3 years ago

I am using cymetric to analyze the SQLite output of a fairly large Cyclus simulation (the database is about 400 MB). When I evaluate some of the metrics (such as 'Materials' and any that rely on it, like 'TransactionQuantity'), I get a MemoryError saying it cannot allocate memory for an array of the specified size. The amounts it reports being unable to allocate range from 571 MiB to 3.91 GiB. I changed my settings to allow memory overcommit, but that just leads to the kernel dying rather than returning a MemoryError.

I am running 64-bit python3 on a 64-bit Ubuntu 18.04 system with 32 GB of memory.

The error seems to stem from the pd.merge or set_index operation in the 'Materials' metric:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-4-91d85b0e404a> in <module>()
----> 1 evaler.eval('Materials')

/home/amandabachmann/.local/lib/python3.6/site-packages/cymetric/evaluator.py in eval(self, metric, conds)
     58             frame = self.eval(dep, conds=conds)
     59             frames.append(frame)
---> 60         raw = m(frames=frames, conds=conds, known_tables=self.known_tables)
     61         if raw is None:
     62             return raw

/home/amandabachmann/.local/lib/python3.6/site-packages/cymetric/metrics.py in __call__(self, frames, conds, known_tables, *args, **kwargs)
     75             if self.name in known_tables:
     76                 return self.db.query(self.name, conds=conds)
---> 77             return f(*frames)
     78 
     79     Cls.__name__ = str(name)

/home/amandabachmann/.local/lib/python3.6/site-packages/cymetric/metrics.py in materials(rsrcs, comps)
    118     x = pd.merge(rsrcs, comps, on=['SimId', 'QualId'], how='inner')
    119     x = x.set_index(['SimId', 'QualId', 'ResourceId', 'ObjId', 'TimeCreated',
--> 120                      'NucId', 'Units'])
    121     y = x['Quantity'] * x['MassFrac']
    122     y.name = 'Mass'

/home/amandabachmann/anaconda3/envs/cyclus-env/lib/python3.6/site-packages/pandas/core/frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
   4607 
   4608         # clear up memory usage
-> 4609         index._cleanup()
   4610 
   4611         frame.index = index

/home/amandabachmann/anaconda3/envs/cyclus-env/lib/python3.6/site-packages/pandas/core/indexes/base.py in _cleanup(self)
    546 
    547     def _cleanup(self):
--> 548         self._engine.clear_mapping()
    549 
    550     @cache_readonly

pandas/_libs/properties.pyx in pandas._libs.properties.CachedProperty.__get__()

/home/amandabachmann/anaconda3/envs/cyclus-env/lib/python3.6/site-packages/pandas/core/indexes/multi.py in _engine(self)
   1000         if lev_bits[0] > 64:
   1001             # The levels would overflow a 64 bit uint - use Python integers:
-> 1002             return MultiIndexPyIntEngine(self.levels, self.codes, offsets)
   1003         return MultiIndexUIntEngine(self.levels, self.codes, offsets)
   1004 

pandas/_libs/index.pyx in pandas._libs.index.BaseMultiIndexCodesEngine.__init__()

MemoryError: Unable to allocate 3.91 GiB for an array with shape (74887355, 7) and data type int64
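
For scale, here is a minimal sketch of where a number like 3.91 GiB comes from and why the merged frame gets so long. The first part only redoes the arithmetic from the error message. The second part uses tiny made-up stand-ins for the two input frames (the values and the `rsrcs`/`comps` contents are hypothetical, only the column names come from the traceback) to show that the inner merge on `['SimId', 'QualId']` produces one row per resource per nuclide in its composition. Computing `Mass` on plain columns is shown only as an illustration; I have not checked whether doing so avoids the error inside cymetric.

```python
import numpy as np
import pandas as pd

# Size of the array named in the MemoryError:
# shape (74887355, 7) with dtype int64 (8 bytes per element).
rows, levels = 74_887_355, 7
nbytes = rows * levels * np.dtype("int64").itemsize
print(f"{nbytes / 2**30:.2f} GiB")  # 3.91 GiB

# Toy stand-ins for the two input frames (made-up values).
rsrcs = pd.DataFrame({
    "SimId": ["s"] * 3,
    "QualId": [1, 1, 2],
    "ResourceId": [10, 11, 12],
    "ObjId": [1, 2, 3],
    "TimeCreated": [0, 1, 2],
    "Quantity": [100.0, 50.0, 10.0],
    "Units": ["kg"] * 3,
})
comps = pd.DataFrame({
    "SimId": ["s"] * 5,
    "QualId": [1, 1, 2, 2, 2],
    "NucId": [922350000, 922380000, 922350000, 922380000, 942390000],
    "MassFrac": [0.05, 0.95, 0.01, 0.94, 0.05],
})

# Same merge as in the traceback: each resource row is repeated once
# for every nuclide in the composition it points to.
merged = pd.merge(rsrcs, comps, on=["SimId", "QualId"], how="inner")
print(len(rsrcs), "resources ->", len(merged), "merged rows")

# The product itself only needs the two columns, not the 7-level index.
merged["Mass"] = merged["Quantity"] * merged["MassFrac"]
```

In the toy data, three resources with two, two, and three nuclides give seven merged rows. The same fan-out on a 400 MB database would account for the roughly 75 million rows in the error message.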