Closed: asnaylor closed this issue 5 years ago
Can you try running the unit tests on your source install? You need to have `pytest` installed (you can get it from pip) and then you can run `pytest -vv tests/` in the top directory of the repo. Hopefully that will tell us more clearly where the error is coming from.
Hmm, all 52 tests passed.
I tested the source build of `fast_carpenter` with the files and config from the fast_cms_tutorial and it works fine, so it must just be the file structure in the `test_dataset.yml` I am using. It's strange because the PyPI version works fine with `test_dataset.yml`, but the source build has an issue.
Hi @asnaylor. Do you think you could test this again with the latest version of carpenter (v0.13.0), or update your source install? I wonder if this was a consequence of a bug fixed in PR #60, in which case it may have been solved by that as well.
Hi @benkrikler, I tested the same files and config with the latest version from pip (v0.13.0) and now the error has changed to:

```
pandas/core/groupby/groupby.py", line 3291, in _get_grouper
    raise KeyError(gpr)
KeyError: 'singleScatters.s2Area_phd'
```
To fix that I then added a `fast_carpenter.Define` stage to the config, where I apply the formula `reduxS2: {reduce: 0, formula: singleScatters.s2Area_phd}` and bin `reduxS2` instead of `singleScatters.s2Area_phd`, and that works fine.
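For context, a minimal sketch of what that workaround can look like in a fast_carpenter processing config. The stage name `variables` and the overall layout are assumptions for illustration; only the `fast_carpenter.Define` class and the `reduxS2` formula come from the discussion above:

```yaml
stages:
  - variables: fast_carpenter.Define

variables:
  variables:
    # Take the first (index-0) element of the jagged branch so later
    # stages can group on a flat variable with a dot-free name.
    - reduxS2: {reduce: 0, formula: singleScatters.s2Area_phd}
```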
What's bizarre is that just binning `singleScatters.s2Area_phd` (which is a vector of floats, but always of size 1) has worked previously but now doesn't.
Can you post the full traceback, so I can understand better where this comes from? I think the issue comes from a feature added in v0.12.0 which allows you to calculate variables directly in the binned dataframe stage without defining a variable first. My guess is that the full stop in the branch name confuses the way the expression parsing is handled there.
Also, to be clear, defining `reduxS2: {reduce: 0, formula: singleScatters.s2Area_phd}` isn't strictly the same thing, since it only looks at the first single scatter in each event, whereas binning on `singleScatters.s2Area_phd` would (if it weren't breaking) look at all single scatters in every event. However, if my suspicion is right, then producing a new variable without a full stop in its name but with the full contents of the original variable should be a valid workaround:

```yaml
- singleScatters__s2Area_phd: singleScatters.s2Area_phd
```
Sure, here's the full `KeyError` traceback:
```
Traceback (most recent call last):
  File "/opt/rh/python27/root/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/anaylor/.local/lib/python2.7/site-packages/mantichora/worker.py", line 27, in run
    self._run_tasks()
  File "/home/anaylor/.local/lib/python2.7/site-packages/mantichora/worker.py", line 47, in _run_tasks
    result = task_func()
  File "/home/anaylor/.local/lib/python2.7/site-packages/mantichora/main.py", line 18, in __call__
    return self.task(*self.args, **self.kwargs)
  File "/home/anaylor/.local/lib/python2.7/site-packages/alphatwirl/concurrently/CommunicationChannel.py", line 16, in __call__
    return self.task(*self.args, **self.kwargs)
  File "/home/anaylor/.local/lib/python2.7/site-packages/alphatwirl/loop/EventLoop.py", line 45, in __call__
    self.reader.event(event)
  File "/home/anaylor/.local/lib/python2.7/site-packages/alphatwirl/loop/ReaderComposite.py", line 43, in event
    if reader.event(event) is False:
  File "/home/anaylor/fast-carpenter-dev/lib/python2.7/site-packages/fast_carpenter/summary/binned_dataframe.py", line 195, in event
    out_dimensions=self._out_bin_dims)
  File "/home/anaylor/fast-carpenter-dev/lib/python2.7/site-packages/fast_carpenter/summary/binned_dataframe.py", line 240, in _bin_values
    bins = data.groupby(final_bin_dims)
  File "/home/anaylor/fast-carpenter-dev/lib/python2.7/site-packages/pandas/core/generic.py", line 7632, in groupby
    observed=observed, **kwargs)
  File "/home/anaylor/fast-carpenter-dev/lib/python2.7/site-packages/pandas/core/groupby/groupby.py", line 2110, in groupby
    return klass(obj, by, **kwds)
  File "/home/anaylor/fast-carpenter-dev/lib/python2.7/site-packages/pandas/core/groupby/groupby.py", line 360, in __init__
    mutated=self.mutated)
  File "/home/anaylor/fast-carpenter-dev/lib/python2.7/site-packages/pandas/core/groupby/grouper.py", line 578, in _get_grouper
    raise KeyError(gpr)
```
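The traceback bottoms out in pandas' grouper lookup, which raises `KeyError` whenever a requested grouping key is not a column of the dataframe. A minimal standalone illustration of that failure mode (plain pandas, not fast_carpenter itself; the column name here is made up):

```python
import pandas as pd

# A dataframe whose columns do not include the dotted branch name.
df = pd.DataFrame({"s2Area_phd": [1.0, 2.0, 3.0]})

try:
    # Grouping by a name that is not a column raises KeyError inside
    # pandas' _get_grouper, just as in the traceback above.
    df.groupby("singleScatters.s2Area_phd")
except KeyError as err:
    print("KeyError:", err)
```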
Yeah, using the formula isn't the same, but as it's a vector of size one it's okay for this particular example. However, when I add a new definition without the formula, as you suggested above, I get an `IndexError`:
```
Traceback (most recent call last):
  File "/opt/rh/python27/root/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/anaylor/.local/lib/python2.7/site-packages/mantichora/worker.py", line 27, in run
    self._run_tasks()
  File "/home/anaylor/.local/lib/python2.7/site-packages/mantichora/worker.py", line 47, in _run_tasks
    result = task_func()
  File "/home/anaylor/.local/lib/python2.7/site-packages/mantichora/main.py", line 18, in __call__
    return self.task(*self.args, **self.kwargs)
  File "/home/anaylor/.local/lib/python2.7/site-packages/alphatwirl/concurrently/CommunicationChannel.py", line 16, in __call__
    return self.task(*self.args, **self.kwargs)
  File "/home/anaylor/.local/lib/python2.7/site-packages/alphatwirl/loop/EventLoop.py", line 45, in __call__
    self.reader.event(event)
  File "/home/anaylor/.local/lib/python2.7/site-packages/alphatwirl/loop/ReaderComposite.py", line 43, in event
    if reader.event(event) is False:
  File "/home/anaylor/fast-carpenter-dev/lib/python2.7/site-packages/fast_carpenter/summary/binned_dataframe.py", line 189, in event
    data = chunk.tree.pandas.df(all_inputs)
  File "/home/anaylor/fast-carpenter-dev/lib/python2.7/site-packages/fast_carpenter/masked_tree.py", line 27, in df
    df = self._owner.tree.pandas.df(*args, **kwargs)
  File "/home/anaylor/fast-carpenter-dev/lib/python2.7/site-packages/fast_carpenter/tree_wrapper.py", line 70, in df
    df = self._owner.tree.pandas.df(*args, **kwargs)
  File "/home/anaylor/.local/lib/python2.7/site-packages/uproot/_connect/_pandas.py", line 30, in df
    return self._tree.arrays(branches=branches, outputtype=pandas.DataFrame, namedecode=namedecode, entrystart=entrystart, entrystop=entrystop, flatten=flatten, flatname=flatname, awkwardlib=awkwardlib, cache=cache, basketcache=basketcache, keycache=keycache, executor=executor, blocking=blocking)
  File "/home/anaylor/fast-carpenter-dev/lib/python2.7/site-packages/fast_carpenter/tree_wrapper.py", line 53, in arrays
    return self.tree.old_arrays(*args, **kwargs)
  File "/home/anaylor/.local/lib/python2.7/site-packages/uproot/tree.py", line 484, in arrays
    return wait()
  File "/home/anaylor/.local/lib/python2.7/site-packages/uproot/tree.py", line 468, in wait
    return uproot._connect._pandas.futures2df(futures, outputtype, entrystart, entrystop, flatten, flatname, awkward)
  File "/home/anaylor/.local/lib/python2.7/site-packages/uproot/_connect/_pandas.py", line 192, in futures2df
    indexes = awkward.JaggedArray(starts, stops, awkward.numpy.empty(stops[-1], dtype=object)).tojagged(indexes).content
  File "/home/anaylor/.local/lib/python2.7/site-packages/awkward/array/jagged.py", line 780, in tojagged
    content[good] = data[self.parents[good]]
IndexError: index 1690 is out of bounds for axis 0 with size 1676
```
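The final line is plain NumPy fancy indexing going out of bounds: the `parents` index array refers to entries past the end of the array being indexed. A minimal illustration of the same failure mode (the sizes mirror the message above; this is not uproot's or awkward's actual code):

```python
import numpy as np

data = np.zeros(1676)              # only 1676 entries available
parents = np.array([0, 5, 1690])   # one index points past the end

try:
    # Fancy indexing validates every index before gathering values.
    data[parents]
except IndexError as err:
    print(err)  # index 1690 is out of bounds for axis 0 with size 1676
```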
On a clean virtualenv I was unable to execute a simple test yaml file when installing `fast_carpenter` from source. The yaml file works fine with `fast_carpenter` installed from PyPI in this clean virtualenv. But when I install `fast_carpenter` from source, I get this error message when trying to run:

```
ValueError: cannot interpret branch 'singleScatters.' as a Python type
```

These are my pip libraries: