djsutherland / pummeler

Utilities to analyze ACS PUMS files, especially for distribution regression / ecological inference
MIT License

Error when running `pummel featurize SORT_DIR` #21

Closed Lizette-Lemus closed 7 years ago

Lizette-Lemus commented 7 years ago

When I ran `pummel featurize SORT_DIR`, I got the following error:

Picking bandwidth by median heuristic...Traceback (most recent call last):
  File "./pummel", line 5, in <module>
    main()
  File "/home/lizette/git_things/pummeler/pummeler/cli.py", line 197, in main
    args.func(args, parser)
  File "/home/lizette/git_things/pummeler/pummeler/cli.py", line 236, in do_featurize
    common_feats=args.common_feats)
  File "/home/lizette/git_things/pummeler/pummeler/featurize.py", line 217, in get_embeddings
    stats, skip_feats=skip_feats)
  File "/home/lizette/git_things/pummeler/pummeler/featurize.py", line 181, in pick_gaussian_bandwidth
    stats['sample'], stats, ret_df=False, skip_feats=skip_feats))
  File "/home/lizette/git_things/pummeler/pummeler/featurize.py", line 49, in get_dummies
    reals[:] -= stats['real_means'][real_feats]
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/ops.py", line 727, in wrapper
    dtype=dtype,
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/ops.py", line 635, in _construct_result
    return left._constructor(result, index=index, name=name, dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/series.py", line 248, in __init__
    raise_cast_failure=True)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/series.py", line 3027, in _sanitize_array
    raise Exception('Data must be 1-dimensional')
Exception: Data must be 1-dimensional

I found a workaround: adding `.values` on lines 49 and 50 of featurize.py:

    reals[:] -= stats['real_means'][real_feats].values
    reals[:] /= stats['real_stds'][real_feats].values

Is this a good way to fix this?

djsutherland commented 7 years ago

Huh, weird. I'll try to look into this later today.

djsutherland commented 7 years ago

It looks like you're using an old version of pandas, which is probably what caused the problem. That's a reasonable fix, which I've put into the new pummeler 0.2.3 release, but let me know if you hit other errors elsewhere....
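
For anyone landing here later, a minimal sketch of what the `.values` workaround does, on toy data. The column names, numbers, and the `standardized` variable are made up for illustration; only the `stats['real_means'][real_feats]` indexing pattern comes from the traceback above. It does not reproduce the original error, which seems to depend on the pandas version. It only shows that converting the per-feature statistics to a plain NumPy array sidesteps pandas' label handling, so the arithmetic is applied by position to every row.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real data; none of these values come from pummeler.
real_feats = ["age", "income"]
reals = pd.DataFrame({"age": [20.0, 40.0, 60.0], "income": [10.0, 20.0, 30.0]})
stats = {
    "real_means": pd.Series({"age": 40.0, "income": 20.0, "unused": 5.0}),
    "real_stds": pd.Series({"age": 20.0, "income": 10.0, "unused": 1.0}),
}

# .values drops the index and returns a 1-D NumPy array, one entry per feature.
means = stats["real_means"][real_feats].values
stds = stats["real_stds"][real_feats].values
assert isinstance(means, np.ndarray) and means.shape == (len(real_feats),)

# A 1-D array whose length equals the number of columns is applied to every row.
standardized = (reals - means) / stds
print(standardized)
```

Because the array carries no labels, this relies on `real_feats` being in the same order as the columns of `reals`, which appears to be the case in the lines patched above.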