berkeley-stat222 / mousestyles

2016 final project
http://berkeley-stat222.github.io/mousestyles/
BSD 2-Clause "Simplified" License
2 stars 33 forks source link

WIP: add advanced features of paths. #179

Closed togawa28 closed 8 years ago

togawa28 commented 8 years ago

The results of exploring plots for features we have now (distance, speeds,...) were somewhat disappointing; probably we need to add more features which should indicate significant and interesting differences over different strains. This pull request will add the features such as:

  1. area of the rectangle spanned by path
  2. radius and center angles in path
  3. area covered by path
  4. distance between the start and end points in path For details, see #122.

For the initial commit, I only added main functions to generate those features. I'll add test functions later. Please review @aksam-ahmad @alannaiverson @jcpalumbo3 @jJasonWang @sicihuang

alannaiverson commented 8 years ago

@togawa28 What if we only calculate two of the features, like area of rectangle and radius, but only have one function that returns both? I think that could work and testing would be significantly less? What do y'all think, @jcpalumbo3 @jJasonWang @aksam-ahmad @sicihuang ?

togawa28 commented 8 years ago

Interesting idea @alannaiverson, we could combine some functions, but why do you think testing would be less? Regardless of using one or several, we need to test the inputs/outputs of area of rectangle and radius anyway?

alannaiverson commented 8 years ago

True, @togawa28 but I was thinking if we just computed the edge points and center within the function, those two at least wouldn't need separate testing would they? Or am I misunderstanding?

togawa28 commented 8 years ago

No, you correctly understand what I'm saying. Well, even if we include those two within the function, we still should test those so that we confirm they work correctly maybe?

alannaiverson commented 8 years ago

I think testing the final output would serve as a check for those, without formal tests, but I could be wrong. I was thinking something like this (not sure why markdown is making the first few lines weird...): ` def compute_advanced(path_obj): if not isinstance(path_obj, pd.core.frame.DataFrame): raise TypeError("path_obj must be pandas DataFrame")

if not set(path_obj.keys()).issuperset(['x', 'y']):
    raise ValueError("the keys of path_obj must contain 'x', 'y'")

if len(path_obj) <= 1:
    raise ValueError("path_obj must contain at least 2 rows")

# Computes edge points like find_edge_points originally did compute_area_rectangle did
edge_points = {'xmin': np.min(path_obj.x), 'xmax': np.max(path_obj.x),
               'ymin': np.min(path_obj.y), 'ymax': np.max(path_obj.y)}

# Computes area of rectangle
area = (edge_points['xmax']-edge_points['xmin']) * (edge_points['ymax']-edge_points['ymin'])

# Computes center like find_center originally did
center = {'x': (edge_points['xmin'] + edge_points['xmax'])/2,
          'y': (edge_points['ymin'] + edge_points['ymax'])/2}

indices = path_obj.index[:-1]
vectors = [path_obj.loc[i, 'x':'y'] -
           [center['x'], center['y']] for i in indices]
center_angles = [path_features.angle_between(list(v1), list(v2)) for
                 v1, v2 in zip(vectors[1:], vectors[:len(vectors)])]
radius = [np.linalg.norm(v) for v in vectors]

return({'radius': radius, 'center_angles': center_angles, 'area': area})`

@togawa28 @jJasonWang @jcpalumbo3 @sicihuang @aksam-ahmad Thoughts on squashing some of the functions into one for time reasons?

togawa28 commented 8 years ago

@alannaiverson thank you for your contribution, and now I get your point. It's much more efficient to squash those functions because it not only decreases the number of tedious testing functions but also reduces doc strings to make the whole file easier to read and understand. @jJasonWang @aksam-ahmad @sicihuang @jcpalumbo3 do you guys have any comments? If it's fine I'll follow @alannaiverson's suggestion and probably change almost whole contents of the file.

togawa28 commented 8 years ago

By the way, for PEP8 styles issue, I got the following message.

____ pyflakes-check ____ /home/travis/build/berkeley-stat222/mousestyles/mousestyles/path_diversity/path_features_advanced.py:84: EOL while scanning string literal raise ValueError("the keys of edge_points must be 'xmax', 'xmin', ^

I'm not sure how to fix it. Any idea?

jJasonWang commented 8 years ago

I am not sure what happens, but I will help you check that @togawa28.

alannaiverson commented 8 years ago

@togawa28 I made an example of what the function and tests might look like and put them here (it wouldn't let me upload .py files). Apologies for the messiness and these are just ideas! They can no doubt be improved! alanna_function.txt alanna_test.txt

togawa28 commented 8 years ago

I added new temporary files which are just copies of @alannaiverson's comment. We can discuss on those new files now.

togawa28 commented 8 years ago

Guys, I updated path_features_advanced.py and newly added tests/test_path_features_advanced.py, based on our discussions above. I intentionally didn't make any changes in @alannaiverson's file for comparison purpose. Main updates are:

  1. main functions returns all features (i.e. includes center angles, area covered by path.)
  2. radius and center angles are returned as a list (no data reduction): we could take sum or average outside of the function.
  3. modified tests for outputs. Your any comments (especially on the updates above & styles issue) will be appreciated.
togawa28 commented 8 years ago

@jarrodmillman could you help us to pass Travis CI? Basically our files passed the test in Python 3, but it didn't in python 2. We don't know why and are just wondering if you could find any good recommendation for this.

jarrodmillman commented 8 years ago

@togawa28 I am looking into it.

jarrodmillman commented 8 years ago

Rebased and fixed here #203