Closed togawa28 closed 8 years ago
@togawa28 What if we only calculate two of the features, like area of rectangle and radius, but only have one function that returns both? I think that could work and testing would be significantly less? What do y'all think, @jcpalumbo3 @jJasonWang @aksam-ahmad @sicihuang ?
Interesting idea @alannaiverson, we could combine some functions, but why do you think testing would be less? Regardless of using one or several, we need to test the inputs/outputs of area of rectangle and radius anyway?
True, @togawa28 but I was thinking if we just computed the edge points and center within the function, those two at least wouldn't need separate testing would they? Or am I misunderstanding?
No, you correctly understand what I'm saying. Well, even if we include those two within the function, we still should test those so that we confirm they work correctly maybe?
I think testing the final output would serve as a check for those, without formal tests, but I could be wrong. I was thinking something like this (not sure why markdown is making the first few lines weird...): ` def compute_advanced(path_obj): if not isinstance(path_obj, pd.core.frame.DataFrame): raise TypeError("path_obj must be pandas DataFrame")
if not set(path_obj.keys()).issuperset(['x', 'y']):
raise ValueError("the keys of path_obj must contain 'x', 'y'")
if len(path_obj) <= 1:
raise ValueError("path_obj must contain at least 2 rows")
# Computes edge points like find_edge_points originally did compute_area_rectangle did
edge_points = {'xmin': np.min(path_obj.x), 'xmax': np.max(path_obj.x),
'ymin': np.min(path_obj.y), 'ymax': np.max(path_obj.y)}
# Computes area of rectangle
area = (edge_points['xmax']-edge_points['xmin']) * (edge_points['ymax']-edge_points['ymin'])
# Computes center like find_center originally did
center = {'x': (edge_points['xmin'] + edge_points['xmax'])/2,
'y': (edge_points['ymin'] + edge_points['ymax'])/2}
indices = path_obj.index[:-1]
vectors = [path_obj.loc[i, 'x':'y'] -
[center['x'], center['y']] for i in indices]
center_angles = [path_features.angle_between(list(v1), list(v2)) for
v1, v2 in zip(vectors[1:], vectors[:len(vectors)])]
radius = [np.linalg.norm(v) for v in vectors]
return({'radius': radius, 'center_angles': center_angles, 'area': area})`
@togawa28 @jJasonWang @jcpalumbo3 @sicihuang @aksam-ahmad Thoughts on squashing some of the functions into one for time reasons?
@alannaiverson thank you for your contribution, and now I get your point. It's much more efficient to squash those functions because it not only decreases the number of tedious testing functions but also reduces doc strings to make the whole file easier to read and understand. @jJasonWang @aksam-ahmad @sicihuang @jcpalumbo3 do you guys have any comments? If it's fine I'll follow @alannaiverson's suggestion and probably change almost whole contents of the file.
By the way, for PEP8 styles issue, I got the following message.
____ pyflakes-check ____ /home/travis/build/berkeley-stat222/mousestyles/mousestyles/path_diversity/path_features_advanced.py:84: EOL while scanning string literal raise ValueError("the keys of edge_points must be 'xmax', 'xmin', ^
I'm not sure how to fix it. Any idea?
I am not sure what happens, but I will help you check that @togawa28.
@togawa28 I made an example of what the function and tests might look like and put them here (it wouldn't let me upload .py files). Apologies for the messiness and these are just ideas! They can no doubt be improved! alanna_function.txt alanna_test.txt
I added new temporary files which are just copies of @alannaiverson's comment. We can discuss on those new files now.
Guys, I updated path_features_advanced.py and newly added tests/test_path_features_advanced.py, based on our discussions above. I intentionally didn't make any changes in @alannaiverson's file for comparison purpose. Main updates are:
@jarrodmillman could you help us to pass Travis CI? Basically our files passed the test in Python 3, but it didn't in python 2. We don't know why and are just wondering if you could find any good recommendation for this.
@togawa28 I am looking into it.
Rebased and fixed here #203
The results of exploring plots for features we have now (distance, speeds,...) were somewhat disappointing; probably we need to add more features which should indicate significant and interesting differences over different strains. This pull request will add the features such as:
For the initial commit, I only added main functions to generate those features. I'll add test functions later. Please review @aksam-ahmad @alannaiverson @jcpalumbo3 @jJasonWang @sicihuang