LINCellularNeuroscience / VAME

Variational Animal Motion Embedding - A tool for time series embedding and clustering
GNU General Public License v3.0

vame.create_trainset() doesn't accept output from csv_to_npy #68

Closed by stowerslab 2 years ago

stowerslab commented 2 years ago

This issue seems to be separate from #67, and I am not sure what is going on here. After identifying a problematic body point, throwing it out, rebuilding the npy using vame.csv_to_npy, and updating num_features in config.yaml, I'm still getting vame.create_trainset() errors, as shown below. The npy file itself seems fine (there are 26 "columns" for 26 features from 13 body parts). Is the interpolation a problem if there are too many NaNs in a row?

Edit: I just calculated the average likelihood, and most of the points are in the 95-99% range, with 3 body points in the 75% to 86% range. I could go back and edit my local version of csv_to_npy from #67 so I can play around with the pose confidence, if that is the most likely reason the "array is empty" errors are popping up.

Creating training dataset...
Using robust setting to eliminate outliers! IQR factor: 4
z-scoring of file side2021-06-18T12_34_14_compresseddownsampled
IQR value: nan, IQR cutoff: nan
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_14524/2932017249.py in <module>
      1 #change number of features in config.yaml. it should be number of body parts x2 (for both x and y so 7 body parts = 14)
----> 2 vame.create_trainset(config)

~\Anaconda3\envs\VAME\lib\site-packages\vame-1.0-py3.7.egg\vame\model\create_training.py in create_trainset(config)
    184 
    185     if legacy == False:
--> 186         traindata(cfg, files, cfg['test_fraction'], cfg['num_features'], cfg['savgol_filter'])
    187     else:
    188         traindata_legacy(cfg, files, cfg['test_fraction'], cfg['num_features'], cfg['savgol_filter'])

~\Anaconda3\envs\VAME\lib\site-packages\vame-1.0-py3.7.egg\vame\model\create_training.py in traindata(cfg, files, testfraction, num_features, savgol_filter)
     57                         X_z[i,marker] = np.nan
     58 
---> 59                 X_z[i,:] = interpol(X_z[i,:])
     60 
     61         X_len = len(data.T)

~\Anaconda3\envs\VAME\lib\site-packages\vame-1.0-py3.7.egg\vame\model\create_training.py in interpol(arr)
     28     y = np.transpose(arr)
     29     nans, x = nan_helper(y)
---> 30     y[nans]= np.interp(x(nans), x(~nans), y[~nans])
     31     arr = np.transpose(y)
     32     return arr

<__array_function__ internals> in interp(*args, **kwargs)

~\Anaconda3\envs\VAME\lib\site-packages\numpy\lib\function_base.py in interp(x, xp, fp, left, right, period)
   1426         fp = np.concatenate((fp[-1:], fp, fp[0:1]))
   1427 
-> 1428     return interp_func(x, xp, fp, left, right)
   1429 
   1430 

ValueError: array of sample points is empty
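The `IQR value: nan` line in the log hints at one possible mechanism. The sketch below assumes, without having checked VAME's source, that the z-scoring step uses plain `np.mean` and `np.std`. Under that assumption a single NaN in the input turns every z-scored value into NaN, and interpolating an all-NaN row then fails the same way as in the traceback. The toy array and the `zscore_plain` helper are made up for illustration.

```python
import numpy as np

def zscore_plain(data):
    """Z-score with plain mean/std (assumed behaviour, not VAME's actual code)."""
    return (data - np.mean(data)) / np.std(data)

# Toy pose array: 2 feature rows x 5 frames, with one low-confidence NaN.
data = np.array([[1.0, 2.0, np.nan, 4.0, 5.0],
                 [2.0, 3.0, 4.0, 5.0, 6.0]])

z = zscore_plain(data)
all_nan = bool(np.isnan(z).all())  # one NaN poisons the global mean and std

# Interpolating a row that has no valid samples raises a ValueError.
row = z[0]
nans = np.isnan(row)
idx = np.arange(row.size)
try:
    np.interp(idx[nans], idx[~nans], row[~nans])
    interp_error = None
except ValueError as err:
    interp_error = str(err)

# NaN-aware statistics leave only the originally missing entry as NaN.
z_safe = (data - np.nanmean(data)) / np.nanstd(data)
remaining_nans = int(np.isnan(z_safe).sum())
print(all_nan, interp_error, remaining_nans)
```

If this assumption holds, it would also explain why any nonzero pose confidence threshold, however few NaNs it produces, is enough to trigger the error.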

Setting robust: false throws a different error, but now it seems to load the 2nd file as well:

Creating training dataset...
z-scoring of file side2021-06-18T12_34_14_compresseddownsampled
z-scoring of file side2021-07-01T13_27_19_compresseddownsampled
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_14524/2932017249.py in <module>
      1 #change number of features in config.yaml. it should be number of body parts x2 (for both x and y so 7 body parts = 14)
----> 2 vame.create_trainset(config)

~\Anaconda3\envs\VAME\lib\site-packages\vame-1.0-py3.7.egg\vame\model\create_training.py in create_trainset(config)
    184 
    185     if legacy == False:
--> 186         traindata(cfg, files, cfg['test_fraction'], cfg['num_features'], cfg['savgol_filter'])
    187     else:
    188         traindata_legacy(cfg, files, cfg['test_fraction'], cfg['num_features'], cfg['savgol_filter'])

~\Anaconda3\envs\VAME\lib\site-packages\vame-1.0-py3.7.egg\vame\model\create_training.py in traindata(cfg, files, testfraction, num_features, savgol_filter)
     75 
     76     else:
---> 77         anchor_1_temp = int(np.where(detect_anchors == sort_anchors[0])[0])
     78         anchor_2_temp = int(np.where(detect_anchors == sort_anchors[1])[0])
     79 

TypeError: only size-1 arrays can be converted to Python scalars
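This `TypeError` comes from calling `int()` on the index array returned by `np.where`, which only works when exactly one element matches. The names `detect_anchors` and `sort_anchors` come from the traceback, but their contents are not shown in this issue, so the values below are invented. The sketch only demonstrates that NaN values (which never compare equal to themselves) or duplicate values make the match count differ from one. `np.argsort` is shown as one possible alternative, not as the fix adopted by VAME.

```python
import numpy as np

def match_count(values, target):
    """Number of entries in `values` that compare equal to `target`."""
    return int(np.where(values == target)[0].size)

def int_of_where_fails(values, target):
    """True if the int(np.where(...)[0]) pattern raises TypeError."""
    try:
        int(np.where(values == target)[0])
        return False
    except TypeError:
        return True

# Hypothetical anchor values; the real contents are not shown in the issue.
with_nan = np.array([np.nan, np.nan])   # e.g. statistics computed on NaN data
with_dup = np.array([0.3, 0.3, 0.9])    # two entries share the minimum

nan_matches = match_count(with_nan, np.sort(with_nan)[0])  # NaN != NaN
dup_matches = match_count(with_dup, np.sort(with_dup)[0])

nan_fails = int_of_where_fails(with_nan, np.sort(with_nan)[0])
dup_fails = int_of_where_fails(with_dup, np.sort(with_dup)[0])

# argsort returns indices directly, so no equality lookup is needed.
order = np.argsort(with_dup, kind="stable")
anchor_1, anchor_2 = int(order[0]), int(order[1])
print(nan_matches, dup_matches, nan_fails, dup_fails, anchor_1, anchor_2)
```

Which of the two cases applied here cannot be determined from the traceback alone; both loaded files contained NaNs, so the zero-match case seems plausible.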

stowerslab commented 2 years ago

After investigating the problem further, it appears that the csv_to_npy output isn't accepted by vame.create_trainset() because csv_to_npy sets values below the pose confidence to NaN.

After fixing the issue in #67 locally, setting pose_confidence to 0 doesn't produce any NaNs in the generated npy, which vame.create_trainset() happily accepts. Setting pose_confidence to any other value generates NaNs, which cause vame.create_trainset() to throw empty sample array errors.
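A quick way to see how many NaNs a given pose confidence produces is to report, per feature, the NaN fraction and the longest run of consecutive NaNs. The sketch below is a minimal example on a toy array. The helper names `longest_nan_run` and `nan_report` are hypothetical, and the array layout (frames along axis 0) is an assumption, which is why the time axis is a parameter. In practice the array would come from `np.load` on the file written by csv_to_npy.

```python
import numpy as np

def longest_nan_run(series):
    """Length of the longest run of consecutive NaNs in a 1-D array."""
    longest = current = 0
    for is_nan in np.isnan(series):
        current = current + 1 if is_nan else 0
        longest = max(longest, current)
    return longest

def nan_report(data, time_axis=0):
    """Per-feature NaN fraction and longest NaN run along `time_axis`."""
    frames_first = np.moveaxis(np.asarray(data, dtype=float), time_axis, 0)
    fractions = np.isnan(frames_first).mean(axis=0)
    runs = np.array([longest_nan_run(col) for col in frames_first.T])
    return fractions, runs

# Toy array: 6 frames x 2 features (layout is an assumption).
toy = np.array([[1.0, np.nan],
                [np.nan, np.nan],
                [np.nan, 3.0],
                [4.0, 4.0],
                [5.0, np.nan],
                [6.0, 6.0]])

fractions, runs = nan_report(toy, time_axis=0)
print(fractions, runs)
```

A feature with a NaN fraction of 1.0 has no valid samples at all and cannot be interpolated, while long runs indicate stretches where any interpolation would be a guess.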

This function may not have been updated to match the output of egocentric alignment (which I'm guessing is the function most often used before vame.create_trainset()), so besides interpolating the NaNs, I would check whether other things need to be added.
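As a stopgap, the NaNs could be interpolated before the array is handed to vame.create_trainset(). The following is a minimal sketch of that idea using `np.interp`, not the change that was made in VAME. The helpers `fill_nans_linear` and `fill_array` are hypothetical names, and the layout (frames along axis 0) is an assumption. Leading and trailing NaNs take the nearest valid value, and a series with no valid samples raises a clear error instead of failing inside `np.interp`.

```python
import numpy as np

def fill_nans_linear(series):
    """Linearly interpolate NaNs in a 1-D array.

    NaNs at either end are filled with the nearest valid value.
    """
    series = np.asarray(series, dtype=float).copy()
    nans = np.isnan(series)
    if nans.all():
        raise ValueError("series has no valid samples to interpolate from")
    idx = np.arange(series.size)
    series[nans] = np.interp(idx[nans], idx[~nans], series[~nans])
    return series

def fill_array(data, time_axis=0):
    """Apply fill_nans_linear to every 1-D slice along `time_axis`."""
    return np.apply_along_axis(fill_nans_linear, time_axis, data)

# Toy array: 4 frames x 2 features (layout is an assumption).
toy = np.array([[np.nan, 10.0],
                [1.0, np.nan],
                [np.nan, 30.0],
                [3.0, 40.0]])

filled = fill_array(toy, time_axis=0)
print(filled)
```

Linear interpolation is only reasonable for short gaps; features with long NaN runs may be better dropped or re-tracked.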

kvnlxm commented 2 years ago

Thank you for posting this! We updated the csv_to_numpy.py function and hope that this issue is resolved. But we will extensively check this with the newer version of VAME in a few months.

Cheers, Kevin