LINCellularNeuroscience / VAME

Variational Animal Motion Embedding - A tool for time series embedding and clustering
GNU General Public License v3.0

Error when working with vame.community() #62

Closed elenael97 closed 2 years ago

elenael97 commented 3 years ago

Hello,

I have been trying to use the vame.community() function. I can produce the hierarchical tree of mouse behaviour, but afterwards I get an error. Could anyone help me solve it?

In [5]: vame.community(config, show_umap=False, cut_tree=None)
C:\Users\User\VAME\vame\analysis\community_analysis.py:57: RuntimeWarning: invalid value encountered in true_divide
  transition_matrix = adjacency_matrix/row_sum[:,np.newaxis]
C:\Users\User\VAME\vame\analysis\tree_hierarchy.py:67: RuntimeWarning: divide by zero encountered in double_scalars
  cost = (motif_norm[i] + motif_norm[j]) / np.abs(transition_matrix[i,j] + transition_matrix[j,i] )
C:\Users\User\VAME\vame\analysis\tree_hierarchy.py:67: RuntimeWarning: invalid value encountered in double_scalars
  cost = (motif_norm[i] + motif_norm[j]) / np.abs(transition_matrix[i,j] + transition_matrix[j,i] )
Where do you want to cut the Tree? 0/1/2/3/...2
[[8, 11, 1, 5, 7, 9, 10, 12, 6, 4, 14, 13], [3, 2]]

Are all motifs in the list? (yes/no/restart)yes

IndexError                                Traceback (most recent call last)
in
----> 1 vame.community(config, show_umap=False, cut_tree=None)

~\VAME\vame\analysis\community_analysis.py in community(config, show_umap, cut_tree)
    204     labels = get_labels(cfg, files, model_name, n_cluster)
    205     transition_matrices = compute_transition_matrices(files, labels, n_cluster)
--> 206     communities_all, trees = create_community_bag(files, labels, transition_matrices, cut_tree, n_cluster)
    207     community_labels_all = get_community_labels(files, labels, communities_all)
    208 

~\VAME\vame\analysis\community_analysis.py in create_community_bag(files, labels, transition_matrices, cut_tree, n_cluster)
     84     for i, file in enumerate(files):
     85         _, usage = np.unique(labels[i], return_counts=True)
---> 86         T = graph_to_tree(usage, transition_matrices[i], n_cluster, merge_sel=1)
     87         trees.append(T)
     88 

~\VAME\vame\analysis\tree_hierarchy.py in graph_to_tree(motif_usage, transition_matrix, n_cluster, merge_sel)
    103 #        max_tr = np.max(trans_mat_temp) #merge function
    104 #        nodes = np.where(max_tr == trans_mat_temp)
--> 105         nodes = merge_func(trans_mat_temp, n_cluster, motif_norm_temp, merge_sel)
    106 
    107         if np.size(nodes) >= 2:

~\VAME\vame\analysis\tree_hierarchy.py in merge_func(transition_matrix, n_cluster, motif_norm, merge_sel)
     65             for j in range(n_cluster):
     66                 try:
---> 67                     cost = (motif_norm[i] + motif_norm[j]) / np.abs(transition_matrix[i,j] + transition_matrix[j,i] )
     68                 except ZeroDivisionError:
     69                     print("Error: Transition probabilities between motif "+str(i)+" and motif "+str(j)+ " are zero.")

IndexError: index 14 is out of bounds for axis 0 with size 14

I receive the same error if I choose show_umap=True, which means I cannot view the UMAP.

AthiraDK commented 3 years ago

We are running into the same error when trying the community approach. We have 7 videos and the corresponding DLC results (csv files) as inputs to VAME, and we obtain 15 motifs (motif 0 to motif 14) after training the LSTM and running the pose segmentation steps. The motifs occurring in each video are as follows:

video 0: motifs [ 0 1 5 6 8 10 11]
video 1: motifs [ 0 1 6 7 8 10 11 12]
video 2: motifs [ 0 6 7 10 12]
video 3: motifs [ 1 2 3 4 5 8 9 13 14]
video 4: motifs [ 1 2 3 4 5 9 13 14]
video 5: motifs [ 1 2 3 4 5 8 9 13 14]
video 6: motifs [ 1 2 3 4 5 9 13 14]

Since some motifs are missing in each video, we ran the community function with cut_tree=None. However, we get this error:

 IndexError                                Traceback (most recent call last)
<ipython-input-11-9e257901c003> in <module>
----> 1 vame.community(config, show_umap=False, cut_tree=0)

~/anaconda3/envs/VAME/lib/python3.7/site-packages/vame-1.0-py3.7.egg/vame/analysis/community_analysis.py in community(config, show_umap, cut_tree)
    204     labels = get_labels(cfg, files, model_name, n_cluster)
    205     transition_matrices = compute_transition_matrices(files, labels, n_cluster)
--> 206     communities_all, trees = create_community_bag(files, labels, transition_matrices, cut_tree, n_cluster)
    207     community_labels_all = get_community_labels(files, labels, communities_all)
    208 

~/anaconda3/envs/VAME/lib/python3.7/site-packages/vame-1.0-py3.7.egg/vame/analysis/community_analysis.py in create_community_bag(files, labels, transition_matrices, cut_tree, n_cluster)
     84     for i, file in enumerate(files):
     85         _, usage = np.unique(labels[i], return_counts=True)
---> 86         T = graph_to_tree(usage, transition_matrices[i], n_cluster, merge_sel=1)
     87         trees.append(T)
     88 

~/anaconda3/envs/VAME/lib/python3.7/site-packages/vame-1.0-py3.7.egg/vame/analysis/tree_hierarchy.py in graph_to_tree(motif_usage, transition_matrix, n_cluster, merge_sel)
    103 #        max_tr = np.max(trans_mat_temp) #merge function
    104 #        nodes = np.where(max_tr == trans_mat_temp)
--> 105         nodes = merge_func(trans_mat_temp, n_cluster, motif_norm_temp, merge_sel)
    106 
    107         if np.size(nodes) >= 2:

~/anaconda3/envs/VAME/lib/python3.7/site-packages/vame-1.0-py3.7.egg/vame/analysis/tree_hierarchy.py in merge_func(transition_matrix, n_cluster, motif_norm, merge_sel)
     65             for j in range(n_cluster):
     66                 try:
---> 67                     cost = (motif_norm[i] + motif_norm[j]) / np.abs(transition_matrix[i,j] + transition_matrix[j,i] )
     68                 except ZeroDivisionError:
     69                     print("Error: Transition probabilities between motif "+str(i)+" and motif "+str(j)+ " are zero.")

IndexError: index 7 is out of bounds for axis 0 with size 7

AthiraDK commented 3 years ago

In the VAME codebase (vame/analysis/tree_hierarchy.py), the argument motif_norm passed into merge_func seems to have length < n_cluster, and this seems to cause the error. motif_norm is calculated from a usage variable, which counts how many times each unique motif occurs in a given video. Whenever certain motifs are missing from a video, the usage array is shorter than n_cluster. For example, video 2 in my dataset contains motifs [ 0 6 7 10 12], so usage is a numpy array like [1930, 2237, 2152, 1737, 941], where each element is the count for one of the 5 motifs seen in that video out of the 15 motifs. motif_norm is in essence calculated as follows (I changed some variable names for debugging):

motif_usage_temp_colsum = usage.sum(axis=0)
motif_norm = usage/motif_usage_temp_colsum
motif_norm_temp = motif_norm.copy()
motif_norm

Thus motif_norm has the same length as usage, i.e., 5 in my example. Inside merge_func, the for loop evaluates motif_norm[i] + motif_norm[j] with i and j ranging from 0 to n_cluster - 1, so a 5-element array is indexed with indices 0 to 14, which seems to be what throws the error.
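
The length mismatch described here can be reproduced outside VAME. The following is a minimal sketch, not VAME code: the label sequence is invented for illustration, and only the np.unique call and the loop bound mirror what the traceback shows.

```python
import numpy as np

n_cluster = 15  # total number of motifs, as in the example above

# Hypothetical label sequence for one video in which only
# motifs 0, 6, 7, 10 and 12 occur (values invented for illustration).
labels = np.array([0, 6, 6, 7, 10, 12, 12, 0, 7, 10])

# Same call as line 85 of community_analysis.py in the traceback:
# motifs that never occur get no entry at all.
_, usage = np.unique(labels, return_counts=True)
motif_norm = usage / usage.sum()

print(len(motif_norm))  # 5, not n_cluster

# Indexing with i in range(n_cluster), as the loop in merge_func does,
# runs past the end of the 5-element array.
message = None
try:
    for i in range(n_cluster):
        motif_norm[i]
except IndexError as err:
    message = str(err)
    print(message)
```

With these toy labels the loop fails as soon as i reaches 5, which has the same shape as the reported "index 7 is out of bounds for axis 0 with size 7".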

PatrickHonma commented 2 years ago

@AthiraDK I am also running into the same error when working with the community() function. Did you find a fix for this? Also, is motif_norm supposed to be a 1D array of size n_cluster?

AthiraDK commented 2 years ago

@PatrickHonma I have not yet found a fix for this. Do you also have a case where motifs are unevenly distributed over the videos? As in, not all motifs are present in each of your videos?

MannyEsguerra commented 2 years ago

@AthiraDK I am getting the same indexing error with vame.community(), and can confirm that not all motifs are found in all my videos. Tried cut_tree = None|0|1|3.

Anaconda environment in Windows 10.

PatrickHonma commented 2 years ago

@AthiraDK Not all motifs are present in each video, but they are appearing as 0's in the motif_usage.

I did notice that the cost function in line 67 of tree_hierarchy.py is expecting [i,j] dimensions for motif_norm. I added a new dimension to motif_usage_temp by editing the assignment to motif_usage_temp = motif_usage[:,np.newaxis]. Now motif_norm has shape [n_cluster, 1], which I believe fixes the index error.
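
To make the shape change concrete, here is a small sketch with invented counts. It is not taken from VAME; the normalisation line is an assumption modelled on the snippet quoted earlier in this thread.

```python
import numpy as np

# Hypothetical usage counts for 4 motifs, one of them empty.
motif_usage = np.array([12, 0, 7, 5])

# Adding a trailing axis turns shape (4,) into (4, 1).
motif_usage_temp = motif_usage[:, np.newaxis]
motif_norm = motif_usage_temp / motif_usage_temp.sum(axis=0)

print(motif_usage.shape)  # (4,)
print(motif_norm.shape)   # (4, 1)
```

The length along axis 0 is unchanged by np.newaxis, so indices 0 to n_cluster - 1 are valid only if motif_usage already has one entry per motif, zeros included.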

alexcwsmith commented 2 years ago

@AthiraDK the 'Index 7 is out of bounds for axis 0 with size 7' error should have been fixed by my pull request https://github.com/LINCellularNeuroscience/VAME/pull/58, which ensures that all clusters are accounted for and that 'empty' clusters get a 0 instead of shortening the axis. That PR was merged a while back, so update your VAME install; @PatrickHonma mentions that he does see 0s in his motif_usage array. Note that the bug did not only cause this error: it also made the data inaccurate, because no 0 was inserted for an empty cluster, so every cluster above the empty one was shifted down by one (motif 8 would be stored as motif 7, 9 as 8, etc.).
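
The shift of motif numbers can be seen in a toy example. This is only a sketch under assumptions: the labels are invented, and np.bincount with minlength is one possible way to keep a slot for every cluster, not necessarily what the pull request does.

```python
import numpy as np

n_cluster = 5  # small hypothetical example

# Hypothetical labels in which motif 2 never occurs.
labels = np.array([0, 1, 1, 3, 4, 4, 4])

# np.unique drops the empty motif, so position 2 of counts
# actually holds the count of motif 3.
ids, counts = np.unique(labels, return_counts=True)

# np.bincount with minlength keeps one slot per motif
# and fills the gaps with 0 (labels must be non-negative integers).
usage = np.bincount(labels, minlength=n_cluster)

print(ids)     # [0 1 3 4]
print(counts)  # [1 2 1 3]
print(usage)   # [1 2 0 1 3]
```

Reading counts by position would attribute motif 3's count to motif 2 and motif 4's count to motif 3, whereas usage keeps every count at the index of its own motif.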

I would clone this repository and try importing from there instead of the version installed from pip. Personally I would recommend cloning my fork of this repository but there are some other features that are different that you may or may not like.

cfernandezpa commented 2 years ago

Hello,

I recently started using VAME and did a couple of tests using just one video file, and it seemed to be working fine. Now I've followed the exact same steps, but this time using 6 files. Everything was going well until I got the error "index 29 is out of bounds for axis 0 with size 29" when running vame.community(). I've replaced my files with the more recent ones from https://github.com/LINCellularNeuroscience/VAME/tree/master/vame/analysis but that does not fix the issue.

Here is the code with the error I got:

In [3]: vame.community(config, show_umap=True, cut_tree=None)
C:\Users\oasis\anaconda3\envs\vame\lib\site-packages\vame-1.0-py3.7.egg\vame\analysis\community_analysis.py:57: RuntimeWarning: invalid value encountered in true_divide
C:\Users\oasis\anaconda3\envs\vame\lib\site-packages\vame-1.0-py3.7.egg\vame\analysis\tree_hierarchy.py:67: RuntimeWarning: divide by zero encountered in double_scalars

IndexError                                Traceback (most recent call last)
in
----> 1 vame.community(config, show_umap=True, cut_tree=None)

~\anaconda3\envs\vame\lib\site-packages\vame-1.0-py3.7.egg\vame\analysis\community_analysis.py in community(config, show_umap, cut_tree)
    204     labels = get_labels(cfg, files, model_name, n_cluster)
    205     transition_matrices = compute_transition_matrices(files, labels, n_cluster)
--> 206     communities_all, trees = create_community_bag(files, labels, transition_matrices, cut_tree, n_cluster)
    207     community_labels_all = get_community_labels(files, labels, communities_all)
    208 

~\anaconda3\envs\vame\lib\site-packages\vame-1.0-py3.7.egg\vame\analysis\community_analysis.py in create_community_bag(files, labels, transition_matrices, cut_tree, n_cluster)
     84     for i, file in enumerate(files):
     85         _, usage = np.unique(labels[i], return_counts=True)
---> 86         T = graph_to_tree(usage, transition_matrices[i], n_cluster, merge_sel=1)
     87         trees.append(T)
     88 

~\anaconda3\envs\vame\lib\site-packages\vame-1.0-py3.7.egg\vame\analysis\tree_hierarchy.py in graph_to_tree(motif_usage, transition_matrix, n_cluster, merge_sel)
    103 #        max_tr = np.max(trans_mat_temp) #merge function
    104 #        nodes = np.where(max_tr == trans_mat_temp)
--> 105         nodes = merge_func(trans_mat_temp, n_cluster, motif_norm_temp, merge_sel)
    106 
    107         if np.size(nodes) >= 2:

~\anaconda3\envs\vame\lib\site-packages\vame-1.0-py3.7.egg\vame\analysis\tree_hierarchy.py in merge_func(transition_matrix, n_cluster, motif_norm, merge_sel)
     65             for j in range(n_cluster):
     66                 try:
---> 67                     cost = (motif_norm[i] + motif_norm[j]) / np.abs(transition_matrix[i,j] + transition_matrix[j,i] )
     68                 except ZeroDivisionError:
     69                     print("Error: Transition probabilities between motif "+str(i)+" and motif "+str(j)+ " are zero.")

IndexError: index 29 is out of bounds for axis 0 with size 29

I would really appreciate any help!

Best wishes, Carlos

alisabak commented 2 years ago

I am getting the same error as @cfernandezpa and @AthiraDK, even though I use the latest version of the files. Has anyone found a working solution? Thanks!

kvnlxm commented 2 years ago

Hi everyone, thank you again for bringing this up. We will try to release a newer version of the vame.community() functionality within the next few months. I know this is a persistent issue, and I have been working with other groups on some solutions to it. The tree as well as the community functionality could be improved overall, and I am happy to hear any further ideas from the VAME community in this direction. For now, I will close this issue and hope the newer version will solve some of the problems outlined here.

Cheers, Kevin

ZhanqiZhang66 commented 2 years ago

@AthiraDK I had the same issue, and here is what I did. A naive fix is to add a new parameter k_labels (for example, [ 0 6 7 10 12]) to merge_func. To do this, you need to create k_labels in community in the file community_analysis.py and pass it along:

        k_labels, usage = np.unique(labels[idx], return_counts=True)
        T = graph_to_tree(usage, k_labels, transition_matrices[idx], n_cluster, merge_sel=1)
        trees.append(T)

Then, in tree_hierarchy.py, I modified graph_to_tree to def graph_to_tree(motif_usage, k_labels, transition_matrix, n_cluster, merge_sel=1) (basically just feeding k_labels into it so that I can use it in merge_func).

And then inside graph_to_tree, I fed k_labels to merge_func:

nodes = merge_func(trans_mat_temp, k_labels, n_cluster, motif_norm_temp, merge_sel)

Here is the relevant part of my merge_func:

    if merge_sel == 1:
        count = 0
        cost_temp = 100
        for i in range(len(k_labels)):
            for j in range(len(k_labels)):
                if np.abs(transition_matrix[i,j] + transition_matrix[j,i] ) == 0:
                    cost = 1000
                    count += 1
                else:
                    cost = (motif_norm[i] + motif_norm[j]) / np.abs(transition_matrix[i, j] + transition_matrix[j, i])

Now transition_matrix[i, j] would be the transition probability between the valid motifs.
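
For readers who cannot follow the patch, here is a self-contained sketch of the same idea. It is not the code from this comment or from VAME: cheapest_pair is a hypothetical helper, the data are invented, and it assumes the transition matrix is n_cluster x n_cluster and indexed by motif ID, so it is first restricted to the motifs present with np.ix_. It also assumes the pair with the lowest cost is the one to merge, and it skips pairs with no transitions instead of assigning them a large cost.

```python
import numpy as np

def cheapest_pair(transition_matrix, k_labels, motif_norm):
    """Return (pair of motif IDs, cost) with the lowest merge cost.

    Hypothetical helper for illustration only.
    transition_matrix: (n_cluster, n_cluster), indexed by motif ID (assumption).
    k_labels: IDs of the motifs present in the video, e.g. from np.unique.
    motif_norm: normalised usage, one entry per ID in k_labels.
    """
    # Keep only rows and columns of motifs that actually occur.
    sub = transition_matrix[np.ix_(k_labels, k_labels)]
    best_cost, best_pair = np.inf, None
    for a in range(len(k_labels)):
        for b in range(a + 1, len(k_labels)):  # cost is symmetric in (a, b)
            flow = abs(sub[a, b] + sub[b, a])
            if flow == 0:
                continue  # no transitions between these motifs: skip the pair
            cost = (motif_norm[a] + motif_norm[b]) / flow
            if cost < best_cost:
                best_cost = cost
                best_pair = (int(k_labels[a]), int(k_labels[b]))
    return best_pair, best_cost

# Toy data: 4 motifs, motif 1 never occurs in this video.
T = np.array([[0.0, 0.0, 0.5, 0.5],
              [0.0, 0.0, 0.0, 0.0],
              [0.9, 0.0, 0.0, 0.1],
              [0.2, 0.0, 0.8, 0.0]])
k_labels = np.array([0, 2, 3])
motif_norm = np.array([0.5, 0.3, 0.2])

pair, cost = cheapest_pair(T, k_labels, motif_norm)
print(pair)  # (2, 3)
```

Because positions in motif_norm and in the restricted matrix both follow k_labels, position a always refers to motif k_labels[a], and the returned pair is reported in original motif IDs.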

hummuscience commented 2 years ago

@ZhanqiZhang66 I am trying to recreate your fix, but I cannot follow it completely. Could you upload your community_analysis.py and tree_hierarchy.py files somewhere? Thanks in advance!

cassandrarbk commented 9 months ago

@ZhanqiZhang66 I am still trying to fix the problem, but also can’t follow your fix completely. Would you be able to help me out?