Closed tianlt closed 2 years ago
hi,
i think the issue here is that in the second iteration you are getting two disconnected components and I'm not sure if this is also the case in the first iteration. If the first pass (v_1) also yields two disconnected components, then I would suggest only running a single pass as it is not so easy to carry forward the terminal states from the first iteration onto the second iteration. If you mainly just want more granularity then you can lower the too_big_factor parameter already in v_1 (set it to 0.2 or 0.15, instead of 0.3) and see if you get a more fine-grained trajectory (or even multiple disconnected trajectories). You might also consider using more than just the first two PCs for your input as VIA can handle higher number of input dimensions.
Thank you for reply and explanation. I tuned both the too_big_factor and the number of PCs as input and only use the first pass. It runs successfully on one dataset. But on another dataset i got another error; IndexError: color index must be non-negative.
The full error and output I got are;
IndexError Traceback (most recent call last)
/tmp/ipykernel_29627/4051984019.py in
/tmp/ipykernel_29627/4051984019.py in runvia_simulated_data(name) 27 is_coarse=True, preserve_disconnected=True, name=name) 28 ---> 29 v_1.run_VIA() 30 31 #
~/viaenv/lib/python3.7/site-packages/pyVIA/core.py in run_VIA(self) 3413 # Query dataset, k - number of closest elements (returns 2 numpy arrays) 3414 -> 3415 self.run_subPARC() 3416 run_time = time.time() - time_start_total 3417 print('time elapsed {:.1f} seconds'.format(run_time))
~/viaenv/lib/python3.7/site-packages/pyVIA/core.py in run_subPARC(self) 3051 3052 for i in scaled_hitting_times: -> 3053 all_colors.append(pal.get(int(i))[0:3]) 3054 3055 locallytrimmed_g.vs['hitting_times'] = scaled_hitting_times
~/viaenv/lib/python3.7/site-packages/igraph/drawing/colors.py in get(self, v) 87 if isinstance(v, int): 88 if v < 0: ---> 89 raise IndexError("color index must be non-negative") 90 if v >= self._length: 91 raise IndexError("color index too large")
IndexError: color index must be non-negative
have you tried leaving the too_big_factor as 0.3 for this second dataset? or changing the graph pruning (jac_std_global=0.15, to a higher value of e.g. 1.0) to make sure that you are not over fragmenting the graph and getting a component with just a single cluster, which might be causing some division by zero? i havent encountered this before, so am trying to guess what might be causing this behaviour
Yes and I tried to set the jac_std_global=1.0 but get another error: ValueError: dimension mismatch
ValueError Traceback (most recent call last)
/tmp/ipykernel_53985/4111017426.py in <module>
39 return
40
---> 41 runvia_simulated_data('bc_9')
42 #runvia_simulated_data('bt_5')
43 #runvia_simulated_data('l_0')
/tmp/ipykernel_53985/4111017426.py in runvia_simulated_data(name)
28 is_coarse=True, preserve_disconnected=True, name=name)
29
---> 30 v_1.run_VIA()
31
32 #
~/viaenv/lib/python3.7/site-packages/pyVIA/core.py in run_VIA(self)
3413 # Query dataset, k - number of closest elements (returns 2 numpy arrays)
3414
-> 3415 self.run_subPARC()
3416 run_time = time.time() - time_start_total
3417 print('time elapsed {:.1f} seconds'.format(run_time))
~/viaenv/lib/python3.7/site-packages/pyVIA/core.py in run_subPARC(self)
2951 print(colored('project onto single cell', 'red'))
2952 #self.project_branch_probability_sc(bp_array, df_graph['pt'].values) #testing to see what the effect of not doing MCMC is
-> 2953 self.project_branch_probability_sc(bp_array, df_graph['markov_pt'].values)
2954 self.dict_terminal_super_sub_pairs = dict_terminal_super_sub_pairs
2955 hitting_times = self.markov_hitting_times
~/viaenv/lib/python3.7/site-packages/pyVIA/core.py in project_branch_probability_sc(self, bp_array_clus, pt)
1640 weight_array = csr_matrix((np.array(weight_list), (np.array(row_list), np.array(col_list))),
1641 shape=(n_cells, n_clus))
-> 1642 bp_array_sc = weight_array.dot(bp_array_clus)
1643 bp_array_sc = bp_array_sc * 1. / np.max(bp_array_sc, axis=0) # divide cell by max value in that column
1644 # print('column max:',np.max(bp_array_sc, axis=0))
~/viaenv/lib/python3.7/site-packages/scipy/sparse/base.py in dot(self, other)
357
358 """
--> 359 return self * other
360
361 def power(self, n, dtype=None):
~/viaenv/lib/python3.7/site-packages/scipy/sparse/base.py in __mul__(self, other)
514
515 if other.shape[0] != self.shape[1]:
--> 516 raise ValueError('dimension mismatch')
517
518 result = self._mul_multivector(np.asarray(other))
I also attached my input code here;
def runvia_simulated_data(name):
data_1 = pd.read_csv('/home/tialan/Data/cell_trajectory/code_new/' + str(name) + '.csv')
data_1 = data_1.T
data_1.columns = data_1.iloc[0]
data_1 = data_1[1:]
data_1 = pd.read_csv(StringIO(data_1.to_csv(index=False)))
#true_label_1 = data_1.iloc[:,0]
true_label_1 = np.concatenate((np.repeat(a = 0, repeats = 500), np.repeat(a = 1, repeats = 500)),axis=None)
true_label_1 = pd.Series(true_label_1)
adata = sc.AnnData(data_1)
sc.tl.pca(adata, svd_solver='arpack', n_comps=10)
#embedding = umap.UMAP(random_state=42, n_neighbors=15, init='random').fit_transform(data_1.values)
root = [0]
#adata.obsm['X_pca'] = data_1.values
v_1 = via.VIA(adata.obsm['X_pca'][:, 0:5], true_label_1, jac_std_global=1.0, dist_std_local=1, knn=5,
too_big_factor=0.3, root_user=root, random_seed=1,
is_coarse=True, preserve_disconnected=True, name=name)
v_1.run_VIA()
#
super_clus_ds_PCA_loc = via.sc_loc_ofsuperCluster_PCAspace(v_1, v_1, np.arange(0, len(v_1.labels))) #location in PCA space of terminal state
via.draw_trajectory_gams(adata.obsm['X_pca'][:, 0:5], super_clus_ds_PCA_loc, v_1.labels, v_1.labels, v_1.edgelist_maxout,
v_1.x_lazy, v_1.alpha_teleport, v_1.single_cell_pt_markov, true_label_1, knn=v_1.knn,
final_super_terminal=v_1.revised_super_terminal_clusters,
sub_terminal_clusters=v_1.terminal_clusters,
title_str='pseudotime', ncomp=5, arrow_width_scale_factor=30, name=name)
return
runvia_simulated_data('bc_9')
you are welcome to share the synthetic data in a link as I cannot quite tell why this is happening and think it might have to do with how the underlying data is structured which is causing some outlier effects. your knn is quite low, have you tried increasing to knn = 10 or knn 15. also try using higher number of PCs. im trying to see if there is some strange behvaiour occuring at certain params
what does the umap look like?
Thank you for the advice. It worked after I increased the pc numbers to 10 and the knn to 15.
Hi,
The run_VIA() function returns the IndexError: index 125 is out of bounds for axis 0 with size 2. I do twice run_VIA() according to the instruction. The first time is fine but the second time such error is reported. The code is run for the second time is;
tsi_list = via.get_loc_terminal_states(v_1, adata.obsm['X_pca'][:, 0:2]) v_2 = via.VIA(adata.obsm['X_pca'][:, 0:2], true_label_1, jac_std_global=0.15, dist_std_local=1, knn=5, too_big_factor=0.05, super_cluster_labels=v_1.labels, super_node_degree_list=v_1.node_degree_list, super_terminal_cells=tsi_list, root_user=root, is_coarse=False, full_neighbor_array=v_1.full_neighbor_array, full_distance_array=v_1.full_distance_array, ig_full_graph=v_1.ig_full_graph, csr_array_locally_pruned=v_1.csr_array_locally_pruned, x_lazy=0.95, alpha_teleport=0.99, preserve_disconnected=True, super_terminal_clusters=v_1.terminal_clusters, random_seed=1, name='v_2')
v_2.run_VIA()
The output including the error is; input data has shape 1000 (samples) x 2 (features) time is Sun Nov 7 14:06:54 2021 commencing global pruning Share of edges kept after Global Pruning 48.47 % commencing community detection time is Sun Nov 7 14:06:54 2021 397 clusters before handling small/big There are 0 clusters that are too big humanCD34 : global cluster graph pruning level 0.15 number of components before pruning 2 number of connected componnents after reconnecting 2 percentage links trimmed from local pruning relative to start 0.0 percentage links trimmed from global pruning relative to start 48.7 there are 2 components in the graph root user [0] start computing lazy-teleporting Expected Hitting Times ended all multiprocesses, will retrieve and reshape super_terminal_clusters [129, 130, 3, 5, 6, 134, 8, 10, 11, 136, 13, 14, 139, 16, 17, 18, 20, 21, 24, 28, 31, 32, 161, 35, 38, 40, 42, 43, 45, 54, 56, 58, 60, 62, 64, 68, 69, 71, 72, 73, 76, 78, 84, 88, 90, 93, 99, 102, 104, 106, 109, 110, 115, 117, 118, 119, 120, 126] terminal clus in this component [128, 129, 3, 5, 6, 133, 8, 10, 11, 135, 13, 14, 138, 16, 17, 18, 20, 21, 24, 28, 31, 32, 160, 35, 38, 40, 42, 43, 45, 54, 56, 58, 59, 61, 63, 67, 68, 70, 71, 72, 75, 77, 83, 87, 89, 92, 98, 101, 103, 105, 108, 109, 114, 116, 117, 118, 119, 125] final terminal clus [129, 130, 3, 5, 6, 134, 8, 10, 11, 136, 13, 14, 139, 16, 17, 18, 20, 21, 24, 28, 31, 32, 161, 35, 38, 40, 42, 43, 45, 54, 56, 58, 60, 62, 64, 68, 69, 71, 72, 73, 76, 78, 84, 88, 90, 93, 99, 102, 104, 106, 109, 110, 115, 117, 118, 119, 120, 126] From root 26 to Terminal state 128 is found 207 times. From root 26 to Terminal state 129 is found 5 times. From root 26 to Terminal state 3 is found 5 times. From root 26 to Terminal state 5 is found 5 times. From root 26 to Terminal state 6 is found 5 times. From root 26 to Terminal state 133 is found 5 times. From root 26 to Terminal state 8 is found 5 times. From root 26 to Terminal state 10 is found 5 times. From root 26 to Terminal state 11 is found 5 times. From root 26 to Terminal state 135 is found 5 times. From root 26 to Terminal state 13 is found 5 times. From root 26 to Terminal state 14 is found 5 times. From root 26 to Terminal state 138 is found 5 times. From root 26 to Terminal state 16 is found 5 times. From root 26 to Terminal state 17 is found 5 times. From root 26 to Terminal state 18 is found 5 times. From root 26 to Terminal state 20 is found 5 times. From root 26 to Terminal state 21 is found 5 times. From root 26 to Terminal state 24 is found 5 times. From root 26 to Terminal state 28 is found 5 times. From root 26 to Terminal state 31 is found 5 times. From root 26 to Terminal state 32 is found 19 times. From root 26 to Terminal state 160 is found 5 times. From root 26 to Terminal state 35 is found 5 times. From root 26 to Terminal state 38 is found 5 times. From root 26 to Terminal state 40 is found 5 times. From root 26 to Terminal state 42 is found 5 times. From root 26 to Terminal state 43 is found 5 times. From root 26 to Terminal state 45 is found 5 times. From root 26 to Terminal state 54 is found 5 times. From root 26 to Terminal state 56 is found 5 times. From root 26 to Terminal state 58 is found 5 times. From root 26 to Terminal state 59 is found 5 times. From root 26 to Terminal state 61 is found 5 times. From root 26 to Terminal state 63 is found 5 times. From root 26 to Terminal state 67 is found 5 times. From root 26 to Terminal state 68 is found 5 times. From root 26 to Terminal state 70 is found 5 times. From root 26 to Terminal state 71 is found 5 times. From root 26 to Terminal state 72 is found 5 times. From root 26 to Terminal state 75 is found 5 times. From root 26 to Terminal state 77 is found 5 times. From root 26 to Terminal state 83 is found 5 times. From root 26 to Terminal state 87 is found 5 times. From root 26 to Terminal state 89 is found 205 times. From root 26 to Terminal state 92 is found 5 times. From root 26 to Terminal state 98 is found 5 times. From root 26 to Terminal state 101 is found 5 times. From root 26 to Terminal state 103 is found 5 times. From root 26 to Terminal state 105 is found 5 times. From root 26 to Terminal state 108 is found 77 times. From root 26 to Terminal state 109 is found 5 times. From root 26 to Terminal state 114 is found 9 times. From root 26 to Terminal state 116 is found 5 times. From root 26 to Terminal state 117 is found 5 times. From root 26 to Terminal state 118 is found 5 times. From root 26 to Terminal state 119 is found 5 times. From root 26 to Terminal state 125 is found 5 times. start computing lazy-teleporting Expected Hitting Times ended all multiprocesses, will retrieve and reshape super_terminal_clusters [129, 130, 3, 5, 6, 134, 8, 10, 11, 136, 13, 14, 139, 16, 17, 18, 20, 21, 24, 28, 31, 32, 161, 35, 38, 40, 42, 43, 45, 54, 56, 58, 60, 62, 64, 68, 69, 71, 72, 73, 76, 78, 84, 88, 90, 93, 99, 102, 104, 106, 109, 110, 115, 117, 118, 119, 120, 126] no sub cluster has majority made of super-cluster 129
IndexError Traceback (most recent call last) /tmp/ipykernel_60176/2984432610.py in
8 super_terminal_clusters=v_1.terminal_clusters, random_seed=1, name='v_2')
9
---> 10 v_2.run_VIA()
~/viaenv/lib/python3.7/site-packages/pyVIA/core.py in run_VIA(self) 3389 # Query dataset, k - number of closest elements (returns 2 numpy arrays) 3390 -> 3391 self.run_subPARC() 3392 run_time = time.time() - time_start_total 3393 print('time elapsed {:.1f} seconds'.format(run_time))
~/viaenv/lib/python3.7/site-packages/pyVIA/core.py in run_subPARC(self) 2854 sub_terminal_clustemp.append(most_likely_sub_terminal) 2855 -> 2856 if (markov_hitting_times_ai[loc_j_in_sub_ai] > np.percentile( 2857 np.asarray(markov_hitting_times_ai), self.pseudotime_threshold_TS)): # 30 2858
IndexError: index 125 is out of bounds for axis 0 with size 2
Thanks in advance Tian