ShobiStassen / VIA

trajectory inference
https://pyvia.readthedocs.io/en/latest/
MIT License
76 stars 20 forks source link

pseudotime reproducibility #27

Closed wangjiawen2013 closed 1 year ago

wangjiawen2013 commented 1 year ago

Hi, pyVIA version 0.1.58 introduce some new bugs. Now the pseudotime still cannot be reproduced (knn=100, cluster_graph_pruning=0.5, random_seed=42). Does random_seed function internally ? Besides, I cannot plot figures in my ipython. The background became black and x and y axis disappeared. it showed that: In [1]: import pyVIA.core as via /home/wangjw/programs/miniconda3/envs/py37/lib/python3.7/site-packages/phate/init.py

In [2]: plt.plot([1,2,3,],[1,2,3]) Out[2]: [<matplotlib.lines.Line2D at 0x7ff1f507f890>]

image

ShobiStassen commented 1 year ago

Hi. I'm not able to reproduce your error and my v0.single_cell_markov_pt only changes if I change the random seed I've asked a colleague to install on their machine and check too

On Tue, 22 Nov 2022, 12:30 jiawen wang, @.***> wrote:

Hi, pyVIA version 0.1.58 introduce some new bugs. Now the pseudotime still cannot be reproduced. Does random_seed function internally ? Besides, I cannot plot figures in my ipython. The background became black and x and y axis disappeared. it showed that: In [1]: import pyVIA.core as via

/home/wangjw/programs/miniconda3/envs/py37/lib/python3.7/site-packages/phate/ init.py

In [2]: plt.plot([1,2,3,],[1,2,3]) Out[2]: [<matplotlib.lines.Line2D at 0x7ff1f507f890>]

[image: image] https://user-images.githubusercontent.com/29703450/203221725-5397a5ad-7eb2-40ba-9aff-34c842427c35.png

— Reply to this email directly, view it on GitHub https://github.com/ShobiStassen/VIA/issues/27, or unsubscribe https://github.com/notifications/unsubscribe-auth/AISI4SHRPAYFEPENADARQHTWJRD7XANCNFSM6AAAAAASHLI53I . You are receiving this because you are subscribed to this thread.Message ID: @.***>

ShobiStassen commented 1 year ago

Maybe you can try globally set your rc params. Might be that frameon is false, or your background plot facecolor is black etc

On Tue, 22 Nov 2022, 12:30 jiawen wang, @.***> wrote:

Hi, pyVIA version 0.1.58 introduce some new bugs. Now the pseudotime still cannot be reproduced. Does random_seed function internally ? Besides, I cannot plot figures in my ipython. The background became black and x and y axis disappeared. it showed that: In [1]: import pyVIA.core as via

/home/wangjw/programs/miniconda3/envs/py37/lib/python3.7/site-packages/phate/ init.py

In [2]: plt.plot([1,2,3,],[1,2,3]) Out[2]: [<matplotlib.lines.Line2D at 0x7ff1f507f890>]

[image: image] https://user-images.githubusercontent.com/29703450/203221725-5397a5ad-7eb2-40ba-9aff-34c842427c35.png

— Reply to this email directly, view it on GitHub https://github.com/ShobiStassen/VIA/issues/27, or unsubscribe https://github.com/notifications/unsubscribe-auth/AISI4SHRPAYFEPENADARQHTWJRD7XANCNFSM6AAAAAASHLI53I . You are receiving this because you are subscribed to this thread.Message ID: @.***>

wangjiawen2013 commented 1 year ago

The previously version worked well. This is introduced by v0.1.58. The following /home/wangjw/programs/miniconda3/envs/py37/lib/python3.7/site-packages/phate/init.py didn't appear when I import via in previous versions. It only appears when import v0.1.58 Perhaps phate/init.py was changed ?

wangjiawen2013 commented 1 year ago

The first run of v0.single_cell_pt_markov: array([0.679, 0.45 , 0.679, ..., 0.07 , 0.07 , 0.227]) The second run : array([0.492, 0.537, 0.492, ..., 0.065, 0.065, 0.35 ])

ShobiStassen commented 1 year ago

Are you sure you are using v0.1.57? Another colleague reinstalled it and the pt behaviour is good. Only changing if random seed is changed .... Hmmm

On Tue, 22 Nov 2022, 15:27 jiawen wang, @.***> wrote:

The first run of v0.single_cell_pt_markov: array([0.679, 0.45 , 0.679, ..., 0.07 , 0.07 , 0.227]) The second run : array([0.492, 0.537, 0.492, ..., 0.065, 0.065, 0.35 ])

— Reply to this email directly, view it on GitHub https://github.com/ShobiStassen/VIA/issues/27#issuecomment-1323224753, or unsubscribe https://github.com/notifications/unsubscribe-auth/AISI4SEWDEDMVECYLT4FYB3WJRYWDANCNFSM6AAAAAASHLI53I . You are receiving this because you commented.Message ID: @.***>

wangjiawen2013 commented 1 year ago

I am using pyVIA 0.1.58, I'll try 0.1.57 later.

ShobiStassen commented 1 year ago

Ah sorry no no, 0.1.58 is the correct version with the pt controlled by random seed. I'm not sure why it's not working for you

On Tue, 22 Nov 2022, 16:28 jiawen wang, @.***> wrote:

I am using pyVIA 0.1.58, I'll try 0.1.57 later.

— Reply to this email directly, view it on GitHub https://github.com/ShobiStassen/VIA/issues/27#issuecomment-1323287330, or unsubscribe https://github.com/notifications/unsubscribe-auth/AISI4SAX44NXQHKVEX4FWD3WJR727ANCNFSM6AAAAAASHLI53I . You are receiving this because you commented.Message ID: @.***>

wangjiawen2013 commented 1 year ago

May be you can send your chunk of code and example data and package version called by pyVIA that used the random_seed, then I can check my code.

ShobiStassen commented 1 year ago

hi again @wangjiawen2013 Hi, when running the code in examples.py for the Toy3 dataset i try 2 different random seeds and am printing the first 10 values of single_cell_markov_pt. I am now on pypip version 0.1.59 which is behaving same as 0.1.58 for me.

Code output for RS=2: Toy3 dataset 2022-11-22 19:25:32.294087 Start making edgebundle milestone... 2022-11-22 19:25:32.298072 Time elapsed 3.6 seconds 2022-11-22 19:25:32.316403 For random seed: 2, the first sc markov pt are [0.464 0.821 0.614 0.527 0.66 0.821 0.648 0.66 0.83 0.614]

Code output for RS=2: Toy3 dataset 2022-11-22 19:25:32.294087 Start making edgebundle milestone... 2022-11-22 19:25:32.298072 Time elapsed 3.6 seconds 2022-11-22 19:25:32.316403 For random seed: 2, the first sc markov pt are [0.464 0.821 0.614 0.527 0.66 0.821 0.648 0.66 0.83 0.614]

Code output for RS=3: Toy3 dataset 2022-11-22 19:26:27.144603 Start making edgebundle milestone... 2022-11-22 19:26:27.148732 Time elapsed 3.6 seconds 2022-11-22 19:26:27.166670 For random seed: 3, the first sc markov pt are [0.473 0.76 0.627 0.534 0.663 0.76 0.652 0.663 0.881 0.627]

I am uploading the revised code onto the github folder pyvia and you can try them directly if you like

wangjiawen2013 commented 1 year ago

Yes, the results of the toy dataset can be reproduced. While in my case, it's still cannot be reproduced. Very strange....... here are two warnings during the process, does them matters ? /home/wangjw/programs/miniconda3/envs/cellpy/lib/python3.10/site-packages/pyVIA/core.py:2303: RuntimeWarning: invalid value encountered in divide bp_array = bp_array / bp_array.sum(axis=1)[:, None]

/home/wangjw/programs/miniconda3/envs/cellpy/lib/python3.10/site-packages/scipy/sparse/_index.py:103: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient. self._set_intXint(row, col, x.flat[0])

I downloaded another datasets (adata = sc.datasets.paul15() from https://scanpy-tutorials.readthedocs.io/en/latest/paga-paul15.html), there are no warnings during the run and the results were reproducible ! Perhaps this is a dataset-specific case (my case).

wangjiawen2013 commented 1 year ago

I tried other datesets (my own projects), the results are still different each run. the cell numbers are 30,000 and 50,000 respectively. Then I subset the cells to 4000, and this time the results kept the same each run !!!!!!!!!!!!!!!! So I guess VIA will become unstable when there are large number of cells ?

ShobiStassen commented 1 year ago

hi @wangjiawen2013

I just tried on a real scRNAseq dataset with 89267 cells and get consistent pt and branching probabilities for same RS. these values change only if i change the RS... So i dont think the large cell count impacts the values

ShobiStassen commented 1 year ago

Are you randomly subsetting your 30K, or 40K values so that the cells are different on each run?

wangjiawen2013 commented 1 year ago

Hi, I am still testing. I find that the difference of each run started from step 7: community detection using leiden algorithm. When using large datasets, the clusters number found were different each time, while it kept consistently when using small datasets. My leiden algorithm cannot access the random seed ?

2022-11-23 10:09:45.325907 Running VIA over input data of 29164 (samples) x 30 (features) 2022-11-23 10:09:45.326003 Knngraph has 30 neighbors 2022-11-23 10:09:48.968399 Finished global pruning of 30-knn graph used for clustering. Kept 48.7 % of edges. 2022-11-23 10:09:49.114890 Number of connected components used for clustergraph is 1 2022-11-23 10:09:50.430855 The number of components in the original full graph is 1 2022-11-23 10:09:50.430944 For downstream visualization purposes we are also constructing a low knn-graph 2022-11-23 10:09:51.179152 Commencing community detection 2022-11-23 10:09:51.774839 Finished running Leiden algorithm. Found 1481 clusters. 2022-11-23 10:09:51.781722 Merging 1431 very small clusters (<10) 2022-11-23 10:09:51.804725 Finished detecting communities. Found 50 communities 2022-11-23 10:09:51.806025 Making cluster graph. Global cluster graph pruning level: 0.15

ShobiStassen commented 1 year ago

Hmm... I was pretty sure the clusters are consistent and don't change with RS In your above output , it just means 1431 of the 1481 clusters are potentially fragments and will be tested for merging. Do those numbers for the highlighted steps change for you on each run? i will have a look later too

wangjiawen2013 commented 1 year ago

Yes, they always changes, they were 1481 and 1431, and changed to 1470 and 1449.

ShobiStassen commented 1 year ago

hmmm that is inded strange, the rand_seed does get passed to the leiden clustering (see my output for the 89,000 cells on 2 runs with same rs) i wonder if it's because there are so many 1000s clusters. have you tried any other datasets? RUN1 2022-11-23 08:18:11.010791 Finished running Leiden algorithm. Found 59 clusters. 2022-11-23 08:18:11.065739 Merging 3 very small clusters (<10) 2022-11-23 08:18:11.083596 Finished detecting communities. Found 56 communities ... 2022-11-23 08:18:22.808451 Terminal clusters corresponding to unique lineages in this component are [9, 10, 12, 13, 14, 18, 24, 25, 28, 29, 32, 39, 41, 43, 44, 47, 48, 49]

RUN2 2022-11-23 11:40:24.363910 Finished running Leiden algorithm. Found 59 clusters. 2022-11-23 11:40:24.397627 Merging 3 very small clusters (<10) 2022-11-23 11:40:24.414253 Finished detecting communities. Found 56 communities ... 2022-11-23 11:40:31.927167 Terminal clusters corresponding to unique lineages in this component are [9, 10, 12, 13, 14, 18, 24, 25, 28, 29, 32, 39, 41, 43, 44, 47, 48, 49]

ShobiStassen commented 1 year ago

can you see if the behaviour persists even if jac_std_global is increased to 1 or 0.5

wangjiawen2013 commented 1 year ago

Higher jac_std_global greatly reduced the number of cluster, but the results were still different each run. Could I send you my data in excel, so you can try it on your machine ?

ShobiStassen commented 1 year ago

Hi there, i might have time next week, but this week i'm afraid not

wangjiawen2013 commented 1 year ago

I think I have found the soultion. The problem was cause by the parameter "num_threads". When I deleted num_threads or set num_threads=1, the results can be reproduced exactly each time, and VIA ran even faster than num_threads=30.

Here are some information about the issue, but thery are described in Chinese character. https://blog.csdn.net/weixin_43977640/article/details/114969005

In summary, random_seed conflicts with num_threads.

Yeah Yeah Yeah hahaha

ShobiStassen commented 1 year ago

glad to hear it works now!

wangjiawen2013 commented 1 year ago

The arrows of pseudotime trajectory are covered by scatter points, although we can set alpha transparency. I think it's better to put the arrow on top of the points, so we can see the arrows clearly.

ShobiStassen commented 1 year ago

@wangjiawen2013 hi which function is this in?

wangjiawen2013 commented 1 year ago

via.draw_trajectory_gams()

ShobiStassen commented 1 year ago

trajectorygams

hi, @wangjiawen2013 this one? my arrows are on top

ShobiStassen commented 1 year ago

edgebundle_labels This is an alternative visualization option too which might complement the above

wangjiawen2013 commented 1 year ago

It's awesome ! What does the thickness/color and arrows of the line mean ? How does VIA generate the smooth lines ?