LouisFaure / scFates

a scalable python suite for tree inference and advanced pseudotime analysis from scRNAseq data.
https://scfates.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

error in test_association #7

Closed jksr closed 1 year ago

jksr commented 1 year ago

hi scFates developer,

Thank you for creating this useful tool. I used it to compute the principal tree of my data and it worked wonderfully.

But when I tried to test gene association with the tree via scf.tl.test_association, I got the following error:

RRuntimeError: Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) : 
  A term has fewer unique covariate combinations than specified maximum degrees of freedom

What do you think the problem could be? Which parameter should I change, or how should I filter the data to fix this?

thx

LouisFaure commented 1 year ago

Hi, thank you for your interest in scFates! This kind of error usually happens when a given branch has fewer cells than degrees of freedom (the default is 5). I would go back to the learned tree, check for spurious branches and remove any you find. scf.tl.cleanup can help in that case; otherwise, try changing the initial tree parameters.

Hope that helps
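The mgcv message itself points at *unique covariate values* rather than raw cell counts, so a quick check is to count, per branch, how many distinct pseudotime values it holds. A minimal sketch, assuming the segment labels and pseudotimes have already been pulled out of the AnnData object (in scFates these usually live in `adata.obs["seg"]` and `adata.obs["t"]`, but check your object); `flag_thin_branches` is a hypothetical helper, not part of scFates:

```python
import numpy as np


def flag_thin_branches(segments, pseudotime, df=5):
    """Return {segment: (n_cells, n_unique_t)} for branches whose number of
    distinct pseudotime values is below `df` (hypothetical helper)."""
    segments = np.asarray(segments)
    pseudotime = np.asarray(pseudotime, dtype=float)
    flagged = {}
    for seg in np.unique(segments):
        t = pseudotime[segments == seg]
        n_unique = np.unique(t).size
        # mgcv refuses a smooth whose basis size exceeds the number of
        # distinct covariate values, roughly the condition checked here
        if n_unique < df:
            flagged[seg] = (t.size, n_unique)
    return flagged
```

For example, a branch of six cells that all share one pseudotime value is flagged even though it has more cells than the default of 5: `flag_thin_branches(["a"] * 6 + ["b"] * 6, [0.1, 0.2, 0.3, 0.4, 0.5, 0.6] + [0.8] * 6)` returns `{"b": (6, 1)}`.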

jksr commented 1 year ago

Thank you for the quick reply and the suggestion.

I don't think it's the low cell number in my case; rather, I found that the pseudotimes of the branches are too discrete. In one of the branches, all cells have the same pseudotime, which might be the reason.


I tried tl.cleanup following your suggestion, but some information in the adata seems broken after running it: pl.graph didn't show the principal tree, and none of the subsequent analyses worked. tl.cleanup did give a true_divide warning, which might be related to the tl.pseudotime error, but I'm not sure it is related to the pl.graph problem, since I checked .uns['graph'] and everything there looked fine.

LouisFaure commented 1 year ago

I now understand the issue: it is linked to your use of elpigraph and to my latest breaking change in how pseudotime is calculated.

Previously, each cell was randomly assigned a position between its node and the closest one. I was not happy with that approach because pseudotime values would vary between runs (solvable with a seed, of course, but that won't reflect the true position of the cell on the tree).

I have now changed the approach to a fully deterministic one: the actual values of the assignment matrix R are used to assign cells to nodes.

I now realize this becomes an issue when using elpigraph, which produces a hard assignment R matrix: values can only be 1 for the closest node and 0 for all others, so every cell is assigned the pseudotime value of its closest node. Trees generated with simpleppt don't have this issue, since cells are assigned values between 0 and 1 (a soft assignment R matrix) towards all nodes of the tree.
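As a toy illustration of why a hard R matrix gives discrete pseudotimes, suppose each cell's pseudotime is the R-weighted mean of the node pseudotimes. This is a simplification chosen for the example, not necessarily the exact formula scFates uses, and `pseudotime_from_R` is a hypothetical name:

```python
import numpy as np


def pseudotime_from_R(R, node_t):
    """Toy model: cell pseudotime = R-weighted mean of node pseudotimes.
    R has shape (n_cells, n_nodes); node_t has shape (n_nodes,)."""
    R = np.asarray(R, dtype=float)
    weights = R / R.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ np.asarray(node_t, dtype=float)


node_t = np.array([0.0, 1.0, 2.0])
# Hard assignment (elpigraph-like): a single 1 per row, so each cell
# simply copies the pseudotime of its closest node
R_hard = np.array([[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]])
# Soft assignment (simpleppt-like): weight spread over several nodes,
# so cells land between node values
R_soft = np.array([[0.8, 0.2, 0.0], [0.3, 0.6, 0.1],
                   [0.1, 0.6, 0.3], [0.0, 0.2, 0.8]])
```

With these inputs the hard matrix yields `[0, 1, 1, 2]` (the two middle cells become indistinguishable), while the soft one yields `[0.2, 0.8, 1.2, 1.8]`.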

I will think of another way of assigning pseudotime in the elpigraph case, most likely by projecting cells onto their closest edge. In the meantime, I would suggest either rerunning the tree learning with simpleppt, or converting your existing hard assignment R matrix into a soft one using scf.tl.convert_to_soft (be careful with the sigma parameter: if the tree collapses after conversion, I would reduce it; I will also implement a more automatic way of avoiding this).
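One common way to soften a hard assignment is a Gaussian kernel on cell-to-node distances, R_ij ∝ exp(-‖x_i - y_j‖² / sigma), normalised per cell. This is an assumption for illustration; scf.tl.convert_to_soft may parametrise sigma differently. The sketch shows why sigma matters: small values keep each cell close to its nearest node, large values spread every cell almost uniformly over all nodes.

```python
import numpy as np


def soft_assignment(X, Y, sigma):
    """Gaussian-kernel soft assignment (illustrative, not scFates' code).
    X: (n_cells, d) cell coordinates; Y: (n_nodes, d) node coordinates."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # squared Euclidean distance between every cell and every node
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    # subtract each row's minimum before exponentiating to avoid underflow
    R = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / sigma)
    return R / R.sum(axis=1, keepdims=True)  # rows sum to 1


nodes = np.array([[0.0], [1.0], [2.0]])
cells = np.array([[0.1], [0.9], [1.2], [1.9]])
R_sharp = soft_assignment(cells, nodes, sigma=0.1)   # close to hard
R_flat = soft_assignment(cells, nodes, sigma=100.0)  # nearly uniform
```

With `sigma=100.0` every entry of `R_flat` is within a few thousandths of 1/3, which is the regime where positions derived from R start to collapse.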

jksr commented 1 year ago

> I will think of another way of assigning pseudotime in elpigraph situation, most likely by projecting cells to their closest edge,

I guess combining a Voronoi diagram with edge projection might give reasonable results in some complex situations.
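The Voronoi-plus-projection idea can be sketched as: find the cell's nearest node (its Voronoi region), orthogonally project the cell onto the edges incident to that node, keep the closest projection, and interpolate pseudotime linearly along that edge. This is a hypothetical sketch of the suggestion, not elpigraph's or scFates' implementation; `project_on_tree` and its arguments are illustrative and assume nodes at distinct positions.

```python
import numpy as np


def project_on_tree(x, nodes, edges, node_t):
    """Voronoi step + edge projection for a single cell (hypothetical).
    nodes: (n_nodes, d) coordinates; edges: list of (i, j) index pairs;
    node_t: (n_nodes,) pseudotime of each node."""
    x = np.asarray(x, dtype=float)
    nodes = np.asarray(nodes, dtype=float)
    node_t = np.asarray(node_t, dtype=float)
    # Voronoi step: the nearest node defines the cell's region
    nearest = int(np.argmin(((nodes - x) ** 2).sum(axis=1)))
    best = None
    for i, j in edges:
        if nearest not in (i, j):
            continue  # only edges touching the nearest node are candidates
        a, ab = nodes[i], nodes[j] - nodes[i]
        # fraction along the edge of the orthogonal projection, clipped to [0, 1]
        lam = float(np.clip(np.dot(x - a, ab) / np.dot(ab, ab), 0.0, 1.0))
        dist = float(np.linalg.norm(x - (a + lam * ab)))
        if best is None or dist < best[0]:
            best = (dist, i, j, lam)
    if best is None:  # isolated node: fall back to its own pseudotime
        return float(node_t[nearest])
    _, i, j, lam = best
    return float((1.0 - lam) * node_t[i] + lam * node_t[j])


# Y-shaped toy tree: root (0,0), branch point (1,0), two tips
nodes = [[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [2.0, -1.0]]
edges = [(0, 1), (1, 2), (1, 3)]
node_t = [0.0, 1.0, 1.0 + np.sqrt(2), 1.0 + np.sqrt(2)]
```

A cell at `[0.4, 0.1]` gets pseudotime 0.4, between the root and the branch point. Restricting candidates to edges around the nearest node keeps the search local; projecting onto every edge would be the simpler, more exhaustive alternative.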

> in the mean time I would suggest you to either rerun the tree learning using simpleppt, or by converting your existing hard assignment R matrix into a soft assignment one using scf.tl.convert_to_soft (be aware of the sigma parameter, if the tree collapse after conversion, I would reduce it, I will also implement a more automatic way of avoiding such possible case)

Thank you for the suggestions. I have to stick with epg, as it generates a more reasonable initial principal tree in my case.

[Screenshots: epg (elpigraph) tree vs ppt (simpleppt) tree]

As you warned, it takes some parameter tuning before scf.tl.convert_to_soft works, but that is much easier than tuning the ppt parameters. After scf.tl.convert_to_soft, pseudotime can now be calculated and the results look good. However, there are still some other problems I haven't been able to get past. I'm not sure whether they are still relevant to this thread; I can open a new one if you think it's necessary.

A plotting issue: [screenshot]

An error, possibly related to a wrong or missing adata attribute: [screenshot]

Another error, possibly related to a wrong or missing adata attribute: [screenshot]

thx!

LouisFaure commented 1 year ago

I have now released version 0.9.0, which solves the pseudotime issue for elpigraph! I have also included an explore_sigma function, which should help with choosing the right sigma; I have made a tutorial about its usage here. In your case, the sigma may be too high, leading to a collapse of the points. But it is true that elpigraph can construct trees more reliably without much tweaking, which is why we included it in scFates!
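The internals of explore_sigma are not described in this thread, so the following is only a hypothetical diagnostic in the same spirit. For each sigma, build a Gaussian soft assignment (an assumed kernel), recompute every node as the R-weighted mean of the cells, and compare the spread of the new nodes to the original ones. A ratio near 1 means the tree keeps its shape; a ratio near 0 means the nodes have collapsed towards the data mean. `collapse_ratio` is an illustrative name, not a scFates function.

```python
import numpy as np


def collapse_ratio(X, Y, sigma):
    """Spread of R-weighted node positions relative to the original nodes
    (hypothetical diagnostic; 1.0 means no shrinkage, ~0 means collapse)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    R = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / sigma)
    R /= R.sum(axis=1, keepdims=True)  # soft assignment, rows sum to 1
    # each node becomes the weighted mean of all cells, weights = column of R
    col = np.maximum(R.sum(axis=0), 1e-12)  # guard against empty nodes
    Y_new = (R.T @ X) / col[:, None]

    def spread(P):
        # root-mean-square distance of points to their centroid
        return np.sqrt(((P - P.mean(axis=0)) ** 2).sum(axis=1).mean())

    return spread(Y_new) / spread(Y)


X = np.linspace(0.0, 2.0, 21)[:, None]  # cells evenly spread along a line
Y = np.array([[0.0], [1.0], [2.0]])     # three nodes on the same line
for sigma in (0.01, 1.0, 100.0):
    print(sigma, round(collapse_ratio(X, Y, sigma), 3))
```

In this toy setting the ratio stays well above 0.5 for `sigma=0.01` and drops close to 0 for `sigma=100.0`, mirroring the advice to reduce sigma when the tree collapses.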

Concerning the plotting issues, except for the last one, which I can fix quickly, it is hard to figure out what is wrong. The best option might be for you to send me an anonymized save of your AnnData object so I can take a closer look!

LouisFaure commented 1 year ago

I am closing this issue, letting you know that I have released version v0.9.1, where I changed the pseudotime calculation method to the one used by elpigraph itself, as it leads to a better pseudotime ordering of cells (my approach was somehow still generating gaps and collapsing cells towards principal points).

For the plotting issues feel free to open a new issue!