gboggy2 closed this PR 9 months ago
I would add that separating the training and prediction steps is also desirable. You could separate this into two functions:
1) TrainScTour()
which kicks off most of the work and outputs a model, the vector of features used for training, and the pseudotime plot. For a basic answer to "Is this model interesting?", the plotting could stay in Python once you've added a "metadata field(s) of interest" parameter, as Ben suggested.
2) PredictScTourPseudotime()
which reads in a Seurat object and a previously saved model file, runs the prediction step in Python, writes the pseudotime to an intermediate file, and then adds it to the Seurat object's metadata.
The primary reasons to separate these two are: 1) pseudotime training is stochastic, so saving your model for reuse is necessary, and 2) you don't know in advance whether your variable of interest will be captured by the pseudotime.
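To make the split concrete, here is a rough Python sketch of the two steps. It does not call scTour at all: `train`, `predict`, and `write_pseudotime` are hypothetical stand-ins, the "model" is just a dict of random per-gene weights, and the CSV column names are assumptions. It only illustrates why an unseeded training run should be saved and reused, and what the intermediate file could look like.

```python
import csv
import random


def train(genes, seed=None):
    """Hypothetical stand-in for stochastic training (NOT scTour).

    Returns a toy 'model': one random weight per training gene.
    """
    rng = random.Random(seed)
    return {gene: rng.random() for gene in genes}


def predict(model, cells):
    """Score each cell with a saved model.

    cells maps barcode -> {gene: expression}. Genes missing from a
    cell are treated as 0 so the feature set matches training.
    """
    return {
        barcode: sum(model[g] * expr.get(g, 0.0) for g in model)
        for barcode, expr in cells.items()
    }


def write_pseudotime(path, ptime):
    """Write the intermediate file that the R side would read back in."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["cell_barcode", "pseudotime"])  # assumed column names
        for barcode, value in ptime.items():
            writer.writerow([barcode, value])
```

Two calls to `train` with different seeds give different models, while `predict` on one saved model is repeatable, which is the argument for keeping the model file around.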
Hey guys, these functions still need some TLC with respect to Ben's requested changes. I'm pushing now so I can start working out the Python environment on the Actions runner.
Something I'm not wholly satisfied with is that prediction needs both the model file and the list of variable genes. In the short term I imagine we can just keep these tied together in prime-seq workbooks, but I'm curious whether there's a Python data type (a pickle file?) that we can leverage to bundle them.
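One possible shape for that, as a minimal sketch: pickle a single dict holding both pieces. The names `save_bundle` and `load_bundle` and the dict keys are made up for illustration, and `model_state` is assumed to be some picklable object; whether scTour's own save format can carry extra fields is not checked here.

```python
import pickle


def save_bundle(path, model_state, variable_genes):
    """Store the model state and its training genes in one file.

    model_state is assumed to be any picklable object; the keys
    below are illustrative, not an scTour convention.
    """
    bundle = {
        "model_state": model_state,
        "variable_genes": list(variable_genes),
    }
    with open(path, "wb") as fh:
        pickle.dump(bundle, fh)


def load_bundle(path):
    """Return (model_state, variable_genes) from a bundle file."""
    with open(path, "rb") as fh:
        bundle = pickle.load(fh)
    return bundle["model_state"], bundle["variable_genes"]
```

With this, the prediction function would only need one path argument. Pickle files can execute code on load, so this only makes sense for files we produced ourselves.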
Something conga-related broke on that merge, but I'm not sure what or why.
Closed in favor of #207
This PR is for a function that enables running scTour from Seurat and updating a Seurat object with pseudotime.