Closed: faniafeby closed this issue 4 years ago.
There might be a problem with using np.concatenate on sparse matrices. Could you try scipy.sparse.vstack instead?
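Here is a minimal sketch of how scipy.sparse.vstack could be used for this. It assumes each element of Xs_coexpr is a square SciPy sparse matrix; the toy matrices below are random, hypothetical stand-ins. Each matrix is flattened into one sparse row, and the rows are then stacked so that each coexpression matrix becomes one observation.

```python
import numpy as np
import scipy.sparse as ss

# Hypothetical stand-ins for Xs_coexpr: three sparse 4 x 4 gene-by-gene matrices.
n_genes = 4
Xs_coexpr = [
    ss.random(n_genes, n_genes, density=0.5, format="csr", random_state=seed)
    for seed in range(3)
]

# Flatten each matrix (row-major) into a single 1 x n_genes**2 sparse row ...
rows = [X.reshape((1, n_genes * n_genes)) for X in Xs_coexpr]

# ... then stack the rows with scipy.sparse.vstack. np.concatenate does not
# treat SciPy sparse matrices as 2-D arrays, so it cannot do this directly.
X_coexpr = ss.vstack(rows, format="csr")
print(X_coexpr.shape)  # (3, 16)
```

The result stays sparse throughout, which matters when the gene count is large.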
I have tried scipy.sparse.vstack, but it raised ValueError: blocks must be 2-D. Instead, I tried np.vstack, but the resulting X_coexpr matrix is
<4x139603695 sparse matrix of type '<class 'numpy.float64'>' with 152874082 stored elements in Compressed Sparse Row format>
and when I generated the plot in Scanpy, the resulting plot looked strange. My next question is: what is the expected result of the concatenation, i.e., the final matrix before it is passed into AnnData? Thank you!
The desired output is a (number of matrices) × (coexpression dimension) matrix. I'll close the issue since it doesn't seem to be a bug in Trajectorama, but I'm happy to answer more questions.
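To make the "coexpression dimension" concrete: flattening a full n × n matrix gives n² columns. Keeping only the upper triangle of a symmetric matrix (diagonal included) gives n(n + 1)/2 columns. As a worked check, the 139,603,695 columns reported above equal 16,709 × 16,710 / 2. That suggests each row holds the upper triangle of a 16,709-gene matrix rather than the full matrix. This is an inference from the numbers alone. The toy part uses np.triu_indices on a hypothetical 3-gene matrix, and coexpr_dim is a hypothetical helper.

```python
import numpy as np

def coexpr_dim(n_genes, upper_triangle=False):
    """Hypothetical helper: columns per coexpression matrix after flattening."""
    if upper_triangle:
        return n_genes * (n_genes + 1) // 2  # diagonal plus entries above it
    return n_genes * n_genes                  # every entry of the n x n matrix

print(coexpr_dim(16709))                       # 279190681
print(coexpr_dim(16709, upper_triangle=True))  # 139603695

# Toy symmetric 3 x 3 matrix: its upper triangle (with diagonal) has 6 entries.
C = np.array([[1.0, 0.2, 0.3],
              [0.2, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
iu = np.triu_indices(C.shape[0])  # row/column indices of the upper triangle
row = C[iu]                       # one flattened row of length 3 * 4 / 2 = 6
```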
Also, perhaps you might try some other analyses before Trajectorama? If you are looking for integration methods, there are a few good tools out there, such as Scanorama. For trajectory methods, there's PAGA and a number of others. Scanpy (http://scanpy.readthedocs.io/) has good tutorials on basic single-cell analysis.
My data cover 3 time points, with ten samples in total. My supervisor recommended using Trajectorama without prior batch correction (Scanorama, ComBat, etc.) within each time point, I think to preserve biological variance. The aim is to build a trajectory across the 3 time points, which come from 3 different studies. As input, I used the adata.X matrix with 3000 (subsampled) cells and the 'sample' annotation as my 'studies' input. My output (Xs_coexpr) is a list of 4 arrays, each with shape 16,709 x 16,709 (the same as n_vars of my AnnData object). Is this what Trajectorama is expected to produce? Thanks a lot!
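For a sense of scale, here is some back-of-the-envelope arithmetic. It assumes dense float64 storage at 8 bytes per entry, which may not match how the matrices are actually stored. dense_matrix_bytes is a hypothetical helper, and 2,000 genes is just an example count.

```python
BYTES_PER_FLOAT64 = 8

def dense_matrix_bytes(n_genes):
    """Hypothetical helper: memory for one dense n_genes x n_genes float64 matrix."""
    return n_genes * n_genes * BYTES_PER_FLOAT64

full = dense_matrix_bytes(16709)  # all genes in the AnnData object
hvg = dense_matrix_bytes(2000)    # e.g. a 2,000-gene subset
print(f"{full / 1e9:.2f} GB vs {hvg / 1e6:.0f} MB")  # 2.23 GB vs 32 MB
```

Because the size grows with the square of the gene count, shrinking the gene set shrinks each matrix quadratically.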
I see, that makes sense. That output is expected, but I'd highly recommend two things: (1) give the algorithm all the cells and (2) restrict the analysis to the top ~1-2k highly variable genes. Also, there's currently a min_cluster_samples parameter that filters out clusters with fewer than 500 cells, which is probably removing all the clusters in your data, so maybe set that parameter to something lower, like 50 or 100. The final matrix built from Xs_coexpr should have the number of rows equal to the number of coexpression matrices and the number of columns equal to the number of genes squared.
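Here is a rough illustration of why a 500-cell minimum can leave few or no clusters in a 3,000-cell subsample. The cluster labels are hypothetical. clusters_kept is a simplified stand-in for the min_cluster_samples filtering described above, not Trajectorama's actual code.

```python
import numpy as np

def clusters_kept(labels, min_cluster_samples):
    """Hypothetical filter: return cluster labels with at least min_cluster_samples cells."""
    names, counts = np.unique(labels, return_counts=True)
    return names[counts >= min_cluster_samples].tolist()

# Hypothetical clustering of 3,000 subsampled cells into 10 clusters of 300 cells each.
labels = np.repeat(np.arange(10), 300)

print(len(clusters_kept(labels, 500)))  # 0 -> every cluster is filtered out
print(len(clusters_kept(labels, 100)))  # 10 -> all clusters survive
```

With all cells instead of a subsample, the same clusters would be larger, so more of them would clear a given threshold.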
Also, if you want a trajectory with a single-cell output, there are other standard trajectory tools to try, such as PAGA. However, these might produce discontinuities across time points.
Hello, I am a student currently starting an scRNA-seq project that includes a Trajectorama analysis. As my input X (adata_red.X), I used 3000 cells subsampled with sc.pp.subsample from a dataset with 3 time points, and I used adata_red.obs['sample'] as my 'studies' variable. I used the provided basic API, with only a slight change in this part:
However, I may have a problem with the shape of X_coexpr after concatenating the csr_matrix objects, which produces the object shown below:
As a result, when I run it through Scanpy, I get an error because n_comps = 0, and the KNN graph can't be built. Could you please help me fix this error? Thank you!
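Here is one possible cause of n_comps = 0. It is an assumption, not a confirmed diagnosis. If the flattened matrices are joined end to end (which is what np.concatenate does with 1-D arrays), the result is a single long row, i.e. one observation. After mean-centering, a matrix with n_obs rows has rank at most n_obs - 1, so a single observation leaves no principal components to compute. Stacking one row per coexpression matrix with np.vstack avoids this. The data below are random toy values, and centered_rank is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
# Four hypothetical flattened coexpression matrices, 10 entries each.
flat = [rng.normal(size=10) for _ in range(4)]

joined = np.concatenate(flat)  # end to end: shape (40,) -> a single observation
stacked = np.vstack(flat)      # one row per matrix: shape (4, 10)

def centered_rank(X):
    """Hypothetical helper: rank after subtracting column means (PCA-style centering)."""
    X = np.atleast_2d(X)
    return np.linalg.matrix_rank(X - X.mean(axis=0))

print(joined.shape, centered_rank(joined))    # (40,) 0
print(stacked.shape, centered_rank(stacked))  # (4, 10) 3
```

Even with one row per matrix, only 4 observations allow at most 3 components. Keeping more coexpression matrices, for example by lowering min_cluster_samples, gives PCA and the KNN graph more to work with.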