HaojiaWu / CellScopes.jl

A Julia package for single cell and spatial data analysis
MIT License

cs.scRNAObject(raw_counts) not finished in hours #3

Closed MaximilianNuber closed 9 months ago

MaximilianNuber commented 10 months ago

Hello everyone,

Thank you for the great package, I am looking forward to working with CellScopes and Julia more (MLE Mixed models being absent in Python to my knowledge and my data being too large for R :P).

I have an issue constructing the scRNAObject. The function does not finish after hours, even when running overnight. Beforehand, I constructed the raw count object from a sparse matrix, gene_names and cell_names.

obj = cs.RawCountObject(transpose(counts), obs.UMIs, var.Column1)
pbmc = cs.scRNAObject(obj)

I am working with a dataset of 380k cells that I preprocessed and annotated in Python. I saved the counts matrix (that's why I had to transpose it, coming from scanpy) along with the gene and cell annotations. When loading/converting from .h5ad, I had the same issue of the function never finishing.

adata = cs.from_scanpy("scVI-integrated.h5ad"; data_type="scRNA", anno="cell_type")

Looking at the 400k cell tutorial, it seemed like creating the object should be quite fast.

Did I forget about something?

(Julia 1.8.1. I only installed Gadfly, removed Gadfly, then installed MatrixMarket, CSV, DataFrames and SingleCellProjection, in addition to CellScopes. I am quite new to Julia.)

Thank you for any help :)

HaojiaWu commented 10 months ago

Hi @MaximilianNuber, thanks for your interest in using CellScopes. I tried the MCA 400K dataset again today, and cs.scRNAObject took about 5 seconds, so it shouldn't take that long. Here are some thoughts:

  1. What is in your obs.UMIs? Here is my command line to construct the RawCountObject:
    @time rawcount = cs.RawCountObject(counts, cells, genes);

    obs.UMIs needs to be a vector of cell ids.

  2. After you run counts = transpose(counts), can you check the data type?
    typeof(counts)

    It should be SparseMatrixCSC{Int64, Int64}. If not, please convert it to that type. Let me know if those steps don't help.
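
For readers following along, the type check and conversion in step 2 can be sketched as below. This is a minimal illustration on toy data, not code from CellScopes: the matrix contents and the name counts_int are made up, and it assumes a 64-bit Julia where the default index type is Int64.

```julia
using SparseArrays

# Hypothetical toy stand-in for a genes x cells count matrix
# (values stored as Float64, as can happen after exporting from other tools).
counts = sparse([1, 2, 3, 1], [1, 1, 2, 3], [5.0, 1.0, 2.0, 7.0], 3, 3)

# Inspect the concrete type before handing the matrix to a constructor.
println(typeof(counts))

# Convert the stored values to Int64. This throws an InexactError if any
# stored value is not a whole number, which is a useful sanity check for raw counts.
counts_int = SparseMatrixCSC{Int64, Int64}(counts)

println(typeof(counts_int))
```

The converted counts_int is the kind of object step 2 asks for; whether a conversion is needed at all depends on how the matrix was loaded.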

MaximilianNuber commented 9 months ago

Dear @HaojiaWu,

Please forgive my late reply. Yes, the problem was the type of the matrix: it was a LinearAlgebra.Transpose(CSC matrix..., because I needed to transpose the matrix after saving it from scanpy. Both converting it to a CSC matrix and simply calling copy() worked. Thank you, and apologies again for my late reply. Best, Max
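
A small sketch of the fix described above, on made-up toy data (the names cells_by_genes, lazy and materialized are hypothetical). In Julia, transpose() on a sparse matrix returns a lazy LinearAlgebra.Transpose wrapper rather than a new SparseMatrixCSC. A plausible reason for the slowdown, though not confirmed in this thread, is that code written for SparseMatrixCSC falls back to slow generic methods when given the wrapper.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical toy matrix in scanpy orientation: 3 cells x 2 genes.
cells_by_genes = sparse([1, 2, 2, 3], [1, 1, 2, 2], [4, 1, 3, 2], 3, 2)

# transpose() is lazy: it wraps the parent matrix without copying any data.
lazy = transpose(cells_by_genes)
println(typeof(lazy))

# copy() materializes the wrapper into a real genes x cells SparseMatrixCSC.
materialized = copy(lazy)
println(typeof(materialized))

# Explicit conversion to a CSC matrix gives the same result.
converted = SparseMatrixCSC(lazy)
```

Either materialized or converted can then be passed on in place of the wrapped matrix, as in the original cs.RawCountObject(...) call.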