alexdobin / STAR

RNA-seq aligner
MIT License
1.87k stars 506 forks source link

Error reading Velocity matrix #774

Open romanhaa opened 5 years ago

romanhaa commented 5 years ago

Hi Alex,

Following your question on Twitter I started a comparison for the spliced/unspliced counts from velocyto, STARsolo and kallisto. Unfortunately, when trying to read in the "Velocyto" matrix generated by STARsolo, I'm getting this error:

Matrix::readMM(".../STARsolo/Velocyto/raw/matrix.mtx")
# Error: readMM(): row     values 'i' are not in 1:nr

Any idea what the issue could be? The mtx file is ~1 GB in size.

Using scanpy's read_mtx function works fine.

import scanpy as sc
sc.read_mtx(".../STARsolo/Velocyto/raw/matrix.mtx")
# AnnData object with n_obs × n_vars = 11720 × 60609

For those running into the same problem, you can load the .mtx file generated by STARsolo and then save it again using scanpy/scipy functions as shown below. Afterwards, it will be readable by Matrix::readMM() in R.

import scanpy as sc
import scipy
t = sc.read_mtx('.../STARsolo/Velocyto/raw/matrix.mtx')
scipy.io.mmwrite('.../STARsolo/Velocyto/raw/matrix_new_format.mtx', t.X)
Matrix::readMM(".../STARsolo/Velocyto/raw/matrix_new_format.mtx") %>% str()
# Formal class 'dgTMatrix' [package "Matrix"] with 6 slots
#   ..@ i       : int [1:48570060] 1 1 1 1 1 1 1 1 1 1 ...
#   ..@ j       : int [1:48570060] 6962 54995 69331 89585 121706 145759 228188 337667 383554 383896 ...
#   ..@ Dim     : int [1:2] 60609 6794880
#   ..@ Dimnames:List of 2
#   .. ..$ : NULL
#   .. ..$ : NULL
#   ..@ x       : num [1:48570060] 0 0 0 0 0 0 0 0 0 0 ...
#   ..@ factors : list()

Best, Roman

# R version 3.6.1 (2019-07-05)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Debian GNU/Linux 9 (stretch)
# 
# Matrix products: default
# BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so
# 
# locale:
#  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
#  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
# [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
#  [1] Matrix_1.2-17   forcats_0.4.0   stringr_1.4.0   dplyr_0.8.3    
#  [5] purrr_0.3.3     readr_1.3.1     tidyr_1.0.0     tibble_2.1.3   
#  [9] ggplot2_3.2.1   tidyverse_1.2.1
# 
# loaded via a namespace (and not attached):
#  [1] Rcpp_1.0.2       cellranger_1.1.0 pillar_1.4.2     compiler_3.6.1  
#  [5] tools_3.6.1      zeallot_0.1.0    jsonlite_1.6     lubridate_1.7.4 
#  [9] lifecycle_0.1.0  gtable_0.3.0     nlme_3.1-141     lattice_0.20-38 
# [13] pkgconfig_2.0.3  rlang_0.4.1      cli_1.1.0        rstudioapi_0.10 
# [17] haven_2.1.1      withr_2.1.2      xml2_1.2.2       httr_1.4.1      
# [21] generics_0.0.2   vctrs_0.2.0      hms_0.5.2        grid_3.6.1      
# [25] tidyselect_0.2.5 glue_1.3.1       R6_2.4.0         readxl_1.3.1    
# [29] modelr_0.1.5     magrittr_1.5     backports_1.1.5  scales_1.0.0    
# [33] rvest_0.3.4      assertthat_0.2.1 colorspace_1.4-1 stringi_1.4.3   
# [37] lazyeval_0.2.2   munsell_0.5.0    broom_0.5.2      crayon_1.3.4
romanhaa commented 5 years ago

Now that I was able to load the data, I'm asking myself what this matrix is. Shouldn't there have been spliced, unspliced, and ambiguous? Sorry, maybe I'm completely on the wrong track.

alexdobin commented 5 years ago

Hi Roman,

thanks for looking into it... it seems you are going to be the alpha-tester for this feature! :) Sorry for the lack of documentation, and for the breaking of the matrix market format. Basically, I dumped the spliced, unspliced and ambiguous counts as three columns in this file i.e. col3=spliced, col4=unspliced, col5=ambiguous

Cheers Alex

romanhaa commented 5 years ago

Ohhhh, I see. I guess in that case I can reproduce the sparse matrix like this:

t <- readr::read_delim('matrix.mtx', delim = ' ', skip = 3, col_names = FALSE)

spliced <- Matrix::sparseMatrix(
  i = t$X1,
  j = t$X2,
  x = t$X3,
  dims = c(60609, 6794880)
)
Matrix::writeMM(spliced, 'spliced.mtx')

unspliced <- Matrix::sparseMatrix(
  i = t$X1,
  j = t$X2,
  x = t$X4,
  dims = c(60609, 6794880)
)
Matrix::writeMM(unspliced, 'unspliced.mtx')

The matrix dimensions are taken from the header of the matrix.mtx file. It's looking good so far. No idea what the scanpy method gave me then :D

romanhaa commented 5 years ago

I see high correlation between the results of velocyto (input was the Cell Ranger .bam file) and STARsolo, both for spliced and unspliced counts. Less so for kallisto. There are some interesting differences in the number of transcripts and detected genes.

Number of transcripts:

number_of_transcripts_spliced_unspliced

Number of detected genes:

number_of_expressed_genes_spliced_unspliced

Correlation of spliced counts (by genes; 0's are set to 0.1 for log-scale):

correlation_genes_number_of_transcripts_spliced

Correlation of unspliced counts (by genes; 0's are set to 0.1 for log-scale):

correlation_genes_number_of_transcripts_unspliced

alexdobin commented 5 years ago

Hi Roman,

thanks! Looks similar to what I was getting for STARsolo. I could not fully reproduce the velocyto logic, unfortunately. I was communicating with Gioele La Manno about it, but I guess we both got busy and dropped the ball. I am planning to start working on it again. If you make these results public, it will put a bit more pressure on us. :)

When I tried to run kallisto with velocyto calculation, it used 68GB of RAM. I am going to ask Lior about it on tweeter. :)

Cheers Alex

romanhaa commented 5 years ago

Yes, I'd like to know Lior's opinion as well. kallisto took a lot longer to run this time. I have no reliable numbers (will have to repeat) but kallisto took a significant hit in performance for this analysis. Generating the index alone took 2.5 hours (and it is 42GB in size). STARsolo didn't need a new index and finished the analysis in 1 hour with 20 threads (I get 9.5 CPU hours in total).

If you make these results public, it will put a bit more pressure on us.

I'll do my best but I'm not sure which form would be appropriate.

Cheers, Roman

alexdobin commented 5 years ago

Hi Roman,

If without velocyto STARsolo took 35 min, and with 1 hour - it seems reasonable. It's a bit slower than my expectation (~50% increase in time) - of course, the actual results depend strongly on the hardware. The velocyto counting is not parallelized yet, I hope to improve it in the future.

I am afraid to give advice about the form or forum. Tweeter discussions tend to become overheated... some people just cannot take others criticizing their work (not me : ) - I am happy to hear any kind of critique, as it always helps to improve the results). I think you may have enough material for a benchmarking paper of your own. It would probably be best if you contacted all Authors for optimal parameters etc. We can also have an e-mail discussion with Velocyto Authors. Gioele La Manno was the 1st Author, and Peter Kharchenko and Sten Linnarson the senior Authors.

Cheers Alex

romanhaa commented 5 years ago

Yes the run time is absolutely reasonable. Actually, for the spliced/unspliced counts STARsolo appears to be the fastest of all three methods. But further improvements would of course be appreciated :)

I appreciate your transparency and open-minded attitude towards these comparisons. Indeed, Twitter discussions are not ideal. I don't want to side with anybody and just present my perspective as a user. I'll have to think about the possibility of putting together a paper. Thanks for your suggestions and the offer to reach out to Gioele La Manno and the other authors of Velocyto. Optimizing the parameters for each tool would probably be an important thing to do before publishing performance comparisons.

Btw, what do you think about the difference in ratio (spliced/unspliced counts) between STARsolo vs Velocyto + kallisto? I found that to be quite striking.

Cheers, Roman

alexdobin commented 5 years ago

Hi Roman,

sorry for the belayed reply, I was swamped by the Genome Informatics conference.

I appreciate your transparency and open-minded attitude towards these comparisons. Indeed, Twitter discussions are not ideal. I don't want to side with anybody and just present my perspective as a user. I'll have to think about the possibility of putting together a paper. Thanks for your suggestions and the offer to reach out to Gioele La Manno and the other authors of Velocyto. Optimizing the parameters for each tool would probably be an important thing to do before publishing performance comparisons.

Absolutely! And I would welcome any feedback from the user perspective.

Btw, what do you think about the difference in ratio (spliced/unspliced counts) between STARsolo vs Velocyto + kallisto? I found that to be quite striking.

Indeed, the difference seems on the large sides. I have not yet looked at kallisto/bustools results carefully. I have made my comparisons with velocyto, and I remember that the total counts: spliced+unspliced+ambiguous were very close, but there was some exchange of counts between spliced and ambiguous. Just to clarify, by the number of transcripts you mean "number of molecules" - i.e. UMIs? And you are comparing the counts for the ~10k cells that were filtered by velocyto?

Cheers Alex

jenzopr commented 4 years ago

Just for reference/completion, here is a python way to do that:

import numpy as np
from scipy import sparse
mtx = np.loadtxt('matrix.mtx', skiprows=3, delimiter=' ')

spliced = sparse.coo_matrix((mtx[:,2], (mtx[:,0]-1, mtx[:,1]-1)), shape = (60558,737280))
unspliced = sparse.coo_matrix((mtx[:,3], (mtx[:,0]-1, mtx[:,1]-1)), shape = (60558,737280))
ambiguous = sparse.coo_matrix((mtx[:,4], (mtx[:,0]-1, mtx[:,1]-1)), shape = (60558,737280))

The shape (60558,737280) is taken from the matrix header Thanks to @msbentsen :-)

droplet-lab commented 4 years ago

Thanks @jenzopr, I am still quite new to python, but was wondering how you merged this to the adata gene count matrix for velocity analysis using scVelo?

jenzopr commented 4 years ago

@Jachimdejonghe scvelo has a nice utility function for this: scvelo.utils.merge, find the help here

droplet-lab commented 4 years ago

@jenzopr Thanks ! To be more specific, did you manage to create a anndata file with both spliced and unspliced matrices as layers (to be later merged with the regular count matrix using the scvelo.utils.merge utility)? Or do you just merge twice (once with the spliced and once with the unspliced matrix)? Sorry if this is a stupid question but I haven't found an easy way to do this elegantly. I am trying to go through making a loom file via loompy but it doesn't work well at the moment. Any help is really appreciated, if of course you have been using scVelo :) . @alexdobin it would be really awesome if you would consider outputting the velocity results in a loom file with three layers: spliced, unspliced and ambiguous (like velocyto does). Of course not urgent but I would think this is useful for both Seurat and scVelo users.

jenzopr commented 4 years ago

@Jachimdejonghe using the AnnData constructor, you can initialise the object with all three layers:

adata = anndata.AnnData(layers = {'spliced': spliced, 'unspliced': unspliced, 'ambiguous': ambiguous})

Afterwards you can merge with an existing anndata. Is this what you were looking for? Ref: https://anndata.readthedocs.io/en/stable/anndata.AnnData.html#anndata.AnnData

droplet-lab commented 4 years ago

Nice thanks a lot, yes I saw you could load those in the anndata object but still go some errors that I couldn't solve (because I am no good in python haha), so ultimately I went with the following (Taken the Gene/raw and Velocyto/raw files), with 55471,6909 taken from the matrix header:

adata = sc.read_10x_mtx('./STARSolo/Gene/raw') mtx = np.loadtxt('./STARSolo/Velocyto/raw/matrix.mtx', skiprows=3, delimiter=' ') adata.layers['spliced'] = sparse.coo_matrix((mtx[:,2], (mtx[:,0]-1, mtx[:,1]-1)), shape = (55471,6909)).tocsr().T adata.layers['unspliced'] = sparse.coo_matrix((mtx[:,3], (mtx[:,0]-1, mtx[:,1]-1)), shape = (55471,6909)).tocsr().T adata.layers['ambiguous'] = sparse.coo_matrix((mtx[:,4], (mtx[:,0]-1, mtx[:,1]-1)), shape = (55471,6909)).tocsr().T adata.write_loom("./STARSolo/STARSolo.loom",write_obsm_varm=True)

jenzopr commented 4 years ago

Certainly, this works as well. What I like about the scvelo.utils.merge is, that it will get rid of the unmatched cells in spliced and unspliced. This way, you can use the Gene/filtered matrices..

bapoorva commented 3 years ago

@Jachimdejonghe Thank you for your workaround. I managed to create a loom file successfully. However, when I load it into R to use it with Seurat, it gives me this error

reading loom file via hdf5r...
Error in `[[.H5File`(f, "col_attrs/CellID") : 
  An object with name col_attrs/CellID does not exist in this group

Any ideas on how to fix this ?

ftucos commented 3 years ago

Just for reference/completion, here is a python way to do that:

import numpy as np
from scipy import sparse
mtx = np.loadtxt('matrix.mtx', skiprows=3, delimiter=' ')

spliced = sparse.coo_matrix((mtx[:,2], (mtx[:,0]-1, mtx[:,1]-1)), shape = (60558,737280))
unspliced = sparse.coo_matrix((mtx[:,3], (mtx[:,0]-1, mtx[:,1]-1)), shape = (60558,737280))
ambiguous = sparse.coo_matrix((mtx[:,4], (mtx[:,0]-1, mtx[:,1]-1)), shape = (60558,737280))

The shape (60558,737280) is taken from the matrix header Thanks to @msbentsen :-)

I found this discussion very useful, I just wanted to add something to @jenzopr code I would recomend to use csr_matrix in place of coo_matrix since the latter does not properly supports indexing and this would cause some errorrs when triyng to subset the object or running scvelo.utils.merge

Also, In place of manually specifing the shape of the matrix, I would extract it like that: shape = np.loadtxt('matrix.mtx', skiprows=2, max_rows = 1, delimiter=' ')[0:2].astype(int)

alexdobin commented 3 years ago

Hi All,

the 2.7.8a release now outputs separate matrices for Velocyto spliced/unspliced/ambiguous counts. Not sure if this is very helpful, especially if you already have your pipelines running, but I thought that it would be a bit less confusing. If necessary, I could add an option to output the combined matrix.mtx.

Cheers Alex

droplet-lab commented 3 years ago

Hi Alex, that's great and really helpful. I think adding the combined matrix is also quite useful.

Thanks a lot

ishikorotaroJ commented 3 years ago

@bapoorva It's the label issue.

https://github.com/yyoshiaki/mishima_gassyuku/blob/master/README.md

I followed this one.

import loompy

ds = loompy.connect("out.loom") ds.row_attrs['Gene'] = ds.row_attrs['var_names'] ds.col_attrs['CellID'] = ds.col_attrs['obs_names'] ds.close()

Then it can be imported by Seurat.

JBreunig commented 3 years ago

Hi All,

the 2.7.8a release now outputs separate matrices for Velocyto spliced/unspliced/ambiguous counts. Not sure if this is very helpful, especially if you already have your pipelines running, but I thought that it would be a bit less confusing. If necessary, I could add an option to output the combined matrix.mtx.

Cheers Alex

Might anyone have updated code for inputting the split matrices into anndata?

I've got the adata object set up but I'm struggling to add the separate Velocyto spliced/unspliced/ambiguous matrices from 2.7.9a as layers.

Update: solved with the help of ftucos 👍

ftucos commented 3 years ago

Hi All, the 2.7.8a release now outputs separate matrices for Velocyto spliced/unspliced/ambiguous counts. Not sure if this is very helpful, especially if you already have your pipelines running, but I thought that it would be a bit less confusing. If necessary, I could add an option to output the combined matrix.mtx. Cheers Alex

Might anyone have updated code for inputting the split matrices into anndata?

I've got the adata object set up but I'm struggling to add the separate Velocyto spliced/unspliced/ambiguous matrices from 2.7.9a as layers.

Update: solved with the help of ftucos 👍

I'll post here the code the code we shared by email, maybe somebody could find it useful too.

Function to generate an anndata object from the STARsolo output folder

You just need to point to the STARsolo output folder adata = buildAnndataFromStar('data/sample1.Solo.out/')

# Load required libraries
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy
from scipy import sparse
import anndata
import scanpy as sc
import scvelo as scv
scv.set_figure_params()
os.chdir('/path/to/work/dir')

The following function is for older versions of STARsolo that generate only one sparse matrix with 3 value columns for spliced, unspliced and ambiguous.

def buildAnndataFromStar(path):
    """Generate an anndata object from the STAR aligner output folder"""
    path=path
    # Load Read Counts
    X = sc.read_mtx(path+'Gene/raw/matrix.mtx')

    # Transpose counts matrix to have Cells as rows and Genes as cols as expected by AnnData objects
    X = X.X.transpose()

    # This matrix is organized as a sparse matrix with Row, Cols and 3 values columns for 
    # Spliced, Unspliced and Ambigous reads
    mtx = np.loadtxt(path+'Velocyto/raw/matrix.mtx', skiprows=3, delimiter=' ')
    # Extract sparse matrix shape informations from the third row
    shape = np.loadtxt(path+'Velocyto/raw/matrix.mtx', skiprows=2, max_rows = 1 ,delimiter=' ')[0:2].astype(int)

    # Read the sparse matrix with csr_matrix((data, (row_ind, col_ind)), shape=(M, N))
    # Subract -1 to rows and cols index because csr_matrix expects a 0 based index
    # Traspose counts matrix to have Cells as rows and Genes as cols as expected by AnnData objects
    spliced = sparse.csr_matrix((mtx[:,2], (mtx[:,0]-1, mtx[:,1]-1)), shape = shape).transpose()
    unspliced = sparse.csr_matrix((mtx[:,3], (mtx[:,0]-1, mtx[:,1]-1)), shape = shape).transpose()
    ambiguous = sparse.csr_matrix((mtx[:,4], (mtx[:,0]-1, mtx[:,1]-1)), shape = shape).transpose()

    # Load Genes and Cells identifiers
    obs = pd.read_csv(path+'Velocyto/raw/barcodes.tsv',
                  header = None, index_col = 0)
    # Remove index column name to make it compliant with the anndata format
    obs.index.name = None

    var = pd.read_csv(path+'Velocyto/raw/features.tsv', sep='\t',
                  names = ('gene_ids', 'feature_types'), index_col = 1)

    # Build AnnData object to be used with ScanPy and ScVelo
    adata = anndata.AnnData(X = X, obs = obs, var = var,
                        layers = {'spliced': spliced, 'unspliced': unspliced, 'ambiguous': ambiguous})
    adata.var_names_make_unique()

    # Subset Cells based on STAR filtering
    selected_barcodes = pd.read_csv(path+'Gene/filtered/barcodes.tsv', header = None)
    adata = adata[selected_barcodes[0]]

    return adata.copy()

After importing your andata object, you can eventually export it for a quicker load later.

adata.write('processed/sample_1.h5ad', compression = 'gzip')
adata = anndata.read_h5ad('processed/sample_1.h5ad')

Here you have the updated function from @JBreunig to load the 3 individual matrices generated in later versions of STARsolo

def buildAnndataFromStarCurr(path):
    """Generate an anndata object from the STAR aligner output folder"""
    path=path
    # Load Read Counts
    X = sc.read_mtx(path+'Gene/raw/matrix.mtx')

    # Transpose counts matrix to have Cells as rows and Genes as cols as expected by AnnData objects
    X = X.X.transpose()

    # Load the 3 matrices containing Spliced, Unspliced and Ambigous reads
    mtxU = np.loadtxt(path+'Velocyto/raw/unspliced.mtx', skiprows=3, delimiter=' ')
    mtxS = np.loadtxt(path+'Velocyto/raw/spliced.mtx', skiprows=3, delimiter=' ')
    mtxA = np.loadtxt(path+'Velocyto/raw/ambiguous.mtx', skiprows=3, delimiter=' ')

    # Extract sparse matrix shape informations from the third row
    shapeU = np.loadtxt(path+'Velocyto/raw/unspliced.mtx', skiprows=2, max_rows = 1 ,delimiter=' ')[0:2].astype(int)
    shapeS = np.loadtxt(path+'Velocyto/raw/spliced.mtx', skiprows=2, max_rows = 1 ,delimiter=' ')[0:2].astype(int)
    shapeA = np.loadtxt(path+'Velocyto/raw/ambiguous.mtx', skiprows=2, max_rows = 1 ,delimiter=' ')[0:2].astype(int)

    # Read the sparse matrix with csr_matrix((data, (row_ind, col_ind)), shape=(M, N))
    # Subract -1 to rows and cols index because csr_matrix expects a 0 based index
    # Traspose counts matrix to have Cells as rows and Genes as cols as expected by AnnData objects

    spliced = sparse.csr_matrix((mtxS[:,2], (mtxS[:,0]-1, mtxS[:,1]-1)), shape = shapeS).transpose()
    unspliced = sparse.csr_matrix((mtxU[:,2], (mtxU[:,0]-1, mtxU[:,1]-1)), shape = shapeU).transpose()
    ambiguous = sparse.csr_matrix((mtxA[:,2], (mtxA[:,0]-1, mtxA[:,1]-1)), shape = shapeA).transpose()

    # Load Genes and Cells identifiers
    obs = pd.read_csv(path+'Velocyto/raw/barcodes.tsv',
                  header = None, index_col = 0)

    # Remove index column name to make it compliant with the anndata format
    obs.index.name = None

    var = pd.read_csv(path+'Velocyto/raw/features.tsv', sep='\t',
                                    names = ('gene_ids', 'feature_types'), index_col = 1)

    # Build AnnData object to be used with ScanPy and ScVelo
    adata = anndata.AnnData(X = X, obs = obs, var = var,
                                                 layers = {'spliced': spliced, 'unspliced': unspliced, 'ambiguous': ambiguous})
    adata.var_names_make_unique()

    # Subset Cells based on STAR filtering
    selected_barcodes = pd.read_csv(path+'Gene/filtered/barcodes.tsv', header = None)
    adata = adata[selected_barcodes[0]]

    return adata.copy()
Xin-jxy commented 3 years ago

Thanks to @ftucos, I came up with this new way to implement our spliced, unspliced and ambiguous file to our data, just some very simple codes for the starters.

And many thanks for team STAR for this wonderful development so we don't have to generate a loom file for scVelo.

 import scvelo as scv
 import scanpy as sc
 import sys
 import numpy as np
 from scipy import sparse

def creation_newadata(path):
'''load_adata is a repertory which contains all of our barcodes file, features file and matrix file'''

path=path
adata = sc.read_10x_mtx(path+'Gene/raw/')

spliced=np.loadtxt(path+'Velocyto/raw/spliced.mtx', skiprows=3, delimiter=' ')
shape = np.loadtxt(path+'Velocyto/raw/spliced.mtx', skiprows=2, max_rows = 1 ,delimiter=' ')[0:2].astype(int)

adata.layers['spliced']=sparse.csr_matrix((spliced[:,2], (spliced[:,0]-1, spliced[:,1]-1)), shape = (shape)).tocsr().T

unspliced=np.loadtxt(path+'Velocyto/raw/unspliced.mtx', skiprows=3, delimiter=' ')
adata.layers['unspliced']=sparse.csr_matrix((unspliced[:,2], (unspliced[:,0]-1, unspliced[:,1]-1)), shape = (shape)).tocsr().T

ambiguous= np.loadtxt(path+'Velocyto/raw/ambiguous.mtx', skiprows=3, delimiter=' ')
adata.layers['ambiguous']=sparse.csr_matrix((ambiguous[:,2], (ambiguous[:,0]-1, ambiguous[:,1]-1)), shape = (shape)).tocsr().T

return adata

With this data, we can use it for scanpy calculate as same as scVelo.

Cheers XIN

ZJ-zuojing commented 3 years ago

@Jachimdejonghe Thank you for your workaround. I managed to create a loom file successfully. However, when I load it into R to use it with Seurat, it gives me this error

reading loom file via hdf5r...
Error in `[[.H5File`(f, "col_attrs/CellID") : 
  An object with name col_attrs/CellID does not exist in this group

Any ideas on how to fix this ?

hi,@bapoorva I also have this question,have you fix it?

ZJ-zuojing commented 3 years ago

Hi,@ftucos , Thank you very much, buildAnndataFromStarCurr is very useful for generate anndata,but I need to combind Velocyto spliced/unspliced/ambiguous counts with Seurat data by Velocyto.R. So I need a to generate a loom file like Velocyto produced. Do you have any suggestions? Thanks a lot

ftucos commented 3 years ago

Hi,@ftucos , Thank you very much, buildAnndataFromStarCurr is very useful for generate anndata,but I need to combind Velocyto spliced/unspliced/ambiguous counts with Seurat data by Velocyto.R. So I need a to generate a loom file like Velocyto produced. Do you have any suggestions? Thanks a lot

Hi, @ZJ-zuojing, I would recommend you use an HDF5 viewer to check the loom file generated and check that it's compliant with the loom file specification https://linnarssonlab.org/loompy/format/index.html From your error message, it seems your loom file is missing the "col_attrs" field.

WT215 commented 2 years ago

Any faster loading than np.loadtxt?