Closed Dario-Rocha closed 1 year ago
Currently, there is no way to read the gene symbols from within BPCells; it will always use the gene IDs (ENSG) for the row names. However, there are two options available to you:
1. Use canonical_gene_symbol() in BPCells, which can translate most gene IDs into their corresponding canonical symbol as defined by HGNC. I think not every gene in the 10x matrices has a canonical symbol, e.g. some of the lncRNAs, but it should cover everything with a canonical name as of late 2022.
2. Read the gene symbols from the 10x file yourself, then use rownames(bpcells_mat) <- gene_symbols to change the rownames on the BPCells object (you'll want to make sure you write the matrix to disk after setting the rownames). To do this in R, I'd recommend the hdf5r package. From the 10x documentation, the path in the 10x file you'd want is matrix/features/name
Hope one of those works for you! It might require reading a bit of documentation, but I highly recommend the hdf5r package, and hopefully it will be clear how to use -- the simple example on their GitHub page is quite good. It's very useful to be able to dig around in hdf5 files yourself (the h5ls command line tool is also handy for looking at hdf5 file structure).
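For the canonical_gene_symbol() route, a minimal sketch might look like the following. The helper fill_gene_symbols() and the file path are hypothetical names introduced here for illustration, and the sketch assumes that IDs without a canonical symbol come back as NA. Because not every gene has a canonical symbol, the helper falls back to the original ID and de-duplicates the result so it can be used as row names.

```r
# Hypothetical base-R helper (not part of BPCells): use the original ID where
# no symbol is available, then make the names unique.
fill_gene_symbols <- function(ids, symbols) {
  stopifnot(length(ids) == length(symbols))
  symbols <- unname(as.character(symbols))
  missing <- is.na(symbols) | symbols == ""
  symbols[missing] <- ids[missing]
  make.unique(symbols)
}

# Hypothetical path to a Cell Ranger output file
h5_path <- "filtered_feature_bc_matrix.h5"

# Only runs when BPCells is installed and the file exists
if (requireNamespace("BPCells", quietly = TRUE) && file.exists(h5_path)) {
  mat <- BPCells::open_matrix_10x_hdf5(h5_path, feature_type = "Gene Expression")
  ids <- rownames(mat)
  # Assumption: unmatched IDs are returned as NA
  symbols <- BPCells::canonical_gene_symbol(ids)
  rownames(mat) <- fill_gene_symbols(ids, symbols)
}
```

Falling back to the ENSG ID keeps every row identifiable; dropping unmatched genes instead would be an equally reasonable choice depending on the analysis.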
Thank you for your help. I think I've made a lot of progress in understanding the idea behind h5 files and the BPCells structure. I've managed to get the desired gene symbols from the h5 file, but I am failing to save the modified matrix. When using the code below, the rownames are the desired ones on the temp_data object in R, but after saving it with BPCells and reloading it, the rownames are the original ones. I am sorry if this is something quite basic; I've gone through the hdf5r and BPCells documentation and I can't really understand how to do this right.
library(hdf5r)
library(BPCells)
# get Cell Ranger gene symbols from the h5 file
temp_h5 <- H5File$new(temp_file, mode = "r") # open the file read-only
temp_symbol <- temp_h5[["matrix"]][["features"]][["name"]][] # extract the gene names
temp_h5$close_all() # release the file handle
### load 10x data with BPCells ----
temp_data <- open_matrix_10x_hdf5(temp_file, feature_type = "Gene Expression")
rownames(temp_data) <- temp_symbol
# write the renamed matrix to disk, then reopen it
write_matrix_dir(mat = temp_data, dir = temp_matrix_dir, overwrite = TRUE)
temp_data <- open_matrix_dir(temp_matrix_dir)
This looks like you're hitting issue #29, so I think it should be fixed if you re-install BPCells.
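One way to confirm the behavior after reinstalling is a small round-trip check on a toy matrix. This is only a sketch: the matrix contents, row names, and output directory are made up for illustration, and it assumes an installed BPCells version that includes the fix.

```r
library(Matrix)
library(BPCells)

# Tiny in-memory matrix with made-up Ensembl-style row names
sparse_mat <- sparseMatrix(
  i = c(1, 2, 3), j = c(1, 2, 1), x = c(5, 2, 7),
  dims = c(3, 2),
  dimnames = list(c("ENSG_A", "ENSG_B", "ENSG_C"), c("cell1", "cell2"))
)
bp_mat <- as(sparse_mat, "IterableMatrix")

# Made-up symbols standing in for the values read from the h5 file
new_names <- c("GENE_A", "GENE_B", "GENE_C")
rownames(bp_mat) <- new_names

# Write to a temporary directory, reopen, and compare the row names
out_dir <- file.path(tempdir(), "bpcells_rownames_check")
write_matrix_dir(mat = bp_mat, dir = out_dir, overwrite = TRUE)
reloaded <- open_matrix_dir(out_dir)

roundtrip_ok <- identical(rownames(reloaded), new_names)
print(roundtrip_ok)
```

If roundtrip_ok is FALSE, the installed version is probably still affected and the reloaded matrix carries the original row names.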
Reinstalling BPCells solved this issue, as well as another issue I was having when trying to work with h5 files generated by SoupX.
Great! I'll mark this as completed then.
Hello again,
When loading an .h5 file with the Seurat function Read10X_h5, we can choose which annotation to use for the rownames and, for example, use gene symbols instead of ENSG IDs. In contrast, when reading an .h5 file with the BPCells function open_matrix_10x_hdf5, the matrix is loaded with ENSG IDs as rownames. I am not familiar with .h5 files, and I can't find a way to get the desired gene symbols as rownames when loading the matrix with the BPCells package.