vanhoan310 opened 1 year ago
Thank you for your interest in Matilda. In R, you can read a '.csv' file (for example with read.csv) and convert the resulting matrix to '.h5' format using the following function:
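library(rhdf5)      # h5createFile, h5createGroup, h5write, h5ls
library(HDF5Array)  # writeHDF5Array

# exprs_list:  list of feature-by-cell matrices (row names = features, column names = cell barcodes)
# h5file_list: output '.h5' paths, one per matrix
write_h5 <- function(exprs_list, h5file_list) {
  # all matrices must share the same features in the same order
  if (length(unique(lapply(exprs_list, rownames))) != 1) {
    stop("rownames of exprs_list are not identical.")
  }
  for (i in seq_along(exprs_list)) {
    if (file.exists(h5file_list[i])) {
      warning("h5 file already exists and will be overwritten: ", h5file_list[i])
      file.remove(h5file_list[i])  # portable replacement for system("rm ...")
    }
    h5createFile(h5file_list[i])
    h5createGroup(h5file_list[i], "matrix")
    # store the transposed (cell-by-feature) matrix under matrix/data
    writeHDF5Array(t(as.matrix(exprs_list[[i]])), h5file_list[i], name = "matrix/data")
    h5write(rownames(exprs_list[[i]]), h5file_list[i], name = "matrix/features")
    h5write(colnames(exprs_list[[i]]), h5file_list[i], name = "matrix/barcodes")
    print(h5ls(h5file_list[i]))  # list the datasets that were written
  }
}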
write_h5(exprs_list = list(data = your_matrix), h5file_list = c(saved_path)) # for example, saved_path is "./rna.h5"
Replace 'your_matrix' with your feature-by-cell matrix (features as row names, cell barcodes as column names) and 'saved_path' with the path of the '.h5' file you want to write.
Hope this helps.
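If you would rather do the conversion in Python, here is a minimal sketch using h5py and NumPy that writes the same three datasets (matrix/data, matrix/features, matrix/barcodes). It assumes a CSV with feature names in the first column and cell barcodes in the header row; the function names read_matrix_csv and write_h5 are hypothetical. Because rhdf5/HDF5Array store R matrices with their dimensions reversed, the transposed matrix written by the R function should appear to h5py as features × cells, so the sketch writes that orientation directly. Compare the result against a file produced by the R function before relying on it.

```python
import csv

import h5py
import numpy as np


def read_matrix_csv(csv_path):
    """Read a CSV whose header row holds cell barcodes and whose first column holds feature names.

    Hypothetical helper: returns (values, features, barcodes), values shaped features x cells.
    """
    with open(csv_path, newline="") as handle:
        rows = [row for row in csv.reader(handle) if row]  # skip blank lines
    barcodes = rows[0][1:]                   # header row without the corner cell
    features = [row[0] for row in rows[1:]]  # first column of every data row
    values = np.array([[float(v) for v in row[1:]] for row in rows[1:]], dtype=np.float64)
    return values, features, barcodes


def write_h5(values, features, barcodes, h5_path):
    """Write a features x cells matrix using the matrix/data, matrix/features, matrix/barcodes layout.

    Assumption: this orientation matches what the R write_h5 function produces when read with h5py.
    """
    if values.shape != (len(features), len(barcodes)):
        raise ValueError("matrix shape does not match the feature and barcode lists")
    with h5py.File(h5_path, "w") as f:  # "w" overwrites an existing file
        group = f.create_group("matrix")
        group.create_dataset("data", data=values)
        group.create_dataset("features", data=np.array(features, dtype="S"))
        group.create_dataset("barcodes", data=np.array(barcodes, dtype="S"))
```

For example, `values, features, barcodes = read_matrix_csv("rna.csv")` followed by `write_h5(values, features, barcodes, "rna.h5")` (file names are placeholders). Opening the file in "w" mode makes h5py overwrite an existing file, which mirrors the R function removing it first.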
Thanks a lot! Which R library provides the function writeHDF5Array?
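It is provided by the HDF5Array package; the other HDF5 functions used above (h5createFile, h5createGroup, h5write, h5ls) come from rhdf5:
library(HDF5Array)
library(rhdf5)
You can install both packages using the following commands:
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("HDF5Array", "rhdf5"))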
Dear developers,
How can I use Matilda to train on and classify cell types in CITE-seq data (RNA + ADT)? I followed the tutorial and omitted the --atac parameter, but it does not work.
Thanks!
To train on CITE-seq data:
python main_matilda_train.py --rna ../data/TEAseq/train_rna.h5 --adt ../data/TEAseq/train_adt.h5 --cty ../data/TEAseq/train_cty.csv #Training CITEseq
For classifying CITE-seq data:
python main_matilda_task.py --rna ../data/TEAseq/test_rna.h5 --adt ../data/TEAseq/test_adt.h5 --cty ../data/TEAseq/test_cty.csv --classification True --query True # Classification for CITEseq
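When RNA and ADT come from separate files, it can help to confirm that they describe the same cells before training or classification. The sketch below is a hypothetical helper (check_paired_inputs is not part of Matilda). It assumes the layout produced by the R conversion function above (matrix/data read by h5py as features × cells) and that Matilda pairs the two modalities by cell position.

```python
import h5py


def check_paired_inputs(rna_path, adt_path):
    """Hypothetical sanity check: both .h5 files list the same barcodes in the same order.

    Assumes matrix/data is features x cells with matching matrix/features and
    matrix/barcodes datasets. Returns the number of cells; raises ValueError otherwise.
    """
    barcode_lists = []
    for path in (rna_path, adt_path):
        with h5py.File(path, "r") as f:
            n_features, n_cells = f["matrix/data"].shape
            if n_features != len(f["matrix/features"]):
                raise ValueError(f"{path}: matrix/data rows do not match matrix/features")
            # h5py returns stored strings as bytes; decode them for comparison
            barcodes = [b.decode() if isinstance(b, bytes) else b for b in f["matrix/barcodes"][:]]
            if n_cells != len(barcodes):
                raise ValueError(f"{path}: matrix/data columns do not match matrix/barcodes")
            barcode_lists.append(barcodes)
    if barcode_lists[0] != barcode_lists[1]:
        raise ValueError("RNA and ADT files do not list the same cell barcodes in the same order")
    return len(barcode_lists[0])
```

For example, `check_paired_inputs("../data/TEAseq/train_rna.h5", "../data/TEAseq/train_adt.h5")` should return the cell count if the files line up.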
Hope this helps.
Thanks for your fast reply. I followed your instructions, but I got the following error.
Traceback (most recent call last):
File "main_matilda_task.py", line 136, in
Thanks for reporting this. I have updated the code as follows:
# read feature names only for the modalities that were supplied;
# "NULL" means the option (e.g. --atac) was omitted
rna_name = h5py.File(rna_data_path, "r")['matrix/features'][:]
if args.adt != "NULL":
    adt_name = h5py.File(adt_data_path, "r")['matrix/features'][:]
if args.atac != "NULL":
    atac_name = h5py.File(atac_data_path, "r")['matrix/features'][:]
Please re-download the code to get this fix.
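The fix above relies on omitted modalities keeping a "NULL" value, so their files are never opened. A minimal, self-contained sketch of that pattern follows. The argument parser is a hypothetical subset (not Matilda's actual parser), the "NULL" default is inferred from the checks in the snippet above, and the sketch uses `with` blocks so each file is closed after its feature names are read.

```python
import argparse

import h5py


def parse_args(argv=None):
    # Hypothetical subset of the command-line options; "NULL" marks an omitted modality.
    parser = argparse.ArgumentParser()
    parser.add_argument("--rna", default="NULL")
    parser.add_argument("--adt", default="NULL")
    parser.add_argument("--atac", default="NULL")
    return parser.parse_args(argv)


def read_feature_names(args):
    """Return {modality: feature names} for every modality whose path was supplied."""
    names = {}
    for modality in ("rna", "adt", "atac"):
        path = getattr(args, modality)
        if path == "NULL":  # e.g. --atac omitted for CITE-seq, so the file is never opened
            continue
        with h5py.File(path, "r") as f:  # closes the file once the names are read
            names[modality] = f["matrix/features"][:]
    return names
```

With only `--rna` and `--adt` given, `read_feature_names(parse_args([...]))` returns entries for "rna" and "adt" and never touches an ATAC file.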
It works now. Thanks a lot.
Hello, I downloaded the TEA-seq dataset mentioned in your article to replicate your results. However, even after reading the README, I'm unsure how to preprocess the data into the format the model expects. Could you share the steps you used to prepare the data for model input? Thank you.
I would like to run Matilda for classification. However, my inputs are not in .h5 format: I only have a gene expression matrix (.csv), an ADT matrix (.csv), and cell-type labels. Is there an easy way to run Matilda with these inputs?
Thanks!