UPDATE: I also tried to run it on a more powerful cluster:
And I got an OOM error: tensorflow.python.framework.errors_impl.ResourceExhaustedError: {{function_node __wrapped__SelfAdjointEigV2_device_/job:localhost/replica:0/task:0/device:CPU:0}} OOM when allocating tensor with shape[217184,491,491] and type double on /job:localhost/replica:0/task:0/device:CPU:0 by allocator mklcpu [Op:SelfAdjointEigV2]
I also subsetted my datasets to almost half their size. Unfortunately, I am afraid that further subsetting will lead to a loss of biological meaning. If someone else has run into a similar problem, I would be very grateful for any suggestions on how to solve it :)
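For context, the tensor in that error alone already needs far more memory than the node has; this is just back-of-the-envelope arithmetic from the shape reported above, not ENVI code:

# Memory needed for a dense (n_cells, n_genes, n_genes) tensor of doubles
n_cells, n_genes = 217184, 491                    # shape from the error message above
gib = n_cells * n_genes * n_genes * 8 / 1024**3   # 8 bytes per float64 / double
print(f"{gib:.0f} GiB")                           # ~390 GiB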
Hi!
The OOM error comes from the computation of the COVET matrices, which was based on all 492 genes. We've now updated ENVI so that, unless specified otherwise, COVET is based only on the 64 most highly variable genes.
Please try again with the new version and let us know if you're still having errors.
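To give a sense of why this helps, here is a rough sketch of a per-cell niche covariance restricted to a gene subset. This is illustrative only, not ENVI's actual COVET implementation (the neighbour construction and any regularization are omitted):

import numpy as np

def niche_covariance(expr, niche_index, gene_mask):
    # expr:        (n_cells, n_genes) expression matrix
    # niche_index: (n_cells, k) indices of each cell's spatial neighbours
    # gene_mask:   boolean mask selecting the highly variable genes
    sub = expr[:, gene_mask]                              # (n_cells, n_hvg)
    niches = sub[niche_index]                             # (n_cells, k, n_hvg)
    centred = niches - niches.mean(axis=1, keepdims=True)
    k = niche_index.shape[1]
    return np.einsum('nki,nkj->nij', centred, centred) / (k - 1)

# With 64 HVGs the result is (n_cells, 64, 64), roughly 7 GiB for 217,184 cells,
# versus roughly 392 GiB for the full (n_cells, 492, 492) tensor.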
Hi, thank you for your answer. I updated ENVI and tried again, but unfortunately my problem remains the same:
at the Computing Niche Covariance Matrices step:
Traceback (most recent call last):
File ".../enVI/test.py", line 9, in
Hi!
If you have already run highly variable gene selection (sc.pp.highly_variable_genes), ENVI by default uses those genes as the basis for COVET. Is it possible that all the genes in your st_data are already marked as HVGs? If so, can you first re-run sc.pp.highly_variable_genes with a higher threshold or fewer genes and try again?
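For example, a quick check along these lines (assuming sc.pp.highly_variable_genes was run on st_data and wrote the standard 'highly_variable' column):

# How many genes are currently flagged as highly variable in the spatial data?
n_hvg = int(st_data.var['highly_variable'].sum())
print(f"{n_hvg} of {st_data.n_vars} genes are flagged as highly variable")
# If this is (close to) all of them, re-run sc.pp.highly_variable_genes with a smaller n_top_genes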
Yes, I did run sc.pp.highly_variable_genes and only selected the top 1,000 genes, but the error is the same no matter whether I use 1,000, 3,000 or 10,000 genes :( Should I subset further? Do you have an approximate number for how many genes can be imputed? And would splitting the reference scRNA-seq data into smaller objects and then running them through ENVI one by one in a loop make sense, biologically speaking?
Thank you in advance for your answer! :)
We mean sc.pp.highly_variable_genes on the spatial data (st_data) in your case, not on the scRNA-seq. For the scRNA-seq, you should see no issue even with 3,000 genes. If you ran sc.pp.highly_variable_genes on st_data and selected the top 1,000, it would just select all the genes (since you only have 492 genes in that dataset). Before running ENVI, first try running:
sc.pp.highly_variable_genes(st_data, n_top_genes = 64, layer = 'log')
Also, make sure the data in .X is not log-transformed, since ENVI expects unlogged counts.
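For example, something like this sets up the 'log' layer used in the call above (a sketch assuming raw counts are in st_data.X):

import scanpy as sc

# Quick sanity check: raw counts have a large, integer-valued maximum,
# while already-logged data rarely goes much above ~20
print("max of st_data.X:", st_data.X.max())

# Keep the raw counts in .X and store a log1p-transformed copy in the 'log' layer
tmp = st_data.copy()
sc.pp.log1p(tmp)
st_data.layers['log'] = tmp.X

sc.pp.highly_variable_genes(st_data, n_top_genes=64, layer='log')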
I see; however, I don't think that was the problem, because I had not run HVG selection on my spatial dataset before. But after I converted both sc_data.X and st_data.X from sparse matrices to NumPy arrays, it worked regardless of whether I ran the highly_variable_genes function or not. With that I can now close the issue. Thank you for your help and for the great update; I hope we will do a lot of nice analyses with ENVI :)
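For anyone running into the same error, the conversion I mean is roughly this (assuming .X is stored as a scipy sparse matrix):

import scipy.sparse as sp

# Densify .X for both AnnData objects before building the ENVI model
for adata in (sc_data, st_data):
    if sp.issparse(adata.X):
        adata.X = adata.X.toarray()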
Hi, thank you for creating such a great tool! It works perfectly well with the test data, but when I try to run it on my own data I get the memory allocation error below. Do you have any suggestions on how to run ENVI on bigger datasets?
Code: ENVI_Model = ENVI.ENVI(spatial_data = st_data, sc_data = sc_data)
Error: numpy.core._exceptions.MemoryError: Unable to allocate 392. GiB for an array with shape (217184, 492, 492) and data type float64
Sbatch job parameters:
#SBATCH --job-name=enVI
#SBATCH --output=logs/test-%j.out
#SBATCH --error=logs/test-%j.err
#SBATCH --time=05:00:00
#SBATCH --gres=gpu:1
#SBATCH --mem=180G
#SBATCH --partition=c18g
#SBATCH --cpus-per-task=30
#SBATCH --signal=2
#SBATCH --nodes=1
#SBATCH --export=ALL