Noble-Lab / Polarbear

Polarbear translates between different single-cell data modalities
Apache License 2.0

OOM when initializing model #5

Open viktoriaschuster opened 3 months ago

viktoriaschuster commented 3 months ago

Dear Dr. Zhang,

I am trying to run your model for a benchmark on paired multi-omics data. With both my own data and your example data, I run into out-of-memory issues when initializing the model. I have two GPUs with 24 GB of memory each available.

The error occurs during the initialization of TranslateAE (train_model.py, line 338: self.sess.run(tf.global_variables_initializer())).

This is the error message: ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[93283,15172] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [[node translator_yx_px_r_genebatch/kernel/Adam_1/Assign (defined at bin/train_model_edited.py:348) ]].

The model seems to create a peak-by-gene matrix for the translator. Is this the intended behavior, or might I have missed something in your data preprocessing before running the model?
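For reference, here is a rough back-of-the-envelope estimate (a hypothetical calculation, not Polarbear code; it assumes float32 weights and Adam's two moment slots, as the traceback suggests) of why a single dense layer of this shape would exhaust a 24 GB card:

```python
# Hypothetical memory estimate for the kernel shape in the OOM error.
n_peaks, n_genes = 93283, 15172        # shape[93283,15172] from the traceback
bytes_per_float32 = 4

kernel_bytes = n_peaks * n_genes * bytes_per_float32   # the weight matrix itself
adam_slot_bytes = 2 * kernel_bytes                     # Adam keeps m and v for every weight

print(f"kernel alone: {kernel_bytes / 2**30:.1f} GiB")                          # ~5.3 GiB
print(f"with Adam slots: {(kernel_bytes + adam_slot_bytes) / 2**30:.1f} GiB")   # ~15.8 GiB
```

Once gradients and activations are added on top of that, 24 GB is not enough.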

Kind regards, Viktoria

RanZhang08 commented 3 months ago

Dear Viktoria,

The code runs on our local machines with 8 GB of GPU memory and a batch size of 16. Please double-check whether caches or other running processes are exhausting the memory.

The translator connects the RNA and ATAC embedding spaces (i.e., a [embed_dim_x, embed_dim_y] weight matrix) rather than the original feature spaces. Could you please check whether the embed_dim_x and embed_dim_y arguments are passed correctly?
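For a sense of scale, a small illustrative comparison (example numbers, not Polarbear's code) of the two kernel sizes:

```python
# Illustrative comparison of translator kernel sizes: the intended
# embedding-space connection vs. the shape reported in the traceback.
embed_dim_x, embed_dim_y = 20, 20               # example embedding dimensions
intended_weights = embed_dim_x * embed_dim_y    # [embed_dim_x, embed_dim_y] kernel
reported_weights = 93283 * 15172                # [peaks, genes] kernel from the error

print(intended_weights)   # 400 weights -> a few kilobytes
print(reported_weights)   # ~1.4 billion weights -> several GiB before Adam's slot variables
```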

Please let us know if you have any further questions!

Best, Ran

viktoriaschuster commented 3 months ago

Dear Ran,

When I run the model, no other processes are using GPU memory.

I used the default parameters, except for the embedding dimensions, which I set to match the other benchmarked models: embed_dim_x = 20 and embed_dim_y = 20.

Best, Viktoria