ejlee95 / Graph-based-TSR


Trouble reproducing experiments #4

Open SomathSatou opened 8 months ago

SomathSatou commented 8 months ago

Hello, I've had a few problems reproducing your experiments with your code.

During training I get this error message: absl.flags._exceptions.IllegalFlagValueError: flag --num_epochs=EPOCHS: invalid literal for int() with base 10: 'EPOCHS'.

And this one when running the test: tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,128,1075,761] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

Do you have any advice for me? Thank you for your understanding.

ejlee95 commented 8 months ago

First, when running the training code, you must pass an integer value to the --num_epochs parameter; its default is 1000. I recommend trying the command with "--num_epochs 1000".
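The traceback comes from absl flags, which converts the value of an integer flag with int(), so a literal placeholder such as EPOCHS cannot be parsed. The sketch below reproduces the same failure mode with the standard-library argparse module, not the repository's absl code; the parser is a hypothetical stand-in whose flag name and default only mirror the reply above.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the training script's flags; the real
    # script defines --num_epochs with absl, not argparse.
    parser = argparse.ArgumentParser(prog="train")
    parser.add_argument("--num_epochs", type=int, default=1000,
                        help="number of training epochs (must be an integer)")
    return parser


def parse_epochs(argv: List[str]) -> Optional[int]:
    """Return the parsed epoch count, or None if the value is not an integer."""
    try:
        return build_parser().parse_args(argv).num_epochs
    except SystemExit:
        # argparse prints "invalid int value" to stderr and exits instead of raising
        return None


print(parse_epochs(["--num_epochs", "1000"]))  # 1000
print(parse_epochs(["--num_epochs=EPOCHS"]))   # None (placeholder is not an int)
print(parse_epochs([]))                        # 1000 (default)
```

In both libraries the fix is the same: replace the placeholder with an actual number on the command line.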

Second, the error is an out-of-memory (OOM) error, which means the test needs more GPU memory than is available. Please check how much memory your GPU has. I can run the test code on a single NVIDIA GeForce GTX 1060 6GB.
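For a sense of scale, the size of the single tensor named in the OOM message can be worked out from its shape. A float32 value takes 4 bytes, so this is just the product of the dimensions times 4; the helper name below is illustrative.

```python
import math


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor; float32 uses 4 bytes per element."""
    return math.prod(shape) * bytes_per_element


shape = (1, 128, 1075, 761)  # shape reported in the OOM message
size = tensor_bytes(shape)
print(size)                    # 418854400
print(round(size / 2**20, 1))  # 399.5 (MiB)
```

That is roughly 400 MiB for one activation. A network typically keeps many feature maps of this order in memory at once, which is likely how a 6 GB card gets exhausted on a large input image.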

SomathSatou commented 8 months ago

Could you provide more information about your system, such as operating system version, RAM or processor?

Does your project still work on your machine if you use a new environment? We've had problems with your graphtsr.yml and have made a few changes, so maybe that's where our problem lies.

Our GPU is a Quadro RTX 3000 6GB.
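To compare setups as requested above (OS version, RAM, processor), both sides could paste the output of a small report like the one below. It is a minimal sketch using only the standard library; the RAM lookup relies on os.sysconf, which is available on Linux but not on Windows, and GPU details (for example from nvidia-smi) are not covered.

```python
import os
import platform
import sys


def system_report() -> dict:
    """Collect basic host details with the standard library only."""
    report = {
        "os": f"{platform.system()} {platform.release()}",
        "python": sys.version.split()[0],
        "processor": platform.processor() or platform.machine(),
        "cpu_count": os.cpu_count(),
    }
    try:
        # Physical RAM via sysconf; raises on platforms without these names.
        page = os.sysconf("SC_PAGE_SIZE")
        pages = os.sysconf("SC_PHYS_PAGES")
        report["ram_gib"] = round(page * pages / 2**30, 1)
    except (AttributeError, ValueError, OSError):
        report["ram_gib"] = None
    return report


for key, value in system_report().items():
    print(f"{key}: {value}")
```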