GPU memory isn't enough for the second launch

ValeriiBaidin commented 4 years ago

Hi.

I have a gpuLDA model.
train!(...) 3....
I want to run train!(...) a second time, but I get either "Julia has exited" or a GPU memory error.

I tried it several times. How can I free GPU memory before calling train!(...), or why is the model being loaded into memory a second time?

Thank you in advance.

P.S. I have a 4 GB GPU; after the first call to train!, 3.2 GB of GPU memory is in use.

ericproffitt commented 4 years ago

Hi Valerii,

Could you post the code you're running?

Unless more than one GPU model is being instantiated, there shouldn't be any double loading of GPU memory.

ValeriiBaidin commented 4 years ago

model = gpuLDA(mc, 10)
train!(model, iter=150, tol=0) # after this call, 3.2 GB of GPU memory is in use
showtopics(model, cols=10, 10)
train!(model, iter=1, tol=0) # this is where the problem occurs

ericproffitt commented 4 years ago

I'm able to repeatedly train the full NSF corpus without ever hitting a GPU memory error on my machine. This may be a machine- or GPU-specific problem; it's difficult to tell.

If you want to try to debug this yourself, the problem is likely occurring in the update_buffer! function starting on line 373 of modelutils.jl.

If you're sure that this is the problem, then is there a way for you to manually clear your GPU memory between runs? This will not affect the model, as every time train! finishes, the data is read back into your CPU RAM, and when you run train! again, the GPU buffers are reloaded from CPU memory.
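
One thing that may be worth trying for clearing memory between runs is to prompt Julia's garbage collector after each call to train!. The sketch below is only a guess at a workaround: run_then_collect is a hypothetical helper, not part of TopicModelsVB.jl, and it assumes the GPU buffers are released by finalizers once nothing references them, which this thread does not confirm. Buffers that the model still holds references to will not be freed this way.

```julia
# Hypothetical helper, not part of TopicModelsVB.jl.
# Runs f(), then requests a garbage collection so that finalizers on
# objects that are no longer referenced get a chance to run.
function run_then_collect(f)
    result = f()
    GC.gc()  # Base Julia; no extra packages needed
    return result
end
```

With the model from the earlier comment, this could be used as run_then_collect(() -> train!(model, iter=150, tol=0)), followed by the same call with iter=1 for the second run.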

Postscript:

Actually, one solution might be to remove line 357 from gpuLDA.jl,

all([isempty(doc) for doc in model.corp]) ? (iter = 0) : update_buffer!(model) # line 357

and then run something like the following,

model = gpuLDA(corp, K)

TopicModelsVB.update_buffer!(model)
train!(model) # first time

train!(model) # second time

This way you only run update_buffer! once.

ericproffitt / TopicModelsVB.jl

GPU memory isn't enough for the second launch #34