CDECatapult / ml-performance-prediction

Code that accompanies the paper "Predicting the Computational Cost of Deep Learning Models"
Apache License 2.0

Generate_train_data problem #1

Open s9013xx opened 4 years ago

s9013xx commented 4 years ago

Hi danjust

I am a student at NTU in Taiwan. I have recently been researching performance prediction for neural network models and came across your paper. I think it is a good approach to predicting inference and training time.

I assume the first step is to collect data, so I ran the following command:

python ml-performance-prediction/prediction_model/Generate_train_data/benchmark.py --testConv

I always get 'Error: Out of GPU memory', because 'sess.run(tf.global_variables_initializer())' fails in the function 'run_benchmark'. I suspect my TensorFlow version does not match the one you used; I am on tensorflow==1.12.0. Could you please tell me which TensorFlow version you used? Thank you!

Castdeath97 commented 4 years ago

Hi @s9013xx,

Sorry for the late response.

I worked on the same project for my research as well. Data generation needs to be done using TensorFlow's 1.10.1-gpu Docker image to avoid issues. Also, the parameters used to generate data in this project are extreme (the runs were originally intended to take weeks), so I'd recommend reducing the values of the following parameters (in the arguments or in the benchmark scripts themselves) to avoid memory issues:
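To get a feel for why smaller values help, here is a rough back-of-the-envelope sketch of how much GPU memory a single convolution benchmark might need. It is not taken from the repository: the function names and parameters (batch_size, height, width, kernel_size, channels_in, channels_out) are hypothetical and may not match the script's actual arguments. It assumes float32 tensors, stride 1 and 'same' padding. Treat the result as a lower bound, since training also needs room for gradients, optimizer state and library workspace.

```python
def conv_activation_bytes(batch_size, height, width, channels, bytes_per_value=4):
    """Approximate size in bytes of one NHWC activation tensor (float32 by default)."""
    return batch_size * height * width * channels * bytes_per_value


def conv_weight_bytes(kernel_size, channels_in, channels_out, bytes_per_value=4):
    """Approximate size in bytes of a square conv kernel (bias ignored)."""
    return kernel_size * kernel_size * channels_in * channels_out * bytes_per_value


def estimate_gib(batch_size, height, width, kernel_size, channels_in, channels_out):
    """Input + output + weights in GiB, assuming stride 1 and 'same' padding.

    Hypothetical helper for illustration only; it is not part of the repository.
    """
    total = (
        conv_activation_bytes(batch_size, height, width, channels_in)
        + conv_activation_bytes(batch_size, height, width, channels_out)
        + conv_weight_bytes(kernel_size, channels_in, channels_out)
    )
    return total / 2**30


if __name__ == "__main__":
    # Made-up example values, chosen only to show how the estimate scales.
    large = estimate_gib(64, 512, 512, 3, 64, 128)
    small = estimate_gib(32, 512, 512, 3, 64, 128)
    print("batch 64: %.2f GiB, batch 32: %.2f GiB" % (large, small))
```

With these made-up values the estimate is roughly 12 GiB at batch size 64 and roughly 6 GiB at batch size 32, so halving the batch size (or the spatial size, or the channel counts) cuts the activation memory proportionally.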

I'd also recommend you take a look at earlier commits and the mlpredict repository (https://github.com/CDECatapult/mlpredict) if you want to see the model itself.

If you need any other help, feel free to contact me, but I can't guarantee a response since I'm busy with other commitments.