deeplearning4j / deeplearning4j

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch; a modular and tiny C++ library for running math code; and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learn...
http://deeplearning4j.konduit.ai
Apache License 2.0

Feature Extraction using Pre-Trained VGG16 model taking time #5746

Open DhvananShah-Reflektion opened 6 years ago

DhvananShah-Reflektion commented 6 years ago

I am trying to run image feature extraction with a pre-trained VGG16 model from the model zoo. After loading the model and passing my images through it, the feed-forward step, Map<String, INDArray> stringINDArrayMap = vgg16.feedForward(image, false);, takes roughly 1000 ms per image. Is this kind of time expected for a single feed-forward pass? Is there anything I am doing wrong, or is there any configuration I can change to give it more resources and reduce the execution time?

Code:

// Imports needed for the snippet to compile
import java.io.File;
import java.util.Map;
import org.datavec.image.loader.NativeImageLoader;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.zoo.PretrainedType;
import org.deeplearning4j.zoo.ZooModel;
import org.deeplearning4j.zoo.model.VGG16;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.api.preprocessor.DataNormalization;
import org.nd4j.linalg.dataset.api.preprocessor.VGG16ImagePreProcessor;

// Load the pre-trained model from the model zoo
ZooModel zooModel = VGG16.builder().build();
ComputationGraph vgg16 = (ComputationGraph) zooModel.initPretrained(PretrainedType.VGGFACE);

// Load and preprocess a single image (224x224, 3 channels, VGG16 mean subtraction)
String path = "<path>/pic.png";
File file = new File(path);
NativeImageLoader loader = new NativeImageLoader(224, 224, 3);
INDArray image = loader.asMatrix(file);
DataNormalization scaler = new VGG16ImagePreProcessor();
scaler.transform(image);

// Feed forward and time it
long startTime = System.currentTimeMillis();
Map<String, INDArray> stringINDArrayMap = vgg16.feedForward(image, false);
long stopTime = System.currentTimeMillis();

System.out.println("time elapsed : " + (stopTime - startTime) + " ms");

// Extract the fc7 activations
INDArray resultfc7 = stringINDArrayMap.get("fc7");
System.out.println("fc7 " + resultfc7.shapeInfoToString());
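One caveat with timing a single call like this: the first `feedForward` also pays for JIT warm-up and one-time native/workspace initialization, so a warmed-up, averaged measurement is more representative. A minimal, generic sketch of that methodology is below; the dummy numeric loop is just a stand-in for `vgg16.feedForward(image, false)` and nothing here is DL4J-specific:

```java
public class FeedForwardBenchmark {

    /**
     * Runs the workload warmupRuns times untimed (to absorb JIT compilation and
     * one-time initialization), then returns the average wall-clock time in
     * milliseconds over timedRuns measured runs.
     */
    public static double averageMillis(Runnable workload, int warmupRuns, int timedRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            workload.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            workload.run();
        }
        return (System.nanoTime() - start) / 1_000_000.0 / timedRuns;
    }

    public static void main(String[] args) {
        // Dummy workload standing in for vgg16.feedForward(image, false);
        // the sink array prevents the loop from being optimized away entirely.
        double[] sink = new double[1];
        Runnable workload = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                sink[0] += Math.sqrt(i);
            }
        };
        System.out.println("avg ms per run: " + averageMillis(workload, 3, 10));
    }
}
```

If the averaged number after warm-up is still around 1000 ms, the cost really is in the network's math and not in initialization overhead.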

Env:
OS: Mac OS X
CPU Model: 2.2 GHz Intel Core i7
BLAS vendor: [OPENBLAS]
o.n.l.a.o.e.DefaultOpExecutioner - Cores: [8]; Memory: [3.6GB]


raver119 commented 6 years ago

Could you please add some details: which CPU model, and which BLAS implementation?

DhvananShah-Reflektion commented 6 years ago

CPU Model: 2.2 GHz Intel Core i7
Is the BLAS impl the same as the BLAS vendor? BLAS Vendor: [OPENBLAS]. If it is not, how can I find out the BLAS impl?

raver119 commented 6 years ago

Full model name, please.


DhvananShah-Reflektion commented 6 years ago

CPU Model : Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz

saudet commented 6 years ago

Sounds about right to me. You could use DL4J 1.0.0-SNAPSHOT and MKL to get a bit better performance though.
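Picking up a `1.0.0-SNAPSHOT` build means adding the snapshot repository to the Maven build. A minimal sketch is below; the repository URL and artifact coordinates are the ones the DL4J documentation of that era pointed to, so treat them as assumptions to verify against the current docs:

```xml
<!-- Assumed snapshot repository; verify against the current DL4J docs -->
<repositories>
  <repository>
    <id>sonatype-snapshots</id>
    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>

<dependencies>
  <!-- CPU backend; MKL can serve as the BLAS implementation underneath -->
  <dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native-platform</artifactId>
    <version>1.0.0-SNAPSHOT</version>
  </dependency>
</dependencies>
```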

AlexDBlack commented 6 years ago

We did some quick timing runs on my PC earlier (Windows, 8-core 5960X @ 4 GHz): around 700 ms with MKL. So we do want to look into this a bit more; there might be some scope for improvement.