bobye / neuron

Scala library for neural networks

Improve efficiency on W,b of linear mappings #33

Closed: bobye closed this issue 9 years ago

bobye commented 9 years ago

Remove redundant copies of W and b in the linear mappings where possible.
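To make the idea concrete, here is a minimal sketch of what avoiding a copy could look like for an affine map y = W*x + b. It is not code from neuron; the object and method names (AffineInPlace, affineAlloc, affineInto) are hypothetical, and it uses plain arrays rather than Breeze types so it stays self-contained. W is assumed to be stored column-major, as Breeze's DenseMatrix does by default.

```scala
// Hypothetical sketch, not taken from the neuron code base.
object AffineInPlace {

  /** Allocating version: clones b and returns a fresh array on every call. */
  def affineAlloc(w: Array[Double], rows: Int, cols: Int,
                  b: Array[Double], x: Array[Double]): Array[Double] = {
    val y = b.clone() // one new array per call
    accumulate(w, rows, cols, x, y)
    y
  }

  /** In-place version: writes W*x + b into a caller-supplied buffer. */
  def affineInto(w: Array[Double], rows: Int, cols: Int,
                 b: Array[Double], x: Array[Double],
                 out: Array[Double]): Unit = {
    System.arraycopy(b, 0, out, 0, rows) // overwrite out with b, no allocation
    accumulate(w, rows, cols, x, out)
  }

  // out(i) += sum over j of W(i, j) * x(j), with W(i, j) stored at w(i + j * rows).
  private def accumulate(w: Array[Double], rows: Int, cols: Int,
                         x: Array[Double], out: Array[Double]): Unit = {
    var j = 0
    while (j < cols) {
      val xj = x(j)
      var i = 0
      while (i < rows) {
        out(i) += w(i + j * rows) * xj
        i += 1
      }
      j += 1
    }
  }
}
```

With affineInto, the caller can allocate the output buffer once per layer and reuse it on every forward pass, so the per-call copy of b disappears. Whether this pays off in neuron depends on how W and b are actually stored, which this sketch does not model.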

bobye commented 9 years ago

Flat profile of 121.21 secs (10144 total ticks): main

  Interpreted + native   Method                        
 15.7%  1594  +     0    breeze.linalg.operators.DenseVectorOps$$anon$95.apply
  4.3%    21  +   411    com.github.fommil.netlib.NativeSystemBLAS.dgemm_offsets
  2.2%     0  +   223    com.github.fommil.netlib.NativeSystemBLAS.dscal_offsets
  1.7%   171  +     0    breeze.linalg.operators.LowPriorityDenseMatrix$SetDMDMOp.apply
  1.2%   118  +     0    breeze.linalg.operators.DenseVectorOps$$anon$91.apply
  1.1%     0  +   108    com.github.fommil.netlib.NativeSystemBLAS.dcopy_offsets
  0.8%     0  +    84    java.util.zip.ZipFile.open
  0.6%     0  +    58    com.github.fommil.netlib.NativeSystemBLAS.daxpy_offsets
  0.4%    41  +     0    scala.reflect.ManifestFactory$$anon$12.newArray
  0.4%     0  +    41    java.lang.Class.isPrimitive
  0.3%    35  +     0    scala.collection.immutable.VectorBuilder.<init>
  0.3%    26  +     0    java.util.Arrays.copyOfRange
  0.2%    22  +     0    java.lang.ClassLoader.defineClass1
  0.2%    17  +     0    neuron.core.Memorable.<init>
  0.2%    17  +     0    neuron.math.WeightVector.apply
  0.1%     0  +    13    java.lang.Object.getClass
  0.1%    13  +     0    scala.collection.immutable.List.drop
  0.1%    12  +     0    java.lang.Long.toString
  0.1%    12  +     0    breeze.linalg.operators.DenseMatrixOps$$anon$55.apply
  0.1%     7  +     0    breeze.util.ReflectionUtil$.boxedFromPrimitive
  0.1%     1  +     6    java.io.UnixFileSystem.getBooleanAttributes0
  0.1%     6  +     0    neuron.math.WeightVector.get
  0.1%     6  +     0    breeze.linalg.DenseVector$$anon$6.apply
  0.0%     5  +     0    breeze.linalg.operators.DenseMatrixOps$$anon$39.apply
  0.0%     5  +     0    breeze.linalg.DenseMatrix$$anon$25.apply
 32.9%  2355  +   980    Total interpreted (including elided)

     Compiled + native   Method                        
 12.6%     0  +  1279    scala.reflect.ManifestFactory$$anon$12.newArray
 11.5%  1163  +     0    breeze.linalg.operators.DenseMatrixOps$$anon$55.apply
  9.4%   952  +     0    breeze.linalg.operators.LowPriorityDenseMatrix$SetDMDMOp.apply
  4.8%   490  +     0    breeze.linalg.operators.DenseMatrixOps$$anon$26.apply$mcD$sp
  3.9%    51  +   345    breeze.linalg.DenseMatrix$$anon$17.simpleMap
  3.5%   359  +     0    breeze.linalg.operators.DenseMatrixOps$$anon$39.apply
  2.9%   206  +    86    breeze.linalg.NumericOps$class.$times
  0.5%    48  +     0    scala.collection.immutable.List.length
  0.2%    20  +     4    scala.collection.IndexedSeqOptimized$class.foreach
  0.1%     3  +     5    scala.collection.immutable.Range.foreach$mVc$sp
  0.0%     4  +     0    breeze.stats.distributions.ThreadLocalRandomGenerator.nextGaussian
  0.0%     0  +     3    breeze.linalg.DenseMatrix$$anon$10.apply
  0.0%     2  +     0    breeze.linalg.DenseMatrix.$colon$eq
  0.0%     0  +     2    breeze.numerics.package$log$logDoubleImpl$.apply
  0.0%     0  +     2    breeze.linalg.package$.copy
  0.0%     0  +     2    breeze.linalg.DenseMatrix$.zeros$mDc$sp
  0.0%     0  +     1    breeze.linalg.operators.LowPriorityDenseMatrix1$$anon$254$$anonfun$apply$1.apply$mcVI$sp
  0.0%     0  +     1    breeze.linalg.DenseVector.apply
 49.6%  3298  +  1730    Total compiled

         Stub + native   Method                        
  8.6%     0  +   872    com.github.fommil.netlib.NativeSystemBLAS.dgemm_offsets
  6.6%     0  +   668    com.github.fommil.netlib.NativeSystemBLAS.daxpy_offsets
  0.6%     0  +    59    java.lang.Object.getClass
  0.6%     0  +    57    com.github.fommil.netlib.NativeSystemBLAS.dcopy_offsets
  0.5%     0  +    47    sun.misc.Unsafe.compareAndSwapLong
  0.3%     0  +    29    java.lang.Class.isPrimitive
  0.2%     0  +    22    java.lang.Thread.currentThread
  0.1%     0  +     7    java.lang.System.arraycopy
  0.1%     0  +     6    java.lang.Class.getComponentType
  0.1%     0  +     6    sun.misc.Unsafe.getObjectVolatile
  0.0%     0  +     2    java.util.zip.ZipFile.getEntry
  0.0%     0  +     1    sun.misc.Unsafe.compareAndSwapObject
  0.0%     0  +     1    java.lang.Double.doubleToRawLongBits
 17.5%     0  +  1777    Total stub

  Thread-local ticks:
  0.0%     4             Class loader

Global summary of 121.21 seconds:
100.0% 10364             Received ticks
  1.9%   193             Received GC ticks
  1.7%   175             Compilation
  0.0%     1             Deoptimization
  0.3%    27             Other VM operations
  0.0%     4             Class loader

bobye commented 9 years ago

12.6% 0 + 1279 scala.reflect.ManifestFactory$$anon$12.newArray

Memory allocation is slow: newArray alone accounts for 12.6% of all ticks (1279 of 10144) in the compiled section.
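One possible way to cut down on newArray calls is to apply elementwise functions in place instead of mapping into a new array each time. The sketch below only illustrates that pattern on a plain Array[Double]; InPlaceMap, mapAlloc and mapInPlace are hypothetical names, and the sigmoid is just an assumed example activation, not necessarily what the profiled run used.

```scala
// Hypothetical sketch: allocating map versus in-place update.
object InPlaceMap {

  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  /** Allocating version: builds a new result array on every call. */
  def mapAlloc(a: Array[Double]): Array[Double] = a.map(sigmoid)

  /** In-place version: overwrites the input, so no array is allocated. */
  def mapInPlace(a: Array[Double]): Unit = {
    var i = 0
    while (i < a.length) {
      a(i) = sigmoid(a(i))
      i += 1
    }
  }
}
```

The in-place form is only safe when the pre-activation values are not needed afterwards, for example when the backward pass can be computed from the activations alone.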

bobye commented 9 years ago

6.9% 0 + 633 scala.reflect.ManifestFactory$$anon$12.newArray

bobye commented 9 years ago

Flat profile of 96.67 secs (8149 total ticks): main

  Interpreted + native   Method                        
 14.3%  1169  +     0    breeze.linalg.operators.DenseVectorOps$$anon$95.apply
  7.8%    22  +   612    com.github.fommil.netlib.NativeSystemBLAS.dgemm_offsets
  4.3%     0  +   354    com.github.fommil.netlib.NativeSystemBLAS.dscal_offsets
  1.0%     0  +    82    com.github.fommil.netlib.NativeSystemBLAS.daxpy_offsets
  0.9%     0  +    75    java.io.FileInputStream.readBytes
  0.9%     0  +    75    com.github.fommil.netlib.NativeSystemBLAS.dcopy_offsets
  0.5%     0  +    44    java.util.zip.ZipFile.open
  0.4%    35  +     0    java.util.Arrays.copyOfRange
  0.4%    30  +     0    scala.collection.immutable.VectorBuilder.<init>
  0.4%    29  +     0    scala.reflect.ManifestFactory$$anon$12.newArray
  0.3%    27  +     0    scala.collection.immutable.List.drop
  0.2%     0  +    20    java.lang.Object.getClass
  0.2%    18  +     0    java.lang.ClassLoader.defineClass1
  0.2%     0  +    17    java.lang.Class.isPrimitive
  0.2%    16  +     0    neuron.math.WeightVector.get
  0.2%    16  +     0    neuron.core.Memorable.<init>
  0.1%     0  +    11    sun.misc.Unsafe.compareAndSwapLong
  0.1%    10  +     0    breeze.linalg.DenseVector$$anon$6.apply
  0.1%     0  +     9    java.io.UnixFileSystem.getBooleanAttributes0
  0.1%     8  +     0    neuron.core.InstanceOfSingleLayerNeuralNetwork.allocate
  0.1%     8  +     0    scala.collection.mutable.HashTable$class.$init$
  0.1%     8  +     0    breeze.linalg.operators.DenseMatrixOps$$anon$55.apply
  0.1%     0  +     6    java.lang.System.currentTimeMillis
  0.1%     5  +     0    neuron.math.WeightVector.apply
  0.1%     5  +     0    breeze.linalg.operators.DenseMatrixOps$$anon$39.apply
 35.8%  1581  +  1335    Total interpreted (including elided)

     Compiled + native   Method                        
 10.2%   829  +     0    breeze.linalg.operators.DenseMatrixOps$$anon$55.apply
  8.8%     0  +   720    scala.reflect.ManifestFactory$$anon$12.newArray
  6.2%   206  +   297    breeze.linalg.DenseMatrix$$anon$17.simpleMap
  5.4%   444  +     0    breeze.linalg.operators.DenseMatrixOps$$anon$39.apply
  3.5%   285  +     0    breeze.linalg.operators.DenseMatrixOps$$anon$26.apply$mcD$sp
  2.8%     0  +   225    breeze.linalg.operators.DenseMatrixMultiplyStuff$implOpMulMatrix_DMD_DMD_eq_DMD$.apply
  0.4%    30  +     0    scala.collection.immutable.List.length
  0.3%    19  +     5    scala.collection.IndexedSeqOptimized$class.foreach
  0.1%     1  +     6    scala.collection.immutable.Range.foreach$mVc$sp
  0.0%     4  +     0    breeze.linalg.operators.DenseMatrixOps$$anon$47.apply
  0.0%     3  +     0    breeze.stats.distributions.ThreadLocalRandomGenerator.nextGaussian
  0.0%     1  +     0    breeze.stats.distributions.RandBasis$$anon$6.get$mcD$sp
  0.0%     0  +     1    scala.runtime.BoxesRunTime.boxToDouble
  0.0%     1  +     0    breeze.linalg.operators.LowPriorityDenseMatrix$SetDMDMOp.apply
  0.0%     0  +     1    breeze.linalg.NumericOps$class.$minus
  0.0%     0  +     1    breeze.linalg.package$.copy
  0.0%     0  +     1    breeze.linalg.DenseMatrix$$anon$10.apply
  0.0%     1  +     0    scala.collection.LinearSeqOptimized$class.length
  0.0%     0  +     1    breeze.linalg.argmax$$anon$2$SumVisitor$2.visitArray
  0.0%     1  +     0    breeze.linalg.DenseMatrix.toArray
  0.0%     1  +     0    breeze.linalg.operators.DenseMatrixOps$$anon$43.apply
 37.8%  1826  +  1258    Total compiled

         Stub + native   Method                        
 12.5%     0  +  1015    com.github.fommil.netlib.NativeSystemBLAS.daxpy_offsets
 12.1%     0  +   989    com.github.fommil.netlib.NativeSystemBLAS.dgemm_offsets
  0.5%     0  +    42    com.github.fommil.netlib.NativeSystemBLAS.dcopy_offsets
  0.3%     0  +    28    java.lang.Class.isPrimitive
  0.3%     0  +    28    java.lang.Thread.currentThread
  0.1%     0  +    11    java.lang.Object.getClass
  0.1%     0  +     9    sun.misc.Unsafe.getObjectVolatile
  0.1%     0  +     7    sun.misc.Unsafe.compareAndSwapLong
  0.1%     0  +     6    java.lang.System.arraycopy
  0.0%     0  +     3    sun.misc.Unsafe.compareAndSwapObject
  0.0%     0  +     2    java.lang.Double.doubleToRawLongBits
  0.0%     0  +     2    java.util.zip.ZipFile.getEntry
 26.3%     0  +  2142    Total stub

  Thread-local ticks:
  0.1%     7             Class loader

Global summary of 96.67 seconds:
100.0%  8292             Received ticks
  1.4%   116             Received GC ticks
  3.6%   300             Compilation
  0.3%    27             Other VM operations
  0.1%     7             Class loader

bobye commented 9 years ago

Flat profile of 104.71 secs (8982 total ticks): main

  Interpreted + native   Method                        
 14.9%  1334  +     0    breeze.linalg.operators.DenseVectorOps$$anon$95.apply
  7.9%    31  +   683    com.github.fommil.netlib.NativeSystemBLAS.dgemm_offsets
  2.5%     0  +   226    com.github.fommil.netlib.NativeSystemBLAS.dscal_offsets
  1.4%     0  +   123    java.io.FileInputStream.readBytes
  1.1%     0  +    95    com.github.fommil.netlib.NativeSystemBLAS.daxpy_offsets
  0.9%     0  +    77    com.github.fommil.netlib.NativeSystemBLAS.dcopy_offsets
  0.5%    43  +     0    scala.collection.immutable.List.drop
  0.5%    43  +     0    scala.reflect.ManifestFactory$$anon$12.newArray
  0.5%     0  +    42    java.util.zip.ZipFile.open
  0.5%    42  +     0    java.util.Arrays.copyOfRange
  0.4%    33  +     0    scala.collection.immutable.VectorBuilder.<init>
  0.2%    17  +     2    java.lang.ClassLoader.defineClass1
  0.2%    18  +     0    breeze.linalg.operators.LowPriorityDenseMatrix$SetDMDMOp.apply
  0.2%     0  +    17    java.lang.System.currentTimeMillis
  0.2%     0  +    17    java.lang.Object.getClass
  0.1%     0  +    11    java.lang.Class.isPrimitive
  0.1%     9  +     0    java.lang.AbstractStringBuilder.<init>
  0.1%     7  +     0    neuron.math.WeightVector.apply
  0.1%     0  +     7    java.io.UnixFileSystem.getBooleanAttributes0
  0.1%     6  +     0    scala.collection.immutable.Vector.appendBack
  0.1%     5  +     0    breeze.linalg.DenseVector$$anon$6.apply
  0.1%     5  +     0    breeze.linalg.DenseMatrix.$plus
  0.1%     5  +     0    breeze.linalg.DenseMatrix$$anon$13.apply
  0.1%     0  +     5    sun.misc.Unsafe.compareAndSwapLong
  0.1%     5  +     0    breeze.linalg.operators.DenseMatrixOps$$anon$55.apply
 35.7%  1876  +  1328    Total interpreted (including elided)

     Compiled + native   Method                        
 10.9%   979  +     0    breeze.linalg.operators.DenseMatrixOps$$anon$55.apply
  6.1%     0  +   550    scala.reflect.ManifestFactory$$anon$12.newArray
  5.4%   181  +   305    breeze.linalg.DenseMatrix$$anon$17.simpleMap
  4.5%     0  +   406    breeze.linalg.operators.DenseMatrixMultiplyStuff$implOpMulMatrix_DMD_DMD_eq_DMD$.apply
  3.2%   285  +     0    breeze.linalg.operators.DenseMatrixOps$$anon$39.apply
  1.6%   148  +     0    breeze.linalg.operators.DenseMatrixOps$$anon$26.apply$mcD$sp
  1.4%   128  +     0    breeze.linalg.operators.LowPriorityDenseMatrix$SetDMDMOp.apply
  0.3%    20  +     4    scala.collection.IndexedSeqOptimized$class.foreach
  0.2%    21  +     0    scala.collection.immutable.List.length
  0.0%     0  +     4    breeze.linalg.DenseMatrix$$anon$10.apply
  0.0%     0  +     4    scala.collection.immutable.Range.foreach$mVc$sp
  0.0%     3  +     0    scala.collection.LinearSeqOptimized$class.length
  0.0%     3  +     0    breeze.stats.distributions.RandBasis$$anon$6.get$mcD$sp
  0.0%     0  +     2    breeze.numerics.package$exp$expDoubleImpl$.apply
  0.0%     1  +     0    scala.Array$.fill
  0.0%     0  +     1    breeze.linalg.operators.DenseVectorOps$$anon$123.apply
  0.0%     0  +     1    scala.runtime.BoxesRunTime.boxToDouble
  0.0%     0  +     1    breeze.linalg.DenseMatrix.$colon$times
  0.0%     0  +     1    breeze.linalg.package$.copy
  0.0%     0  +     1    breeze.linalg.sum$$anon$2.apply
  0.0%     0  +     1    breeze.linalg.DenseMatrix$$anon$17.map
 34.0%  1769  +  1281    Total compiled

         Stub + native   Method                        
 17.1%     0  +  1536    com.github.fommil.netlib.NativeSystemBLAS.dgemm_offsets
 10.5%     0  +   942    com.github.fommil.netlib.NativeSystemBLAS.daxpy_offsets
  1.2%     0  +   110    java.lang.Object.getClass
  0.4%     0  +    32    sun.misc.Unsafe.compareAndSwapLong
  0.3%     0  +    31    com.github.fommil.netlib.NativeSystemBLAS.dcopy_offsets
  0.3%     0  +    26    java.lang.Class.isPrimitive
  0.2%     0  +    21    java.lang.Thread.currentThread
  0.1%     0  +    12    java.lang.System.arraycopy
  0.1%     0  +     8    sun.misc.Unsafe.getObjectVolatile
  0.0%     0  +     2    java.lang.Class.getComponentType
  0.0%     0  +     2    java.lang.Double.doubleToRawLongBits
  0.0%     0  +     1    sun.misc.Unsafe.compareAndSwapObject
  0.0%     0  +     1    java.util.zip.ZipFile.getEntry
 30.3%     0  +  2724    Total stub

  Thread-local ticks:
  0.0%     4             Class loader

Global summary of 104.71 seconds:
100.0%  9096             Received ticks
  1.0%    95             Received GC ticks
  1.5%   132             Compilation
  0.2%    19             Other VM operations
  0.0%     4             Class loader

bobye commented 9 years ago

Flat profile of 99.07 secs (8681 total ticks): main

  Interpreted + native   Method                        
 11.4%   987  +     0    breeze.linalg.operators.DenseVectorOps$$anon$95.apply
  6.7%    22  +   561    com.github.fommil.netlib.NativeSystemBLAS.dgemm_offsets
  5.4%     0  +   469    com.github.fommil.netlib.NativeSystemBLAS.dscal_offsets
  1.3%     0  +   113    java.io.FileInputStream.readBytes
  1.0%     0  +    84    com.github.fommil.netlib.NativeSystemBLAS.daxpy_offsets
  0.7%     0  +    57    com.github.fommil.netlib.NativeSystemBLAS.dcopy_offsets
  0.6%     0  +    54    java.util.zip.ZipFile.open
  0.5%    43  +     0    scala.reflect.ManifestFactory$$anon$12.newArray
  0.5%     0  +    40    java.io.FileInputStream.read0
  0.4%    33  +     0    scala.collection.immutable.List.drop
  0.3%    26  +     0    java.util.Arrays.copyOfRange
  0.3%     0  +    25    java.lang.Object.getClass
  0.3%     0  +    23    sun.misc.Unsafe.compareAndSwapLong
  0.3%    22  +     0    scala.collection.immutable.VectorBuilder.<init>
  0.2%     0  +    18    java.lang.Class.isPrimitive
  0.2%    17  +     0    neuron.math.WeightVector.get
  0.2%    14  +     3    java.lang.ClassLoader.defineClass1
  0.2%     0  +    15    java.lang.System.currentTimeMillis
  0.2%    15  +     0    java.lang.Long.toString
  0.1%    10  +     0    breeze.linalg.operators.DenseMatrixOps$$anon$55.apply
  0.1%     9  +     0    neuron.math.WeightVector.apply
  0.1%     8  +     0    neuron.core.InstanceOfSingleLayerNeuralNetwork.allocate
  0.1%     8  +     0    neuron.core.Memorable.<init>
  0.1%     7  +     0    breeze.linalg.operators.LowPriorityDenseMatrix1$$anon$254.apply
  0.1%     7  +     0    breeze.linalg.DenseVector$$anon$6.apply
 34.6%  1504  +  1499    Total interpreted (including elided)

     Compiled + native   Method                        
 16.2%  1407  +     1    breeze.linalg.operators.DenseMatrixOps$$anon$55.apply
  5.7%     0  +   498    scala.reflect.ManifestFactory$$anon$12.newArray
  5.4%   163  +   310    breeze.linalg.DenseMatrix$$anon$17.simpleMap
  4.8%   415  +     0    breeze.linalg.operators.DenseMatrixOps$$anon$26.apply$mcD$sp
  4.2%   364  +     0    breeze.linalg.operators.DenseMatrixOps$$anon$39.apply
  2.4%     0  +   209    breeze.linalg.DenseMatrix.$times
  0.4%    34  +     0    scala.collection.immutable.List.length
  0.3%    20  +     4    scala.collection.IndexedSeqOptimized$class.foreach
  0.1%     7  +     6    scala.collection.immutable.Range.foreach$mVc$sp
  0.1%     5  +     0    scala.collection.LinearSeqOptimized$class.length
  0.0%     4  +     0    breeze.stats.distributions.ThreadLocalRandomGenerator.nextGaussian
  0.0%     3  +     0    breeze.linalg.DenseMatrix.$colon$eq
  0.0%     0  +     2    breeze.linalg.DenseMatrix$$anon$10.apply
  0.0%     0  +     2    java.lang.Object.<init>
  0.0%     2  +     0    breeze.linalg.operators.LowPriorityDenseMatrix$SetDMDMOp.apply
  0.0%     1  +     0    neuron.core.InstanceOfMergedNeuralNetwork.setWeights
 39.8%  2425  +  1032    Total compiled

         Stub + native   Method                        
 13.0%     0  +  1130    com.github.fommil.netlib.NativeSystemBLAS.dgemm_offsets
 10.7%     0  +   933    com.github.fommil.netlib.NativeSystemBLAS.daxpy_offsets
  0.6%     0  +    56    com.github.fommil.netlib.NativeSystemBLAS.dcopy_offsets
  0.4%     0  +    39    java.lang.Thread.currentThread
  0.4%     0  +    31    java.lang.Class.isPrimitive
  0.1%     0  +    10    java.lang.Object.getClass
  0.1%     0  +     8    java.lang.System.arraycopy
  0.1%     0  +     7    sun.misc.Unsafe.getObjectVolatile
  0.0%     0  +     2    sun.misc.Unsafe.compareAndSwapObject
  0.0%     0  +     1    java.lang.Double.doubleToRawLongBits
  0.0%     0  +     1    java.util.zip.ZipFile.getEntry
 25.6%     0  +  2218    Total stub

  Thread-local ticks:
  0.0%     3             Class loader

Global summary of 99.08 seconds:
100.0%  8855             Received ticks
  1.6%   143             Received GC ticks
  3.6%   315             Compilation
  0.4%    31             Other VM operations
  0.0%     3             Class loader