Open yang182 opened 5 years ago
I have a similar problem using C++ with CUDA 10.2. In my case the forward step works fine and the loss value looks reasonable, but the backward step throws an exception:

 ** On entry to SGEMM parameter number 8 had an illegal value
CUBLAS failure in cublasSgemm(dev.cublas_handle, CUBLAS_OP_N, CUBLAS_OP_T, y.d.rows(), y.d.cols(), l.d.cols() * l.d.batch_elems(), dev.kSCALAR_ONE, l.v, l.d.rows(), r.v, r.d.rows(), dev.kSCALAR_ONE, y.v, y.d.rows())
7

This only occurs with dynet-autobatch turned on. Without autobatch it works fine.
Does anyone have any ideas?
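For anyone trying to decode the "parameter number 8" message: a minimal sketch of the argument checks that the reference BLAS SGEMM performs, assuming cuBLAS numbers the parameters the same way (transa, transb, m, n, k, alpha, A, lda, B, ldb, beta, C, ldc, with the handle not counted). Under that assumption, parameter 8 is lda, which corresponds to l.d.rows() in the call above. The helper name `first_illegal_gemm_param` and the example dimensions are hypothetical, not part of DyNet or cuBLAS.

```python
def first_illegal_gemm_param(transa, transb, m, n, k, lda, ldb, ldc):
    """Return the 1-based index of the first argument that the reference
    BLAS SGEMM argument check would reject, or 0 if every check passes.

    Numbering follows SGEMM(transa, transb, m, n, k, alpha, A, lda,
    B, ldb, beta, C, ldc). Whether cuBLAS uses exactly the same numbering
    is an assumption here.
    """
    if transa not in ("N", "T", "C"):
        return 1
    if transb not in ("N", "T", "C"):
        return 2
    if m < 0:
        return 3
    if n < 0:
        return 4
    if k < 0:
        return 5
    # Column-major storage: the leading dimension must cover the number
    # of rows of the matrix as it is stored, before any transpose.
    rows_a = m if transa == "N" else k
    rows_b = k if transb == "N" else n
    if lda < max(1, rows_a):
        return 8
    if ldb < max(1, rows_b):
        return 10
    if ldc < max(1, m):
        return 13
    return 0


# Hypothetical dimensions mapped onto the backward call:
# m = y.d.rows(), n = y.d.cols(), k = l.d.cols() * l.d.batch_elems(),
# lda = l.d.rows(), ldb = r.d.rows(), ldc = y.d.rows().
consistent = first_illegal_gemm_param("N", "T", 10, 30, 2, 10, 30, 10)
too_small_lda = first_illegal_gemm_param("N", "T", 10, 30, 2, 5, 30, 10)
print(consistent, too_small_lda)
```

If that reading is right, the message would mean l.d.rows() was smaller than y.d.rows() (or zero) when the call was issued, which would point to the dimensions of a batched node rather than to the GEMM itself.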
import numpy as np
import dynet_config
dynet_config.set_gpu()
import dynet as dy
from optparse import OptionParser

parser = OptionParser()
parser.add_option("--dynet-gpu", action="store_true")

class OurNetwork(object):
    def __init__(self, pc):
        self.pW = pc.add_parameters((10,30))
        self.pB = pc.add_parameters(10)
        self.lookup = pc.add_lookup_parameters((500,10))

dy.init()
dy.renew_cg()
m = dy.Model()
network = OurNetwork(m)
trainer = dy.SimpleSGDTrainer(m)
for epoch in range(50):
    for inp,lbl in (([1,2,3],1), ([3,2,4],2)):
        loss = network.create_network_return_loss(inp, lbl)
        loss.value()
        loss.backward()
        trainer.update()
        print(loss.value()) # need to run loss.value() for the forward prop
print('Predicted smallest element among {} is {}:'.format([1,2,3], network.create_network_return_best([1,2,3])))
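The posted script does not show the bodies of create_network_return_loss and create_network_return_best. As a rough guide to the shapes involved, here is a NumPy-only sketch, assuming the network embeds each of the three inputs, concatenates the embeddings, applies an affine layer and takes a softmax. That architecture is an assumption inferred from the parameter shapes (10,30), 10 and (500,10). It is not the poster's code and it does not use DyNet.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins with the same shapes as the parameters in the post.
W = rng.normal(size=(10, 30))        # same shape as pW
b = rng.normal(size=10)              # same shape as pB
lookup = rng.normal(size=(500, 10))  # same shape as the lookup table


def forward(inputs):
    """Assumed forward pass: embed, concatenate, affine layer, softmax."""
    x = np.concatenate([lookup[i] for i in inputs])  # 3 * 10 = 30 values
    z = W @ x + b                                    # (10, 30) @ (30,) -> (10,)
    z = z - z.max()                                  # stabilise the softmax
    e = np.exp(z)
    return e / e.sum()


def negative_log_likelihood(inputs, label):
    """Assumed loss: negative log probability of the expected label."""
    return -np.log(forward(inputs)[label])


probs = forward([1, 2, 3])
loss_value = negative_log_likelihood([1, 2, 3], 1)
best = int(np.argmax(probs))
print(probs.shape, loss_value, best)
```

Under this assumption the only matrix product in the forward pass is the (10, 30) by (30, 1) multiplication, which is the kind of product a CUBLAS_OP_N, CUBLAS_OP_N cublasSgemm call would compute.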
==========================out info=================
[dynet] initializing CUDA
[dynet] CUDA driver/runtime versions are 10.0/10.0
Request for 1 GPU ...
[dynet] Device Number: 0
[dynet] Device name: GeForce RTX 2080 Ti
[dynet] Memory Clock Rate (KHz): 7000000
[dynet] Memory Bus Width (bits): 352
[dynet] Peak Memory Bandwidth (GB/s): 616
[dynet] Memory Free (GB): 1.65767/11.5229
[dynet]
[dynet] Device(s) selected: 0
[dynet] random seed: 3237171193
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
WARNING: Attempting to initialize dynet twice. Ignoring duplicate initialization.
CUBLAS failure in cublasSgemm(dev.cublas_handle, CUBLAS_OP_N, CUBLAS_OP_N, y.d.rows(), y.d.cols() * y.d.batch_elems(), l.d.cols(), dev.kSCALAR_ONE, l.v, l.d.rows(), r.v, r.d.rows(), acc_scalar, y.v, y.d.rows())
13
Traceback (most recent call last):
  File "/data/jupyter/hongchao/text2structure/model2note/model/test_dy_gpu.py", line 57, in <module>
    loss.value()
  File "_dynet.pyx", line 769, in _dynet.Expression.value
  File "_dynet.pyx", line 783, in _dynet.Expression.value
RuntimeError: cublasSgemm(dev.cublas_handle, CUBLAS_OP_N, CUBLAS_OP_N, y.d.rows(), y.d.cols() * y.d.batch_elems(), l.d.cols(), dev.kSCALAR_ONE, l.v, l.d.rows(), r.v, r.d.rows(), acc_scalar, y.v, y.d.rows())
Process finished with exit code 1