Closed dionhaefner closed 7 years ago
Could you attach the kernel source? In this case it is /tmp/bohrium_effe/src/7bfb3319dabe4166.c
And also try to insert a flush before and after the code:
def calculate_velocity_on_wgrid(pyom):
    np.flush()
    pyom.u_wgrid[:,:,:-1] = pyom.u[:,:,1:,pyom.tau] * pyom.maskU[:,:,1:] * 0.5 * pyom.dzt[None,None,1:] / pyom.dzw[None,None,:-1] \
                          + pyom.u[:,:,:-1,pyom.tau] * pyom.maskU[:,:,:-1] * 0.5 * pyom.dzt[None,None,:-1] / pyom.dzw[None,None,:-1]
    np.flush()
Is this fixed by #229 ?
I'll test tomorrow (I use the nightly PPA).
Seems at least to be fixed on one of my setups, thanks! If the problem should come up again I'll reopen.
Sorry, that was too quick. The code doesn't crash, but the arrays just contain nan now, while everything should be finite.
Alright, I was able to boil it down to this:
import numpy as np


class PyOM(object):
    def __init__(self):
        self.nx = 100
        self.ny = 250
        self.nz = 50
        self.tau = 1
        self.dzt = np.zeros(self.nz)
        self.dzw = np.zeros(self.nz)
        self.maskU = np.zeros((self.nx+4, self.ny+4, self.nz))
        self.u = np.zeros((self.nx+4, self.ny+4, self.nz, 3))
        self.u_wgrid = np.zeros((self.nx+4, self.ny+4, self.nz))
        self.v_wgrid = np.zeros((self.nx+4, self.ny+4, self.nz))
        self.dzw = 1 + np.random.rand(self.nz)


def calculate_velocity_on_wgrid(pyom):
    np.flush()
    pyom.u_wgrid[:,:,:-1] = pyom.u[:,:,1:,pyom.tau] * pyom.maskU[:,:,1:] * 0.5 * pyom.dzt[None,None,1:] / pyom.dzw[None,None,:-1] \
                          + pyom.u[:,:,:-1,pyom.tau] * pyom.maskU[:,:,:-1] * 0.5 * pyom.dzt[None,None,:-1] / pyom.dzw[None,None,:-1]
    np.flush()


if __name__ == "__main__":
    pyom = PyOM()
    calculate_velocity_on_wgrid(pyom)
    print(np.any(np.isnan(pyom.u_wgrid)))
It works as soon as I remove the flush or either of the allocations (like v_wgrid, which isn't even used in the code).
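For comparison, the same expression can be evaluated with plain NumPy, where no JIT kernel is involved. This is only a minimal sketch, assuming the shapes and values from the script above; the helper name reference_u_wgrid and the standalone arrays are made up for illustration and are not part of pyOM or Bohrium. Since dzt is all zeros here, the reference result should be zero everywhere, so it gives a baseline to compare the Bohrium output against.

```python
import numpy as np  # plain NumPy here, not Bohrium


def reference_u_wgrid(u, maskU, dzt, dzw, tau):
    """Evaluate the u_wgrid expression from the script with plain NumPy."""
    u_wgrid = np.zeros(maskU.shape)
    u_wgrid[:, :, :-1] = (
        u[:, :, 1:, tau] * maskU[:, :, 1:] * 0.5 * dzt[None, None, 1:] / dzw[None, None, :-1]
        + u[:, :, :-1, tau] * maskU[:, :, :-1] * 0.5 * dzt[None, None, :-1] / dzw[None, None, :-1]
    )
    return u_wgrid


# Same shapes and values as in the reproduction script (assumed, not measured)
nx, ny, nz, tau = 100, 250, 50, 1
u = np.zeros((nx + 4, ny + 4, nz, 3))
maskU = np.zeros((nx + 4, ny + 4, nz))
dzt = np.zeros(nz)
dzw = 1 + np.random.rand(nz)  # values in [1, 2), so never zero

expected = reference_u_wgrid(u, maskU, dzt, dzw, tau)
print(np.isnan(expected).any())
```

If the Bohrium run and this reference disagree on identical inputs, that points at the backend rather than at the expression itself.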
The second constant 0.5 in the code is translated to a -nan in the JIT kernel.

Unfortunately, I have not been able to reproduce the bug in any other setting. Everything works when I remove either of the arrays, or when I run the code in isolation. The arrays only contain finite values with dtype float64. dzw does not contain any zeros.

I figured you guys might have an idea what could cause this. Otherwise, I'll have to dig deeper to try and reproduce the problem.
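The input checks mentioned above (finite values, float64 dtype, no zeros in dzw) could be scripted roughly like this. It is a sketch in plain NumPy; check_inputs is a hypothetical helper, not something pyOM or Bohrium provides, and the arrays below just mirror the shapes used in this issue.

```python
import numpy as np


def check_inputs(arrays, nonzero=()):
    """Return a list of problems found in a dict of named arrays.

    Hypothetical helper: flags non-float64 dtypes, non-finite values,
    and zeros in the arrays whose names are listed in `nonzero`.
    """
    problems = []
    for name, arr in arrays.items():
        if arr.dtype != np.float64:
            problems.append(f"{name}: dtype is {arr.dtype}, expected float64")
        if not np.isfinite(arr).all():
            problems.append(f"{name}: contains nan or inf")
        if name in nonzero and (arr == 0).any():
            problems.append(f"{name}: contains zeros")
    return problems


nz = 50
inputs = {
    "dzt": np.zeros(nz),
    "dzw": 1 + np.random.rand(nz),
    "maskU": np.zeros((104, 254, nz)),
    "u": np.zeros((104, 254, nz, 3)),
}
# An empty list means every input passed all checks
print(check_inputs(inputs, nonzero=("dzw",)))
```

Running the same check on the output arrays after the second flush would show whether the nan values appear only in the result or already in the inputs.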