TheBarret closed this issue 9 months ago.
Hi,
could you please start ALIEN with the command line argument -debug and then post the log.txt after the crash occurs?
I will report back here as soon as I have sufficient debug information. I don't know what triggers the crash, so in the meantime I'll keep that flag on, and when it happens again I can pass the log on to you.
Alright! The simulation runs a bit slower (~15%) with this flag on.
2023-09-20 19-03-30: device 0 is set
2023-09-20 19-03-30: initialize simulation
2023-09-20 19-03-32: resize arrays
2023-09-20 19-03-32: cell array size: 300000
2023-09-20 19-03-32: particle array size: 300000
2023-09-20 19-03-32: auxiliary data size: 300000
2023-09-20 19-03-32: 710 MB GPU memory used
2023-09-20 19-03-46: resize arrays
2023-09-20 19-03-46: cell array size: 300000
2023-09-20 19-03-46: particle array size: 300000
2023-09-20 19-03-46: auxiliary data size: 900000
2023-09-20 19-03-46: 723 MB GPU memory used
2023-09-20 19-03-49: resize arrays
2023-09-20 19-03-49: cell array size: 300000
2023-09-20 19-03-49: particle array size: 300000
2023-09-20 19-03-49: auxiliary data size: 2700000
2023-09-20 19-03-49: 728 MB GPU memory used
2023-09-20 19-14-33: CUDA error. Location: Base.cuh:225 code=719(cudaErrorLaunchFailure) "cudaMemcpy(&result, source, sizeof(T), cudaMemcpyDeviceToHost)"
2023-09-20 19-14-33: CUDA error. Location: CudaSimulationFacade.cu:152 code=46(cudaErrorDevicesUnavailable) "cudaGraphicsMapResources(1, &cudaResourceImpl)"
2023-09-20 19-14-33: network: logout
2023-09-20 19-14-34: close simulation
It doesn't look like this log comes from debug mode: the cudaErrorLaunchFailure error is not generated by cudaMemcpy but by a previous kernel call. When debug mode is enabled, there is a synchronization point and an error check after each kernel call. Only then is the error information in the log useful.
You can check whether ALIEN is in debug mode: after starting it via alien.exe -debug (on Windows), the loading screen should show DEBUG.
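Background on why the debug flag matters: CUDA kernel launches are asynchronous, so a fault inside a kernel is often reported by a later, unrelated runtime call (such as the cudaMemcpy in Base.cuh in the first log). The following is a minimal, hypothetical sketch of the general "synchronize and check after every kernel launch" pattern described above, not ALIEN's actual code. The names formatCudaError, CHECK_CUDA, CHECK_KERNEL, scaleKernel and runScaleStep are invented for illustration, and the debugMode parameter merely stands in for the -debug switch.

```cuda
#include <cassert>
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// Hypothetical helper (not ALIEN's code): builds a message in the same shape
// as the log lines in this thread, e.g.
// CUDA error. Location: file.cu:86 code=719(cudaErrorLaunchFailure) "expr"
std::string formatCudaError(cudaError_t err, const char* file, int line, const char* expr) {
    return std::string("CUDA error. Location: ") + file + ":" + std::to_string(line) +
           " code=" + std::to_string(static_cast<int>(err)) + "(" + cudaGetErrorName(err) +
           ") \"" + expr + "\"";
}

// Evaluates a CUDA runtime call; on failure it prints the message and returns
// the error code from the enclosing function (which must return cudaError_t).
#define CHECK_CUDA(call)                                                           \
    do {                                                                           \
        cudaError_t err_ = (call);                                                 \
        if (err_ != cudaSuccess) {                                                 \
            std::fprintf(stderr, "%s\n",                                           \
                         formatCudaError(err_, __FILE__, __LINE__, #call).c_str()); \
            return err_;                                                           \
        }                                                                          \
    } while (0)

// cudaGetLastError() directly after a launch catches launch problems such as
// an invalid configuration. A fault during kernel execution (e.g. code 719)
// is asynchronous and may only be reported by a later call. Synchronizing in
// debug mode makes it surface at the launch that caused it.
#define CHECK_KERNEL(debugMode)                                                    \
    do {                                                                           \
        CHECK_CUDA(cudaGetLastError());                                            \
        if (debugMode) {                                                           \
            CHECK_CUDA(cudaDeviceSynchronize());                                   \
        }                                                                          \
    } while (0)

// Hypothetical example kernel: multiplies every element by a factor.
__global__ void scaleKernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Hypothetical pipeline step; debugMode plays the role of the -debug switch.
cudaError_t runScaleStep(float* deviceData, int n, bool debugMode) {
    assert(deviceData != nullptr || n == 0);
    if (n <= 0) {
        return cudaSuccess;  // launching zero blocks would be an invalid configuration
    }
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scaleKernel<<<blocks, threads>>>(deviceData, n, 2.0f);
    CHECK_KERNEL(debugMode);
    return cudaSuccess;
}
```

A caller would allocate device memory with cudaMalloc, call runScaleStep(ptr, n, true) and inspect the returned cudaError_t. The cudaDeviceSynchronize after every launch makes the CPU wait until the GPU is idle, which is presumably why the simulation runs somewhat slower with -debug enabled.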
Oh, I don't know what else to give you. I did use the -debug flag, and this is all it gives:
2023-09-22 12-42-08: CUDA error. Location: SimulationKernelsLauncher.cu:86 code=719(cudaErrorLaunchFailure) "cudaGetLastError()"
2023-09-22 12-42-08: CUDA error. Location: CudaSimulationFacade.cu:152 code=46(cudaErrorDevicesUnavailable) "cudaGraphicsMapResources(1, &cudaResourceImpl)"
OK, that looks better. On the Discord server you wrote that it occurs after multiplying a structure during a running simulation? Could you please give more details, ideally step-by-step instructions to reproduce the bug?
The bug occurs 99% of the time when I add random spores: at the moment the simulation starts and the spores begin to replicate (a few seconds into the run), the app exits. This usually happens 3 times in a row, and interestingly, after that there are no more crashes; I can run your simulation for hours (probably days).
It may also help to tell you my specs: I run an AMD Bulldozer CPU (FX-8350) with 20 GB RAM and a lightweight GeForce GTX 1050 Ti graphics card (4 GB).
OK, I'm still not able to reproduce the crash. Maybe I have not yet understood the precise steps. From your description, I assume the following steps:
Is that correct?
Could it be a memory issue? This graphics card does not have much memory, a maximum of 4 GB. If I push my GPU a little too hard with Stable Diffusion, it also runs out of memory (SD 2.0XL, for instance, is a no-go).
Yeah, it seems that when I use the multiplier tool, it gives me an error in a message box.
2023-09-25 13-00-18: DEBUG mode
2023-09-25 13-00-18: set windowed mode
2023-09-25 13-00-18: starting ALIEN v4.3.0
2023-09-25 13-00-19: network: login user 'TheBarret'
2023-09-25 13-00-19: network: get simulation list
2023-09-25 13-00-19: network: get user list
2023-09-25 13-00-19: network: get liked simulations
2023-09-25 13-00-24: 1 CUDA device found
2023-09-25 13-00-24: device 0: NVIDIA GeForce GTX 1050 Ti with compute capability 6.1
2023-09-25 13-00-24: device 0 is set
2023-09-25 13-00-24: initialize simulation
2023-09-25 13-00-26: resize arrays
2023-09-25 13-00-26: cell array size: 300000
2023-09-25 13-00-26: particle array size: 300000
2023-09-25 13-00-26: auxiliary data size: 300000
2023-09-25 13-00-26: 707 MB GPU memory used
2023-09-25 13-00-26: resize arrays
2023-09-25 13-00-26: cell array size: 300000
2023-09-25 13-00-26: particle array size: 300000
2023-09-25 13-00-26: auxiliary data size: 4520073
2023-09-25 13-00-26: 719 MB GPU memory used
2023-09-25 13-00-33: close simulation
2023-09-25 13-00-33: device 0 is set
2023-09-25 13-00-33: initialize simulation
2023-09-25 13-00-35: resize arrays
2023-09-25 13-00-35: cell array size: 300000
2023-09-25 13-00-35: particle array size: 300000
2023-09-25 13-00-35: auxiliary data size: 300000
2023-09-25 13-00-35: 707 MB GPU memory used
2023-09-25 13-04-45: resize arrays
2023-09-25 13-04-45: cell array size: 300000
2023-09-25 13-04-45: particle array size: 300000
2023-09-25 13-04-45: auxiliary data size: 900000
2023-09-25 13-04-45: 719 MB GPU memory used
2023-09-25 13-06-09: message dialog showing: 'Non-overlapping copies could not be created.'
2023-09-25 13-06-09: resize arrays
2023-09-25 13-06-09: cell array size: 300000
2023-09-25 13-06-09: particle array size: 300000
2023-09-25 13-06-09: auxiliary data size: 42000312
2023-09-25 13-06-09: 837 MB GPU memory used
2023-09-25 13-06-16: message dialog showing: 'Non-overlapping copies could not be created.'
2023-09-25 13-20-28: network: refresh login
2023-09-25 13-21-31: close simulation
2023-09-25 13-21-31: device 0 is set
2023-09-25 13-21-31: initialize simulation
2023-09-25 13-21-33: resize arrays
2023-09-25 13-21-33: cell array size: 300000
2023-09-25 13-21-33: particle array size: 300000
2023-09-25 13-21-33: auxiliary data size: 300000
2023-09-25 13-21-33: 707 MB GPU memory used
2023-09-25 13-22-28: resize arrays
2023-09-25 13-22-28: cell array size: 300000
2023-09-25 13-22-28: particle array size: 300000
2023-09-25 13-22-28: auxiliary data size: 900000
2023-09-25 13-22-28: 719 MB GPU memory used
2023-09-25 13-22-37: CUDA error. Location: SimulationKernelsLauncher.cu:99 code=719(cudaErrorLaunchFailure) "cudaGetLastError()"
2023-09-25 13-22-37: CUDA error. Location: CudaSimulationFacade.cu:153 code=46(cudaErrorDevicesUnavailable) "cudaGraphicsMapResources(1, &cudaResourceImpl)"
2023-09-25 13-22-37: network: logout
2023-09-25 13-22-37: close simulation
Please report only one bug per issue ;) I was able to reproduce the bug, and it is now fixed in the latest commit.
Thank you very much for your effort and time.
It still occurs, and I finally captured the output that the logger did not append. I think this might be the detail you were asking for:
Microsoft Windows [Version 10.0.19045.3448]
(c) Microsoft Corporation. All rights reserved.
d:\Apps\alien2\bin>alien
Not implemented error. File: D:\dev\alien\source\EngineGpuKernels\GenomeDecoder.cuh, Line: 221
Not implemented error. File: D:\dev\alien\source\EngineGpuKernels\GenomeDecoder.cuh, Line: 221
Not implemented error. File: D:\dev\alien\source\EngineGpuKernels\GenomeDecoder.cuh, Line: 221
...(gets repeated a lot of times)...
An uncaught exception occurred: CUDA error. Location: CudaSimulationFacade.cu:153 code=46(cudaErrorDevicesUnavailable) "cudaGraphicsMapResources(1, &cudaResourceImpl)"
The simulator crashes from time to time when I use the multiplier tool or when I hit start (play). There is no error message of any kind; the app exits immediately. It happens with both an empty and a full world, so I assume the cause lies somewhere deeper in the engine.
What could this be?