dje-dev / Ceres

Ceres - an MCTS chess engine for research and recreation
GNU General Public License v3.0

Unable to run with 40x512 networks. #70

Open therealkingoftheuniverse opened 2 years ago

therealkingoftheuniverse commented 2 years ago

Hello. When I try to run the engine with a 40x512 network, Nibbler stays on "Awaiting readyok from engine" forever. In console mode, if I issue a go command, the engine loads the network, but then there is no output and the console won't let me type another line. Is there a way to fix this? Thanks in advance.

Lc0 with the same network seems to work fine. Ceres with a 30x384 network also works just fine. I am using an RTX 3060 laptop GPU with 6 GB of VRAM.

Below is a screenshot of me trying to run it in console mode; it gets stuck at exactly that screen: image

rooklift commented 2 years ago

Me too - I'm able to get some error messages though:

Network evaluation configured to use: <NNEvaluatorDef Network=LC0:C:\Users\Owner\Documents\Misc\Chess\Lc0_Networks\781093 Device=GPU:0 >

Loaded network weights: 0 40x512 WDL MLH  from \Users\Owner\Documents\Misc\Chess\Lc0_Networks\781093

CUDA device 0: NVIDIA GeForce RTX 2060 SMs: 30 Mem: 5gb
Error when initializing CUDA. Did you install NVidia's CUDA? https://developer.nvidia.com/cuda-zone
ErrorInvalidValue
   at ManagedCuda.CudaKernel.set_MaxDynamicSharedSizeBytes(Int32 value)
   at Ceres.Chess.NNBackends.CUDA.ResidualBlockFusedCUDA.LoadKernels() in C:\dev\Ceres\src\Ceres.Chess\NNBackends\CUDA\Layers\Residual\ResidualBlockFusedCUDA.cs:line 73
   at Ceres.Chess.NNBackends.CUDA.ResidualBlockBaseCUDA..ctor(NNBackendExecContext parent, String name, Int32 layerIndex, Int32 c, Int32 h, Int32 w, BaseLayerCUDA inputLayer, Boolean hasSE, Int32 seK, Int32 sharedMemSize) in C:\dev\Ceres\src\Ceres.Chess\NNBackends\CUDA\Layers\Residual\ResidualBlockBaseCUDA.cs:line 32
   at Ceres.Chess.NNBackends.CUDA.ResidualBlockFusedCUDA..ctor(NNBackendExecContext parent, String name, Int32 layerIndex, BaseLayerCUDA inputLayer, Int32 C, Boolean se, Int32 se_k, Boolean first, Boolean last, Int32 sharedMemSize) in C:\dev\Ceres\src\Ceres.Chess\NNBackends\CUDA\Layers\Residual\ResidualBlockFusedCUDA.cs:line 104
   at Ceres.Chess.NNBackends.CUDA.NNBackendCUDALayers.DoBuildNetworkAndLoadWeights(NNBackendExecContext execContext, LC0LegacyWeights weights, Int32 kNumInputPlanes) in C:\dev\Ceres\src\Ceres.Chess\NNBackends\CUDA\NNBackendCUDALayers.cs:line 156
   at Ceres.Chess.NNBackends.CUDA.NNBackendCUDALayers.BuildNetworkAndLoadWeights(NNBackendExecContext execContext, LC0LegacyWeights weights, Int32 kNumInputPlanes) in C:\dev\Ceres\src\Ceres.Chess\NNBackends\CUDA\NNBackendCUDALayers.cs:line 145
   at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA.InitNetwork(Net net) in C:\dev\Ceres\src\Ceres.Chess\NNBackends\CUDA\NNBackendLC0_CUDA.cs:line 360
   at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA..ctor(Int32 gpuID, Net net, Boolean saveActivations, Int32 maxBatchSize, Boolean dumpTiming, Boolean enableCUDAGraphs, Int32 graphBatchSizeDivisor, NNBackendLC0_CUDA referenceBackend) in C:\dev\Ceres\src\Ceres.Chess\NNBackends\CUDA\NNBackendLC0_CUDA.cs:line 274
CUDA device 0: NVIDIA GeForce RTX 2060 SMs: 30 Mem: 5gb
Error when initializing CUDA. Did you install NVidia's CUDA? https://developer.nvidia.com/cuda-zone
ErrorInvalidValue
   at ManagedCuda.CudaKernel.set_MaxDynamicSharedSizeBytes(Int32 value)
   at Ceres.Chess.NNBackends.CUDA.ResidualBlockFusedCUDA.LoadKernels() in C:\dev\Ceres\src\Ceres.Chess\NNBackends\CUDA\Layers\Residual\ResidualBlockFusedCUDA.cs:line 73
   at Ceres.Chess.NNBackends.CUDA.ResidualBlockBaseCUDA..ctor(NNBackendExecContext parent, String name, Int32 layerIndex, Int32 c, Int32 h, Int32 w, BaseLayerCUDA inputLayer, Boolean hasSE, Int32 seK, Int32 sharedMemSize) in C:\dev\Ceres\src\Ceres.Chess\NNBackends\CUDA\Layers\Residual\ResidualBlockBaseCUDA.cs:line 32
   at Ceres.Chess.NNBackends.CUDA.ResidualBlockFusedCUDA..ctor(NNBackendExecContext parent, String name, Int32 layerIndex, BaseLayerCUDA inputLayer, Int32 C, Boolean se, Int32 se_k, Boolean first, Boolean last, Int32 sharedMemSize) in C:\dev\Ceres\src\Ceres.Chess\NNBackends\CUDA\Layers\Residual\ResidualBlockFusedCUDA.cs:line 104
   at Ceres.Chess.NNBackends.CUDA.NNBackendCUDALayers.DoBuildNetworkAndLoadWeights(NNBackendExecContext execContext, LC0LegacyWeights weights, Int32 kNumInputPlanes) in C:\dev\Ceres\src\Ceres.Chess\NNBackends\CUDA\NNBackendCUDALayers.cs:line 156
   at Ceres.Chess.NNBackends.CUDA.NNBackendCUDALayers.BuildNetworkAndLoadWeights(NNBackendExecContext execContext, LC0LegacyWeights weights, Int32 kNumInputPlanes) in C:\dev\Ceres\src\Ceres.Chess\NNBackends\CUDA\NNBackendCUDALayers.cs:line 145
   at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA.InitNetwork(Net net) in C:\dev\Ceres\src\Ceres.Chess\NNBackends\CUDA\NNBackendLC0_CUDA.cs:line 360
   at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA..ctor(Int32 gpuID, Net net, Boolean saveActivations, Int32 maxBatchSize, Boolean dumpTiming, Boolean enableCUDAGraphs, Int32 graphBatchSizeDivisor, NNBackendLC0_CUDA referenceBackend) in C:\dev\Ceres\src\Ceres.Chess\NNBackends\CUDA\NNBackendLC0_CUDA.cs:line 274
Unhandled exception. System.AggregateException: One or more errors occurred. (Object reference not set to an instance of an object.) (Object reference not set to an instance of an object.)
 ---> System.NullReferenceException: Object reference not set to an instance of an object.
   at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA.PrepareInputs(Int32 batchSize) in C:\dev\Ceres\src\Ceres.Chess\NNBackends\CUDA\NNBackendLC0_CUDA.cs:line 642
   at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA.DoEvaluateNN(Int32 batchSize) in C:\dev\Ceres\src\Ceres.Chess\NNBackends\CUDA\NNBackendLC0_CUDA.cs:line 526
   at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA.EvaluateNN(Int32 batchSize) in C:\dev\Ceres\src\Ceres.Chess\NNBackends\CUDA\NNBackendLC0_CUDA.cs:line 502
   at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.StartEvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Int32 numPositions, Boolean retrieveSupplementalResults) in C:\dev\Ceres\src\Ceres.Chess\NNEvaluators\CUDA\NNEvaluatorCUDA.cs:line 218
   at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.DoEvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Boolean retrieveSupplementalResults) in C:\dev\Ceres\src\Ceres.Chess\NNEvaluators\CUDA\NNEvaluatorCUDA.cs:line 225
   at Ceres.Chess.NNEvaluators.NNEvaluator.EvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Boolean retrieveSupplementalResults) in C:\dev\Ceres\src\Ceres.Chess\NNEvaluators\NNEvaluator.cs:line 149
   at Ceres.MCTS.Params.NNEvaluatorSet.<Warmup>b__18_3() in C:\dev\Ceres\src\Ceres.MCTS\Iteration\Params\NNEvaluatorSet.cs:line 145
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__277_0(Object obj)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.WaitAllCore(Task[] tasks, Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Task.WaitAll(Task[] tasks)
   at System.Threading.Tasks.Parallel.Invoke(ParallelOptions parallelOptions, Action[] actions)
--- End of stack trace from previous location ---
   at System.Threading.Tasks.Parallel.ThrowSingleCancellationExceptionOrOtherException(ICollection exceptions, CancellationToken cancelToken, Exception otherException)
   at System.Threading.Tasks.Parallel.Invoke(ParallelOptions parallelOptions, Action[] actions)
   at Ceres.Features.UCI.UCIManager.InitializeEngineIfNeeded() in C:\dev\Ceres\src\Ceres.Features\UCI\UCIManager.cs:line 546
   at Ceres.Features.UCI.UCIManager.PlayUCI() in C:\dev\Ceres\src\Ceres.Features\UCI\UCIManager.cs:line 296
   at Ceres.Commands.DispatchCommands.ProcessCommand(String cmd) in C:\dev\Ceres\src\Ceres\Commands\DispatchCommands.cs:line 73
   at Ceres.Program.Main(String[] args) in C:\dev\Ceres\src\Ceres\Program.cs:line 103
 ---> (Inner Exception #1) System.NullReferenceException: Object reference not set to an instance of an object.
   at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA.PrepareInputs(Int32 batchSize) in C:\dev\Ceres\src\Ceres.Chess\NNBackends\CUDA\NNBackendLC0_CUDA.cs:line 642
   at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA.DoEvaluateNN(Int32 batchSize) in C:\dev\Ceres\src\Ceres.Chess\NNBackends\CUDA\NNBackendLC0_CUDA.cs:line 526
   at Ceres.Chess.NNBackends.CUDA.NNBackendLC0_CUDA.EvaluateNN(Int32 batchSize) in C:\dev\Ceres\src\Ceres.Chess\NNBackends\CUDA\NNBackendLC0_CUDA.cs:line 502
   at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.StartEvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Int32 numPositions, Boolean retrieveSupplementalResults) in C:\dev\Ceres\src\Ceres.Chess\NNEvaluators\CUDA\NNEvaluatorCUDA.cs:line 218
   at Ceres.Chess.NNEvaluators.CUDA.NNEvaluatorCUDA.DoEvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Boolean retrieveSupplementalResults) in C:\dev\Ceres\src\Ceres.Chess\NNEvaluators\CUDA\NNEvaluatorCUDA.cs:line 225
   at Ceres.Chess.NNEvaluators.NNEvaluator.EvaluateIntoBuffers(IEncodedPositionBatchFlat positions, Boolean retrieveSupplementalResults) in C:\dev\Ceres\src\Ceres.Chess\NNEvaluators\NNEvaluator.cs:line 149
   at Ceres.MCTS.Params.NNEvaluatorSet.<Warmup>b__18_4() in C:\dev\Ceres\src\Ceres.MCTS\Iteration\Params\NNEvaluatorSet.cs:line 146
   at System.Threading.Tasks.Task.InnerInvoke()
   at System.Threading.Tasks.Task.<>c.<.cctor>b__277_0(Object obj)
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)<---

dje-dev commented 2 years ago

There are two issues here. First, as mentioned in the README, Ceres currently supports 40b networks only on Ampere cards (e.g. 3080), not on the 20x0 series.

Even with 30x0 cards there can be problems with 40b networks. We have determined that this depends on the NVIDIA driver: the most recent drivers fail, whereas the older ones (on which Ceres was originally tested) work fine.

Work is underway to understand and hopefully fix this latter issue.
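
For readers wondering why the 20x0 limitation surfaces as ErrorInvalidValue in set_MaxDynamicSharedSizeBytes: one plausible reading (an assumption, not something confirmed in this thread) is that the fused residual block kernel asks for more dynamic shared memory per block than a Turing card allows. The sketch below only illustrates that kind of check. The per-architecture limits are the documented CUDA maximums per thread block; the 72 KB request size and the function name are hypothetical and are not taken from the Ceres source.

```python
# Maximum opt-in shared memory per thread block, in KB, keyed by CUDA
# compute capability (values as documented in the CUDA C++ Programming Guide).
MAX_OPTIN_SMEM_KB = {
    (7, 0): 96,   # Volta
    (7, 5): 64,   # Turing, e.g. RTX 20x0
    (8, 0): 163,  # Ampere, A100
    (8, 6): 99,   # Ampere, e.g. RTX 30x0
}


def fits_shared_memory(compute_capability, requested_bytes):
    """Return True if a kernel's dynamic shared memory request is within
    the per-block limit of the given compute capability."""
    limit_bytes = MAX_OPTIN_SMEM_KB[compute_capability] * 1024
    return requested_bytes <= limit_bytes


# Hypothetical request size for a large fused kernel (NOT the real Ceres value).
HYPOTHETICAL_REQUEST_BYTES = 72 * 1024

print(fits_shared_memory((7, 5), HYPOTHETICAL_REQUEST_BYTES))  # False
print(fits_shared_memory((8, 6), HYPOTHETICAL_REQUEST_BYTES))  # True
```

Under these assumptions, a request above 64 KB would be rejected on an RTX 2060 but accepted on a 30x0 card, which would match the first issue described above. It says nothing about the driver-dependent failure on 30x0 cards.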