Open lithorus opened 3 days ago
I have to disagree that this is fixed by #1590; I already tested with that fix in place.
This is what I get when it tries to dispatch a job:
2024-11-20 22:11:47.945 INFO 16748 --- [pool-1-thread-1] c.i.spcue.dispatcher.CoreUnitDispatcher : Frames found: 1 for host 192.168.31.160 652/10801152 on job testing-test-jimmy_samurai
2024-11-20 22:11:47.961 INFO 16748 --- [pool-1-thread-1] c.i.s.dispatcher.DispatchSupportService : creating proc 192.168.31.160 for 0001-layer1
2024-11-20 22:11:47.978 INFO 16748 --- [pool-1-thread-1] c.i.spcue.dispatcher.CoreUnitDispatcher : dispatchProcToJob failed booking proc 192.168.31.160/39c75ff3-df93-4e25-9203-03b3f91e392f on job testing-test-jimmy_samurai/94baa341-401a-4aaf-bce1-7dab31258b8c
com.imageworks.spcue.dispatcher.DispatcherException: 192.168.31.160 could not be booked on 0001-layer1, java.lang.NullPointerException
at com.imageworks.spcue.dispatcher.DispatchSupportService.runFrame(DispatchSupportService.java:214) ~[main/:na]
at com.imageworks.spcue.dispatcher.DispatchSupportService$$FastClassBySpringCGLIB$$39539eb5.invoke(<generated>) ~[main/:na]
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218) ~[spring-core-5.2.1.RELEASE.jar:5.2.1.RELEASE]
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:769) ~[spring-aop-5.2.1.RELEASE.jar:5.2.1.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163) ~[spring-aop-5.2.1.RELEASE.jar:5.2.1.RELEASE]
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:747) ~[spring-aop-5.2.1.RELEASE.jar:5.2.1.RELEASE]
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:366) ~[spring-tx-5.2.1.RELEASE.jar:5.2.1.RELEASE]
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:99) ~[spring-tx-5.2.1.RELEASE.jar:5.2.1.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.2.1.RELEASE.jar:5.2.1.RELEASE]
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:747) ~[spring-aop-5.2.1.RELEASE.jar:5.2.1.RELEASE]
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:689) ~[spring-aop-5.2.1.RELEASE.jar:5.2.1.RELEASE]
at com.imageworks.spcue.dispatcher.DispatchSupportService$$EnhancerBySpringCGLIB$$c48bb835.runFrame(<generated>) ~[main/:na]
at com.imageworks.spcue.dispatcher.CoreUnitDispatcher.dispatch(CoreUnitDispatcher.java:392) ~[main/:na]
at com.imageworks.spcue.dispatcher.CoreUnitDispatcher$1.wrapDispatchFrame(CoreUnitDispatcher.java:310) ~[main/:na]
at com.imageworks.spcue.dispatcher.CoreUnitDispatcher$DispatchFrameTemplate.execute(CoreUnitDispatcher.java:483) ~[main/:na]
at com.imageworks.spcue.dispatcher.CoreUnitDispatcher.dispatchHost(CoreUnitDispatcher.java:314) ~[main/:na]
at com.imageworks.spcue.dispatcher.CoreUnitDispatcher.dispatchJobs(CoreUnitDispatcher.java:176) ~[main/:na]
at com.imageworks.spcue.dispatcher.CoreUnitDispatcher.dispatchHost(CoreUnitDispatcher.java:235) ~[main/:na]
at com.imageworks.spcue.dispatcher.commands.DispatchBookHost$1.wrapDispatchCommand(DispatchBookHost.java:106) ~[main/:na]
at com.imageworks.spcue.dispatcher.commands.DispatchCommandTemplate.execute(DispatchCommandTemplate.java:36) ~[main/:na]
at com.imageworks.spcue.dispatcher.commands.DispatchBookHost.run(DispatchBookHost.java:117) ~[main/:na]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:829) ~[na:na]
I did a trace and this is what I get:
cuebot/src/main/java/com/imageworks/spcue/dispatcher/DispatchSupportService.java
DispatchSupportService
> runFrame
> rqdClient.launchFrame(prepareRqdRunFrame(proc, frame), proc);
param_1 = {VirtualProc@10068} "192.168.31.160/7c133ad0-91bb-4a96-992e-90f4709bcdfb"
hostId = "fcc88160-7cad-49de-997d-445dda14f1a3"
allocationId = "00000000-0000-0000-0000-000000000000"
frameId = "c84b22e3-bf1b-4ce9-af2f-0a3a205e26a9"
hostName = "192.168.31.160"
os = null
childProcesses = null
canHandleNegativeCoresRequest = true
coresReserved = 100
memoryReserved = 3354624
memoryUsed = 0
memoryMax = 0
virtualMemoryUsed = 0
virtualMemoryMax = 0
gpusReserved = 0
gpuMemoryReserved = 0
gpuMemoryUsed = 0
gpuMemoryMax = 0
unbooked = false
usageRecorded = false
isLocalDispatch = false
layerId = "86bf147f-3709-4398-80ef-d1c0f604a430"
version = 0
showId = "00000000-0000-0000-0000-000000000000"
facilityId = "AAAAAAAA-AAAA-AAAA-AAAA-AAAAAAAAAAA1"
jobId = "94baa341-401a-4aaf-bce1-7dab31258b8c"
id = "7c133ad0-91bb-4a96-992e-90f4709bcdfb"
name = "unknown"
param_2 = {DispatchFrame@10069} "0001-layer1/c84b22e3-bf1b-4ce9-af2f-0a3a205e26a9"
retries = 0
state = {FrameState@10086} "WAITING"
show = "testing"
shot = "test"
owner = "jimmy"
uid = {Optional@10090} "Optional[1000]"
logDir = "/var/tmp//testing/test/logs/testing-test-jimmy_samurai--94baa341-401a-4aaf-bce1-7dab31258b8c"
command = "python3 -c "import os;print(os.path.expanduser('~/test'))""
range = "1-1"
chunkSize = 1
layerName = "layer1"
jobName = "testing-test-jimmy_samurai"
minCores = 100
maxCores = 100
threadable = false
minGpus = 0
maxGpus = 0
minGpuMemory = 0
services = "blender"
os = null
minMemory = 3354624
softMemoryLimit = 3690086
hardMemoryLimit = 4696473
layerId = "86bf147f-3709-4398-80ef-d1c0f604a430"
version = 8
showId = "00000000-0000-0000-0000-000000000000"
facilityId = "AAAAAAAA-AAAA-AAAA-AAAA-AAAAAAAAAAA1"
jobId = "94baa341-401a-4aaf-bce1-7dab31258b8c"
id = "c84b22e3-bf1b-4ce9-af2f-0a3a205e26a9"
name = "0001-layer1"
Notice that os is null in each case, and later on the code expects it not to be null.
It fails in cuebot/src/compiled_protobuf/main/java/com/imageworks/spcue/grpc/rqd/RunFrame.java, in RunFrame > Builder:
/**
 * <code>string os = 25;</code>
 * @param value The os to set.
 * @return This builder for chaining.
 */
public Builder setOs(
    java.lang.String value) {
  if (value == null) {
    throw new NullPointerException();
  }
This setter expects a non-null string and otherwise throws a NullPointerException.
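To make the failure mode concrete, here is a minimal, self-contained sketch. It is not the Cuebot code: RunFrameBuilderSketch is a hypothetical stand-in that reproduces only the null check quoted above, and applyOs is a hypothetical guard showing one possible way to avoid the exception. Whether rqd accepts a frame whose os is left at the proto3 default (an empty string) is an assumption that would need to be checked.

```java
final class OsGuardSketch {

    // Hypothetical stand-in for the generated RunFrame.Builder.
    // Only the null check from setOs is reproduced.
    static final class RunFrameBuilderSketch {
        // proto3 string fields default to the empty string.
        private String os = "";

        RunFrameBuilderSketch setOs(String value) {
            if (value == null) {
                throw new NullPointerException();
            }
            this.os = value;
            return this;
        }

        String getOs() {
            return os;
        }
    }

    // Hypothetical guard: copy os into the builder only when it is
    // non-null, so a frame or proc with os == null does not trip the
    // setter's null check.
    static RunFrameBuilderSketch applyOs(RunFrameBuilderSketch builder, String os) {
        if (os != null) {
            builder.setOs(os);
        }
        return builder;
    }

    public static void main(String[] args) {
        // Unguarded call: same exception type as in the stack trace.
        try {
            new RunFrameBuilderSketch().setOs(null);
        } catch (NullPointerException e) {
            System.out.println("unguarded: NullPointerException");
        }
        // Guarded calls: null keeps the default, a real value is copied.
        System.out.println("guarded null: '"
                + applyOs(new RunFrameBuilderSketch(), null).getOs() + "'");
        System.out.println("guarded value: '"
                + applyOs(new RunFrameBuilderSketch(), "Linux").getOs() + "'");
    }
}
```

A guard like this only hides the symptom. If a null os is never valid for a job, rejecting or defaulting it when the job is created may be the better place for the check.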
(face palm) I'm sorry, I confused this issue with another one that was fixed by the mentioned PR. I'm reopening this.
Describe the bug
If the os parameter is not set, cuebot will not dispatch frames from the job. Setting the str_os field in the database to a non-null value will make it dispatch frames to rqd.