RENCI-NRIG / orca5

ORCA5 Software
Eclipse Public License 1.0
2 stars 1 forks source link

Upgrade to Java 8 #140

Closed hinchliff closed 7 years ago

hinchliff commented 7 years ago

Orca needs to at least be able to run using Java 8, even if still compiled by Java 7. But it would be preferable to upgrade the source/compilation to Java 8 as well.

Or maybe even Java 9?

Right now, at least in Docker emulation, Orca will not run using Java 8.

It seems to compile fine using Java 8. But either way, compiled as 7 or compile as 8 it seems to not run using Java 8.

We need to determine if we can get this running under Java 8, and what may be required to upgrade more fully to Java 8. It's possible the old Axis libraries will be problematic in any upgrade.

These are the error messages I see when I try to run using Java 8

2017-06-15 13:10:48,208-[WrapperSimpleAppMain]-{WARN }-orca-(RemoteRegistryCache.java:510)-singleQueryProcess(): Registry returned a known local actor, skipping

@YufengXin was seeing a different error (please comment if incorrect)

INFO | jvm 1 | 2017/06/20 20:04:16 | 2017-06-20 20:04:16,685 [qtp1778861335-36] WARN org.eclipse.jetty.util.log-(Slf4jLog.java:40) - javax.net.ssl.SSLHandshakeException: General SSLEngine problem

hinchliff commented 7 years ago

Looks like we'll have to upgrade at least:

ibaldin commented 7 years ago

I'd need to check the code but it seems the first one isn't really an error. All it says is the remote actor registry returned a record of an actor known locally. I can't remember if that's usually not supposed to happen, but it certainly isn't a fatal error. The second one is more serious. More context on that error is needed, as in which SSL handshake (of the many places where it happens) fails. Is it also with remote actor registry? In that case, perhaps the solution is to upgrade the actor registry as well - it is possible there are cipher mismatches or other issues related to the upgrade to a newer JRE.

ibaldin commented 7 years ago

Remote actor registry can be disabled in favor of using the local config files. If that gets rid of the errors, it would narrow down the space where to look for problems.

ibaldin commented 7 years ago

So looking at the code

a) A warning about known local actor is harmless. All it says is that the local actor already known to registry and happened to be returned as a result of a query. Local registry cache is populated regardless of whether remote registry is configured. It simply contains only local actors in that case.

b) To disable actor registry make sure registry.url is commented out in container.properties (or orca.properties - can't remember which one). That will prevent actors from registering and querying the registry. So if SSL errors are about talking to the registry - that should take care of it.

All of this is under RemoteRegistryCache.java - there is a static method registerWithRegistry(IActor) that is called when the actor is initialized. And the rest of the methods are used when combining the actor topology from local config file with results of the registry.

hinchliff commented 7 years ago

OK, i'm still trying to look into this, but there is something not working with the pools. They seem to get created, but they are empty? I had thought this might be because of the Actor Registry warnings, but you're right that it seems like the Actors / pools are there, they are just empty.

There aren't really any errors printed out until Recovery starts (looking at the AM):

2017-07-13 13:15:25,975-[WrapperSimpleAppMain]-{INFO }-orca.bbn-vm-am-(Actor.java:715)-Recovering slice: EE40A77E
2017-07-13 13:15:25,975-[WrapperSimpleAppMain]-{DEBUG}-orca.bbn-vm-am-(Actor.java:717)-Instantiating slice object
2017-07-13 13:15:25,975-[WrapperSimpleAppMain]-{DEBUG}-orca.bbn-vm-am-(Actor.java:720)-Recoverying slice object
2017-07-13 13:15:25,976-[WrapperSimpleAppMain]-{DEBUG}-orca.bbn-vm-am-(Actor.java:723)-Informing the plugin about the slice
2017-07-13 13:15:25,979-[WrapperSimpleAppMain]-{DEBUG}-orca.bbn-vm-am-(Actor.java:726)-Registering slice: EE40A77E
2017-07-13 13:15:25,980-[WrapperSimpleAppMain]-{DEBUG}-orca.bbn-vm-am-(Actor.java:729)-Recovering reservations in slice: EE40A77E
2017-07-13 13:15:25,980-[WrapperSimpleAppMain]-{INFO }-orca.bbn-vm-am-(Actor.java:737)-Starting to recover reservations in slice xCAT node(EE40A77E)
2017-07-13 13:15:25,981-[WrapperSimpleAppMain]-{DEBUG}-orca.bbn-vm-am-(Actor.java:748)-There are 1 reservation(s) in slice xCAT node(EE40A77E)
2017-07-13 13:15:25,983-[WrapperSimpleAppMain]-{INFO }-orca.bbn-vm-am-(Actor.java:770)-Found reservation #86409247 in state: [Ticketed, None, NoJoin]
2017-07-13 13:15:25,983-[WrapperSimpleAppMain]-{INFO }-orca.bbn-vm-am-(Actor.java:779)-Recovering reservation #86409247
2017-07-13 13:15:25,984-[WrapperSimpleAppMain]-{DEBUG}-orca.bbn-vm-am-(Actor.java:781)-Recovering reservation object
2017-07-13 13:15:26,000-[WrapperSimpleAppMain]-{DEBUG}-orca.bbn-vm-am-(ReservationClient.java:2547)-No recovery necessary for reservation #86409247
2017-07-13 13:15:26,000-[WrapperSimpleAppMain]-{DEBUG}-orca.bbn-vm-am-(Actor.java:784)-Registering the reservation with the actor
2017-07-13 13:15:26,001-[WrapperSimpleAppMain]-{DEBUG}-orca.bbn-vm-am-(Actor.java:787)-Revisiting with the ShirakoPlugin
2017-07-13 13:15:26,001-[WrapperSimpleAppMain]-{DEBUG}-orca.bbn-vm-am-(Actor.java:790)-Revisiting with the actor policy
2017-07-13 13:15:26,002-[WrapperSimpleAppMain]-{ERROR}-orca.bbn-vm-am-(Actor.java:754)-Unexpected error while recoverying reservation
orca.util.OrcaException: Could not recover Reservation #86409247 due to null
        at orca.shirako.core.Actor.recoverReservation(Actor.java:798)
        at orca.shirako.core.Actor.recoverReservations(Actor.java:752)
        at orca.shirako.core.Actor.recoverSlice(Actor.java:730)
        at orca.shirako.core.Actor.recoverSlices(Actor.java:699)
        at orca.shirako.core.Actor.recover(Actor.java:671)
        at orca.boot.ConfigurationProcessor.recoverActors(ConfigurationProcessor.java:259)
        at orca.boot.ConfigurationProcessor.process(ConfigurationProcessor.java:143)
        at orca.boot.ConfigurationLoader.process(ConfigurationLoader.java:76)
        at orca.shirako.container.OrcaContainer.loadConfiguration(OrcaContainer.java:783)
        at orca.shirako.container.OrcaContainer.loadConfiguration(OrcaContainer.java:752)
        at orca.shirako.container.OrcaContainer.initialize(OrcaContainer.java:257)
        at orca.shirako.container.Globals.start(Globals.java:122)
        at orca.server.OrcaServer.startOrca(OrcaServer.java:115)
        at orca.server.OrcaServer.start(OrcaServer.java:222)
        at orca.server.OrcaServer.main(OrcaServer.java:273)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.tanukisoftware.wrapper.WrapperSimpleApp.run(WrapperSimpleApp.java:240)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
        at orca.shirako.common.delegation.ResourceTicket.getDelegation(ResourceTicket.java:122)
        at orca.shirako.common.delegation.ResourceTicket.getUnits(ResourceTicket.java:178)
        at orca.shirako.core.Ticket.getUnits(Ticket.java:302)
        at orca.shirako.kernel.ResourceSet.toString(ResourceSet.java:1018)
        at orca.shirako.core.AuthorityPolicy.donate(AuthorityPolicy.java:99)
        at orca.policy.core.AuthorityCalendarPolicy.donate(AuthorityCalendarPolicy.java:239)
        at orca.shirako.core.AuthorityPolicy.revisit(AuthorityPolicy.java:87)
        at orca.policy.core.AuthorityCalendarPolicy.revisit(AuthorityCalendarPolicy.java:204)
        at orca.shirako.core.Actor.recoverReservation(Actor.java:791)
        ... 20 more
2017-07-13 13:15:26,003-[WrapperSimpleAppMain]-{INFO }-orca.bbn-vm-am-(Actor.java:758)-Recovery for reservations in slice xCAT node(EE40A77E) complete
2017-07-13 13:15:26,003-[WrapperSimpleAppMain]-{INFO }-orca.bbn-vm-am-(Actor.java:731)-Recovery of slice EE40A77E complete

The actor registry warnings follow the above xCAT recovery errors:

2017-07-13 13:15:27,945-[WrapperSimpleAppMain]-{DEBUG}-orca-(ConfigurationProcessor.java:277)-Processing entries in registry cache after topology
2017-07-13 13:15:27,946-[WrapperSimpleAppMain]-{DEBUG}-orca-(RemoteRegistryCache.java:495)-Processing XMLRPC registry response
2017-07-13 13:15:27,950-[WrapperSimpleAppMain]-{WARN }-orca-(RemoteRegistryCache.java:510)-singleQueryProcess(): Registry returned a known local actor, skipping
[...]
2017-07-13 13:15:27,955-[WrapperSimpleAppMain]-{INFO }-orca.boot.ConfigurationProcessor-(ConfigurationProcessor.java:149)- +++++ Processing exports with actor count 0

And later

2017-07-13 13:15:27,967-[bbn-vm-am]-{INFO }-orca.bbn-vm-am-(ActorDatabase.java:335)-Adding reservation BDCA9D61 to slice test-ndl-broker(AF299488)
2017-07-13 13:15:27,984-[bbn-vm-am]-{ERROR}-orca.bbn-vm-am-(Policy.java:113)-bindTicket: no tickets available for the requested resource type bbnvmsite.vm
2017-07-13 13:15:27,984-[bbn-vm-am]-{ERROR}-orca.bbn-vm-am-(BrokerReservation.java:645)-mapAndUpdate bindTicket failed for ticketRequest:
orca.util.OrcaException: bindTicket: no tickets available for the requested resource type bbnvmsite.vm
        at orca.shirako.core.Policy.error(Policy.java:114)
        at orca.shirako.core.AuthorityPolicy.bind(AuthorityPolicy.java:170)
        at orca.shirako.kernel.BrokerReservation.mapAndUpdate(BrokerReservation.java:636)
        at orca.shirako.kernel.BrokerReservation.reserve(BrokerReservation.java:295)
        at orca.shirako.kernel.Kernel.reserve(Kernel.java:946)
        at orca.shirako.kernel.KernelWrapper.handleReserve(KernelWrapper.java:795)
        at orca.shirako.kernel.KernelWrapper.export(KernelWrapper.java:289)
        at orca.shirako.core.Authority.export(Authority.java:198)
        at orca.manage.internal.ServerActorManagementObject$7.run(ServerActorManagementObject.java:709)
        at orca.shirako.core.Actor$3.process(Actor.java:977)
        at orca.shirako.core.Actor.actorMain(Actor.java:377)
        at orca.shirako.core.Actor$4.run(Actor.java:1018)
        at java.lang.Thread.run(Thread.java:748)
2017-07-13 13:15:27,985-[bbn-vm-am]-{DEBUG}-orca.bbn-vm-am-(Reservation.java:1228)-failed

But I still haven't really seen any apparent errors from before recovery, when setting up the actor pools...

hinchliff commented 7 years ago

Are the resources supposed to be in these pool properties?

2017-07-13 13:14:51,815-[WrapperSimpleAppMain]-{DEBUG}-orca-(NdlResourcePoolFactory.java:169)-NdlResourcePoolFactory: No pool properties extracted from ndl for resource type: bbnvmsite.baremetalce
[...]
2017-07-13 13:14:52,803-[WrapperSimpleAppMain]-{DEBUG}-orca-(NdlResourcePoolFactory.java:174)-NdlResourcePoolFactory: Pool properties extracted from NDL: {lun.tag.end1=116, lun.tag.start1=1}
[...]
2017-07-13 13:14:53,331-[WrapperSimpleAppMain]-{DEBUG}-orca-(NdlResourcePoolFactory.java:169)-NdlResourcePoolFactory: No pool properties extracted from ndl for resource type: bbnvmsite.vm
[...]
2017-07-13 13:14:53,879-[WrapperSimpleAppMain]-{DEBUG}-orca-(NdlResourcePoolFactory.java:174)-NdlResourcePoolFactory: Pool properties extracted from NDL: {vlan.tag.end1=1001, vlan.range.num=1, vlan.tag.start1=2}

For some reason, the AM is finding the VLAN and LUN resources, but not the baremetal or VM resources? (That pattern continues over all of the actor pools)

ibaldin commented 7 years ago

I believe so - the resources should be there

ibaldin commented 7 years ago

I would suggest not to do recovery. Try starting clean.

hinchliff commented 7 years ago

To be honest I'm not sure why recovery is happening, or what is being recovered.

The state_recovery.lock file is not present in the container before Orca is started, and the logs confirm this:

2017-07-13 19:48:49,349-[WrapperSimpleAppMain]-{DEBUG}-orca-(OrcaContainer.java:296)-Checking if this container is recovering. Looking for: /etc/orca/am+broker-12080/state_recovery.lock
2017-07-13 19:48:49,349-[WrapperSimpleAppMain]-{DEBUG}-orca-(OrcaContainer.java:301)-Superblock file does not exist. This is a fresh container

But still all of the above example reservations are recovered.

ibaldin commented 7 years ago

What do you mean by 'example reservations'?

hinchliff commented 7 years ago

Yeah, example wasn't a very good word. Reservations previously shown in comments. There are a bunch (10?) that get recovered, but I don't know where they come from.

Are they in the substrate files themselves?

2017-07-13 19:49:57,079-[WrapperSimpleAppMain]-{INFO }-orca.wvn-net-am-(ActorDatabase.java:335)-Adding reservation D70EBCB2 to slice WVN NET VLAN(8B30B472)
2017-07-13 19:49:57,098-[WrapperSimpleAppMain]-{INFO }-orca.test-ndl-broker-(Actor.java:664)-Starting recovery
ibaldin commented 7 years ago

THere definitely will be reservations showing what the AMs have (in the pools).

ibaldin commented 7 years ago

I forgot to mention - we should update Jena as well. I don't know to what extent this is held back by Gleen. Perhaps we're on a recent enogh version that it doesn't matter (or rather we can wait to update Jena)

hinchliff commented 7 years ago

The pool properties behavior exhibited above, where only the LUN and VLANs had any pool properties but not the VM or baremetal, must be the expected behavior.

The code in NdlResourcePoolFactory calls ndlPropExtractor.getPoolProperties(). For the VM, the NDLPoolPropertyExtractor class used is VMPolicyNDLPoolPropertyExtractor, which doesn't implement the getPoolProperties() function, relying on the parent function from GenericNDLPoolPropertyExtractor that simply returns null.

The sublcasses for LUN and VLANs do implement the getPoolProperties() function.

So apparently that is not the problem.

hinchliff commented 7 years ago

Finally started directly comparing Java 7 logs to Java 8 logs. (With some extra, uncommitted logging statements.)

Java 7:

2017-07-25 19:17:57,190-[WrapperSimpleAppMain]-{INFO }-orca.boot.ConfigurationProcessor-(ConfigurationProcessor.java:144)- +++++ Processing exports with actor count 0
2017-07-25 19:17:57,191-[WrapperSimpleAppMain]-{INFO }-orca.boot.ConfigurationProcessor-(ConfigurationProcessor.java:787)-Exporting resources from test-nerscvmsite-site to test-ndl-broker
2017-07-25 19:17:57,191-[WrapperSimpleAppMain]-{DEBUG}-orca.boot.ConfigurationProcessor-(ConfigurationProcessor.java:807)-Using Server Actor LocalAuthority to exportResources
2017-07-25 19:17:57,191-[WrapperSimpleAppMain]-{DEBUG}-orca.test-nerscvmsite-site-(ServerActorManagementObject.java:706)-Executing export on actor test-nerscvmsite-site test-nerscvmsite-site(Authority) null
2017-07-25 19:17:57,191-[test-nerscvmsite-site]-{DEBUG}-orca.test-nerscvmsite-site-(Actor.java:374)-Processing 1 events
2017-07-25 19:17:57,200-[test-nerscvmsite-site]-{INFO }-orca.test-nerscvmsite-site-(ActorDatabase.java:335)-Adding reservation AF15EB8F to slice test-ndl-broker(A3526B19)
2017-07-25 19:17:57,207-[test-nerscvmsite-site]-{DEBUG}-orca.test-nerscvmsite-site-(BrokerReservation.java:624)-Using policy AuthorityCalendarPolicy to bind reservation
2017-07-25 19:17:57,210-[test-nerscvmsite-site]-{DEBUG}-orca.test-nerscvmsite-site-(AuthorityPolicy.java:335)-extracting delegation: delegation=[guid=9b610ff2-4dee-429e-9e98-c19f92638aec,units=12,vector=null,type=nerscvmsite.vm,issuer=0bb0eee9-2f96-4e38-baa7-61c9d90fee9b,holder=7b60d8dd-5c4d-4b2c-afc9-bfd6023e66cd]
2017-07-25 19:17:57,210-[test-nerscvmsite-site]-{DEBUG}-orca.util.KeystoreManager-(KeystoreManager.java:107)-/etc/orca/am+broker-12080//config/runtime/keystores/0bb0eee9-2f96-4e38-baa7-61c9d90fee9b.jks requesting certificate alias=7b60d8dd-5c4d-4b2c-afc9-bfd6023e66cd
2017-07-25 19:17:57,211-[test-nerscvmsite-site]-{DEBUG}-orca.test-nerscvmsite-site-(Reservation.java:1232)-Reservation #AF15EB8F ticket request transition: Nascent->Nascent, None->Ticketing
2017-07-25 19:17:57,214-[test-nerscvmsite-site]-{DEBUG}-orca.test-nerscvmsite-site-(Reservation.java:1232)-Reservation #AF15EB8F ticketed transition: Nascent->Ticketed, Ticketing->Priming
2017-07-25 19:17:57,218-[test-nerscvmsite-site]-{INFO }-orca.test-nerscvmsite-site-(ActorDatabase.java:378)-Updating reservation AF15EB8F in slice test-ndl-broker(A3526B19)
2017-07-25 19:17:57,224-[test-nerscvmsite-site]-{DEBUG}-orca.test-nerscvmsite-site-(ActorDatabase.java:406)-No need to update reservation AF15EB8F in slice test-ndl-broker(A3526B19)
2017-07-25 19:17:57,226-[test-nerscvmsite-site]-{DEBUG}-orca.test-nerscvmsite-site-(Reservation.java:1232)-Reservation #AF15EB8F update absorbed transition: Ticketed->Ticketed, Priming->None
2017-07-25 19:17:57,226-[test-nerscvmsite-site]-{DEBUG}-orca.test-nerscvmsite-site-(BrokerReservation.java:573)-Generating update
2017-07-25 19:17:57,226-[test-nerscvmsite-site]-{WARN }-orca.test-nerscvmsite-site-(BrokerReservation.java:575)-Cannot generate update: no callback.

Java 8:

2017-07-25 18:27:04,481-[WrapperSimpleAppMain]-{INFO }-orca.boot.ConfigurationProcessor-(ConfigurationProcessor.java:144)- +++++ Processing exports with actor count 0
2017-07-25 18:27:04,482-[WrapperSimpleAppMain]-{INFO }-orca.boot.ConfigurationProcessor-(ConfigurationProcessor.java:787)-Exporting resources from rci-vm-am to test-ndl-broker
2017-07-25 18:27:04,482-[WrapperSimpleAppMain]-{DEBUG}-orca.boot.ConfigurationProcessor-(ConfigurationProcessor.java:807)-Using Server Actor LocalAuthority to exportResources
2017-07-25 18:27:04,482-[WrapperSimpleAppMain]-{DEBUG}-orca.rci-vm-am-(ServerActorManagementObject.java:706)-Executing export on actor rci-vm-am rci-vm-am(Authority) null
2017-07-25 18:27:04,483-[rci-vm-am]-{DEBUG}-orca.rci-vm-am-(Actor.java:374)-Processing 1 events
2017-07-25 18:27:04,495-[rci-vm-am]-{INFO }-orca.rci-vm-am-(ActorDatabase.java:335)-Adding reservation 3C94D898 to slice test-ndl-broker(F6213671)
2017-07-25 18:27:04,505-[rci-vm-am]-{DEBUG}-orca.rci-vm-am-(BrokerReservation.java:623)-Using policy AuthorityCalendarPolicy to bind reservation
2017-07-25 18:27:04,505-[rci-vm-am]-{ERROR}-orca.rci-vm-am-(Policy.java:113)-bindTicket: no tickets available for the requested resource type rcivmsite.vm
2017-07-25 18:27:04,505-[rci-vm-am]-{ERROR}-orca.rci-vm-am-(BrokerReservation.java:649)-mapAndUpdate bindTicket failed for ticketRequest:
orca.util.OrcaException: bindTicket: no tickets available for the requested resource type rcivmsite.vm
    at orca.shirako.core.Policy.error(Policy.java:114)
    at orca.shirako.core.AuthorityPolicy.bind(AuthorityPolicy.java:170)
    at orca.shirako.kernel.BrokerReservation.mapAndUpdate(BrokerReservation.java:640)
    at orca.shirako.kernel.BrokerReservation.reserve(BrokerReservation.java:295)
    at orca.shirako.kernel.Kernel.reserve(Kernel.java:946)
    at orca.shirako.kernel.KernelWrapper.handleReserve(KernelWrapper.java:795)
    at orca.shirako.kernel.KernelWrapper.export(KernelWrapper.java:289)
    at orca.shirako.core.Authority.export(Authority.java:198)
    at orca.manage.internal.ServerActorManagementObject$7.run(ServerActorManagementObject.java:712)
    at orca.shirako.core.Actor$3.process(Actor.java:977)
    at orca.shirako.core.Actor.actorMain(Actor.java:377)
    at orca.shirako.core.Actor$4.run(Actor.java:1018)
    at java.lang.Thread.run(Thread.java:748)
2017-07-25 18:27:04,505-[rci-vm-am]-{DEBUG}-orca.rci-vm-am-(Reservation.java:1228)-failed
2017-07-25 18:27:04,505-[rci-vm-am]-{DEBUG}-orca.rci-vm-am-(Reservation.java:1232)-Reservation #3C94D898 orca.util.OrcaException: bindTicket: no tickets available for the requested resource type rcivmsite.vm transition: Nascent->Failed, None->None
2017-07-25 18:27:04,506-[rci-vm-am]-{WARN }-orca.rci-vm-am-(Reservation.java:918)-reservation #3C94D898: reservation has failed: orca.util.OrcaException: bindTicket: no tickets available for the requested resource type rcivmsite.vm: [res: #7e83186c-fd35-4990-9061-77d8989eee69 slice: test-ndl-broker Failed None ]
2017-07-25 18:27:04,506-[rci-vm-am]-{DEBUG}-orca.rci-vm-am-(BrokerReservation.java:572)-Generating update
2017-07-25 18:27:04,507-[rci-vm-am]-{WARN }-orca.rci-vm-am-(BrokerReservation.java:574)-Cannot generate update: no callback
2017-07-25 18:27:04,507-[rci-vm-am]-{INFO }-orca.rci-vm-am-(ActorDatabase.java:378)-Updating reservation 3C94D898 in slice test-ndl-broker(F6213671)
2017-07-25 18:27:04,514-[rci-vm-am]-{DEBUG}-orca.rci-vm-am-(ActorDatabase.java:406)-No need to update reservation 3C94D898 in slice test-ndl-broker(F6213671)

Still need to figure out why running in Java 8 results in no tickets.

hinchliff commented 7 years ago

Actually, no tickets seems to be directly related to the NullPointerException i had pointed out earlier.

Java 7 (note the donateTicket line) :

2017-07-25 19:17:54,597-[WrapperSimpleAppMain]-{INFO }-orca.test-nerscvmsite-site-(Actor.java:737)-Starting to recover reservations in slice NERSC Cluster(E1F3E8DB)
2017-07-25 19:17:54,598-[WrapperSimpleAppMain]-{DEBUG}-orca.test-nerscvmsite-site-(Actor.java:748)-There are 1 reservation(s) in slice NERSC Cluster(E1F3E8DB)
2017-07-25 19:17:54,600-[WrapperSimpleAppMain]-{INFO }-orca.test-nerscvmsite-site-(Actor.java:770)-Found reservation #F44ECDB5 in state: [Ticketed, None, NoJoin]
2017-07-25 19:17:54,600-[WrapperSimpleAppMain]-{INFO }-orca.test-nerscvmsite-site-(Actor.java:779)-Recovering reservation #F44ECDB5
2017-07-25 19:17:54,600-[WrapperSimpleAppMain]-{DEBUG}-orca.test-nerscvmsite-site-(Actor.java:781)-Recovering reservation object
2017-07-25 19:17:54,605-[WrapperSimpleAppMain]-{DEBUG}-orca.test-nerscvmsite-site-(ReservationClient.java:2547)-No recovery necessary for reservation #F44ECDB5
2017-07-25 19:17:54,605-[WrapperSimpleAppMain]-{DEBUG}-orca.test-nerscvmsite-site-(Actor.java:784)-Registering the reservation with the actor
2017-07-25 19:17:54,606-[WrapperSimpleAppMain]-{DEBUG}-orca.test-nerscvmsite-site-(Actor.java:787)-Revisiting with the ShirakoPlugin
2017-07-25 19:17:54,606-[WrapperSimpleAppMain]-{DEBUG}-orca.test-nerscvmsite-site-(Actor.java:790)-Revisiting with the actor policy
2017-07-25 19:17:54,606-[WrapperSimpleAppMain]-{INFO }-orca.test-nerscvmsite-site-(AuthorityPolicy.java:99)-AuthorityPolicy donateTicket rset: units 12  concrete 12 Ticket [units = 12 oldUnits = 0] 
2017-07-25 19:17:54,607-[WrapperSimpleAppMain]-{INFO }-orca.test-nerscvmsite-site-(Actor.java:793)-Recovered reservation #F44ECDB5
2017-07-25 19:17:54,607-[WrapperSimpleAppMain]-{INFO }-orca.test-nerscvmsite-site-(Actor.java:758)-Recovery for reservations in slice NERSC Cluster(E1F3E8DB) complete

Java 8:

2017-07-25 18:27:04,381-[WrapperSimpleAppMain]-{INFO }-orca.rci-vm-am-(Actor.java:737)-Starting to recover reservations in slice OpenStack Virtual Machine (RCI)(5C35251)
2017-07-25 18:27:04,388-[WrapperSimpleAppMain]-{DEBUG}-orca.rci-vm-am-(Actor.java:748)-There are 1 reservation(s) in slice OpenStack Virtual Machine (RCI)(5C35251)
2017-07-25 18:27:04,390-[WrapperSimpleAppMain]-{INFO }-orca.rci-vm-am-(Actor.java:770)-Found reservation #7828D9B9 in state: [Ticketed, None, NoJoin]
2017-07-25 18:27:04,390-[WrapperSimpleAppMain]-{INFO }-orca.rci-vm-am-(Actor.java:779)-Recovering reservation #7828D9B9
2017-07-25 18:27:04,390-[WrapperSimpleAppMain]-{DEBUG}-orca.rci-vm-am-(Actor.java:781)-Recovering reservation object
2017-07-25 18:27:04,394-[WrapperSimpleAppMain]-{DEBUG}-orca.rci-vm-am-(ReservationClient.java:2547)-No recovery necessary for reservation #7828D9B9
2017-07-25 18:27:04,394-[WrapperSimpleAppMain]-{DEBUG}-orca.rci-vm-am-(Actor.java:784)-Registering the reservation with the actor
2017-07-25 18:27:04,396-[WrapperSimpleAppMain]-{DEBUG}-orca.rci-vm-am-(Actor.java:787)-Revisiting with the ShirakoPlugin
2017-07-25 18:27:04,396-[WrapperSimpleAppMain]-{DEBUG}-orca.rci-vm-am-(Actor.java:790)-Revisiting with the actor policy
2017-07-25 18:27:04,396-[WrapperSimpleAppMain]-{ERROR}-orca.rci-vm-am-(Actor.java:754)-Unexpected error while recoverying reservation
orca.util.OrcaException: Could not recover Reservation #7828D9B9 due to null
    at orca.shirako.core.Actor.recoverReservation(Actor.java:798)
    at orca.shirako.core.Actor.recoverReservations(Actor.java:752)
    at orca.shirako.core.Actor.recoverSlice(Actor.java:730)
    at orca.shirako.core.Actor.recoverSlices(Actor.java:699)
    at orca.shirako.core.Actor.recover(Actor.java:671)
    at orca.boot.ConfigurationProcessor.recoverActors(ConfigurationProcessor.java:254)
    at orca.boot.ConfigurationProcessor.process(ConfigurationProcessor.java:138)
    at orca.boot.ConfigurationLoader.process(ConfigurationLoader.java:76)
    at orca.shirako.container.OrcaContainer.loadConfiguration(OrcaContainer.java:783)
    at orca.shirako.container.OrcaContainer.loadConfiguration(OrcaContainer.java:752)
    at orca.shirako.container.OrcaContainer.initialize(OrcaContainer.java:257)
    at orca.shirako.container.Globals.start(Globals.java:122)
    at orca.server.OrcaServer.startOrca(OrcaServer.java:115)
    at orca.server.OrcaServer.start(OrcaServer.java:222)
    at orca.server.OrcaServer.main(OrcaServer.java:273)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.tanukisoftware.wrapper.WrapperSimpleApp.run(WrapperSimpleApp.java:240)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
    at orca.shirako.common.delegation.ResourceTicket.getDelegation(ResourceTicket.java:122)
    at orca.shirako.common.delegation.ResourceTicket.getUnits(ResourceTicket.java:178)
    at orca.shirako.core.Ticket.getUnits(Ticket.java:302)
    at orca.shirako.kernel.ResourceSet.toString(ResourceSet.java:1018)
    at orca.shirako.core.AuthorityPolicy.donate(AuthorityPolicy.java:99)
    at orca.policy.core.AuthorityCalendarPolicy.donate(AuthorityCalendarPolicy.java:239)
    at orca.shirako.core.AuthorityPolicy.revisit(AuthorityPolicy.java:87)
    at orca.policy.core.AuthorityCalendarPolicy.revisit(AuthorityCalendarPolicy.java:204)
    at orca.shirako.core.Actor.recoverReservation(Actor.java:791)
    ... 20 more
2017-07-25 18:27:04,396-[WrapperSimpleAppMain]-{INFO }-orca.rci-vm-am-(Actor.java:758)-Recovery for reservations in slice OpenStack Virtual Machine (RCI)(5C35251) complete

I think it is delegations that is null here. Will have to track that next.

hinchliff commented 7 years ago

In java 8, the delegations are null at this point in recovery, because the reservation properties read from the "database" do not include delegations. (Before recovery recovers the delegation field, the delegations are just zero. But the zeros are replaced with the null from the properties during recovery.)

In java 7, the delegation properties include the correct values.

I'm not sure yet if the problem with the properties is in reading from the database, or from when the properties get written to the database (I'm not sure when that occurs).

This line in Actor.java is where the properties are read from the "database".

ibaldin commented 7 years ago

I would trace the boot package actions

hinchliff commented 7 years ago

Right, I think these pool reservations are added to the database at this line of PoolCreator.java

I added some extra debug statements, and it looks like the delegations portion of the reservation is not null nor zero at this point (the Ticket [units = 132 part):

2017-07-26 19:40:04,491-[WrapperSimpleAppMain]-{DEBUG}-orca-(PoolCreator.java:85)-Adding source reservation to database res: F4BD38DF slice: OpenStack Virtual Machine (RCI) Ticketed None rset: units 132  concrete 132 Ticket [units = 132 oldUnits = 0]  term=[0:0:426399601]
hinchliff commented 7 years ago

I confirmed that the mysql database does not have the delegation information (when using Java 8).

I added some extra debugging statements to the addReservation() function of ActorDatabase.

The delegations information is missing from the Properties created from PersistenceUtils.save(reservation) -- missing when using Java 8, present when using Java 7.

In Java 7, these properties include the signedDelegation in the ticket:

2017-07-27 16:22:54,639-[WrapperSimpleAppMain]-{DEBUG}-orca.test-nerscvmsite-site-(ActorDatabase.java:344)-Reservation has persistence properties: [orca.shirako.kernel.ReservationClient.joinPredecessors=__class_name__=java.util.HashMap
count=0
, ReservationExtended=false, ReservationCategory=1, ReservationClientLastTicketUpdate=__class_name__=orca.shirako.util.UpdateData
UpdateDataFailed=false
, ReservationID.__class_name__=orca.shirako.common.ReservationID, ReservationID=1d13eb62-af3f-42da-a87c-f4cc3263e079, ReservationClientRenewTime=0, ReservationClientSequenceLeaseOut=0, ReservationApprovedTerm=TermEndTime=1927497600000
TermStartTime=1264809600000
TermNewStartTime=1264809600000
__class_name__=orca.shirako.time.Term
, orca.shirako.kernel.ReservationClient.slice-reference=__class_name__=orca.shirako.common.SliceID
value=ca6605e3-506a-48d9-9cf2-2b5365eab9f4
value.__class_name__=orca.shirako.common.SliceID
, ReservationClientSequenceLeaseIn=0, ReservationClientLastLeaseUpdate=__class_name__=orca.shirako.util.UpdateData
UpdateDataFailed=false
, ReservationSliceID=ca6605e3-506a-48d9-9cf2-2b5365eab9f4, ReservationSliceName=NERSC Cluster, ReservationRenewable=false, __class_name__=orca.shirako.kernel.ReservationClient, ReservationPending=1, ReservationSliceID.__class_name__=orca.shirako.common.SliceID, ReservationResources=ResourceSetType.__class_name__=orca.util.ResourceType
ResourceSetConcreteSet=TicketStubOldUnits\=0\n__class_name__\=orca.shirako.core.Ticket\nTicketResourceTicket\=<sharpTicket>\\n  <delegations>\\n    <signedDelegation>\\n      <guid>66346aa4-a3fb-48e8-8e5f-f985ef6c52c7</guid>\\n      <term>\\n        <startTime>2010-01-30 00\\\:00\\\:00.0 UTC</startTime>\\n        <endTime>2031-01-30 00\\\:00\\\:00.0 UTC</endTime>\\n        <newStartTime>2010-01-30 00\\\:00\\\:00.0 UTC</newStartTime>\\n      </term>\\n      <units>12</units>\\n      <type type\\\="nerscvmsite.vm"/>
[...]

While in Java 8, the ticket skips the delegation part

2017-07-27 16:37:00,922-[WrapperSimpleAppMain]-{DEBUG}-orca.rci-vm-am-(ActorDatabase.java:344)-Reservation has persistence properties: [orca.shirako.kernel.ReservationClient.joinPredecessors=__class_name__=java.util.HashMap
count=0
, ReservationExtended=false, ReservationCategory=1, ReservationClientLastTicketUpdate=__class_name__=orca.shirako.util.UpdateData
UpdateDataFailed=false
, ReservationID.__class_name__=orca.shirako.common.ReservationID, ReservationID=6fcc7cce-02f0-4571-b993-91ec23cd2124, ReservationClientRenewTime=0, ReservationClientSequenceLeaseOut=0, ReservationApprovedTerm=TermEndTime=1927497600000
TermStartTime=1264809600000
TermNewStartTime=1264809600000
__class_name__=orca.shirako.time.Term
, orca.shirako.kernel.ReservationClient.slice-reference=__class_name__=orca.shirako.common.SliceID
value=1cc89327-df69-4776-8589-468c2ce83d8c
value.__class_name__=orca.shirako.common.SliceID
, ReservationClientSequenceLeaseIn=0, ReservationClientLastLeaseUpdate=__class_name__=orca.shirako.util.UpdateData
UpdateDataFailed=false
, ReservationSliceID=1cc89327-df69-4776-8589-468c2ce83d8c, ReservationSliceName=OpenStack Virtual Machine (RCI), ReservationRenewable=false, __class_name__=orca.shirako.kernel.ReservationClient, ReservationPending=1, ReservationSliceID.__class_name__=orca.shirako.common.SliceID, ReservationResources=ResourceSetType.__class_name__=orca.util.ResourceType
ResourceSetConcreteSet=TicketStubOldUnits\=0\n__class_name__\=orca.shirako.core.Ticket\nTicketResourceTicket\=<sharpTicket>\\n  <certificate>
[...]

I think that means we're narrowing down the problem. Not sure what it is yet though.

hinchliff commented 7 years ago

Finally found it. Needed to update the xtream library used in SimpleResourceTicketFactory

mcevik0 commented 7 years ago

Java 8 update 144 deployed on all racks during RENCI-NRIG/exogeni#164