SolaceProducts / pubsubplus-kubernetes-helm-quickstart

Quickstart to launch a Solace PubSub+ Software Event Broker in Kubernetes using Helm
Apache License 2.0

Documentation for running Solace in persistent mode does not include an example setup for the PV #68

Closed: davesargrad closed this issue 4 years ago

davesargrad commented 5 years ago

I am trying to follow the instructions to configure solace to use persistent storage.

My pod is up and running; however, the PVC is pending since no PV has been created.

(screenshot)

In values.yaml I set the following (per the documentation in the solace-kubernetes-quickstart):

```yaml
storage:
  persistent: true
  size: 30Gi
```

What is the easiest way to create a PV that allows this PVC to move from the PENDING state?

davesargrad commented 5 years ago

The PVC resource configuration:

```json
{
  "kind": "PersistentVolumeClaim",
  "apiVersion": "v1",
  "metadata": {
    "name": "data-idolized-rabbit-solace-0",
    "namespace": "default",
    "selfLink": "/api/v1/namespaces/default/persistentvolumeclaims/data-idolized-rabbit-solace-0",
    "uid": "a098f782-f086-4359-b37d-8e5be0df85f9",
    "resourceVersion": "215811",
    "creationTimestamp": "2019-10-25T13:41:34Z",
    "labels": {
      "app": "solace",
      "release": "idolized-rabbit"
    },
    "finalizers": [
      "kubernetes.io/pvc-protection"
    ]
  },
  "spec": {
    "accessModes": [
      "ReadWriteOnce"
    ],
    "resources": {
      "requests": {
        "storage": "30Gi"
      }
    },
    "volumeMode": "Filesystem"
  },
  "status": {
    "phase": "Pending"
  }
}
```

(screenshot)

davesargrad commented 5 years ago

I've also successfully set up the Solace instance so it uses an NFS persistent volume claim; however, yet again, it's pending. Do you have a simple description of how to create an NFS PV that will satisfy this claim?

(screenshot)

davesargrad commented 5 years ago

I've created an NFS PV. I think I'm close, yet the PVC is still pending. (screenshot)

```yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: solace-pv
  labels:
    app: solace
spec:
  storageClassName: malinfs
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce        # assumed to match the PVC's access mode; the original was cut off here
  nfs:                     # assumed from the export described below
    server: 10.93.98.16
    path: /var/nfsshare
```

/var/nfsshare is the directory exported by the NFS server running at 10.93.98.16

davesargrad commented 5 years ago

I think the reason that the PVC is still pending may be that I need an annotation on the PV (to allow the PVC selector to work).

I tried putting a label on the PV (app: solace) but this didn't work. What am I still missing?

bczoma commented 5 years ago

The k8s environment may not have a default storage class available. You can verify what you have by running `kubectl get sc`. If there is a StorageClass but it is not marked as default, you can tell the chart to use it by setting `storage.useStorageClass` to its name. If no storage class is available, you can create one specific to your provider following https://kubernetes.io/docs/concepts/storage/storage-classes/ and then specify it if it is not set as default. If you are using AWS or GCP and want the Solace Helm chart to take care of it, you can also set `cloudProvider` and `storage.type: standard` in the values, which will create a StorageClass. See also https://github.com/SolaceProducts/solace-kubernetes-quickstart#kubernetes-volume-types-support
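
For example, a minimal sequence along these lines (chart path, release, and class names are placeholders):

```
kubectl get sc   # check for an existing or default StorageClass

# If a class exists but is not marked default, point the chart at it:
helm install <chart> --name <release> --set storage.useStorageClass=<class-name>
```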

For NFS, here is an example of creating a provisioner that worked for me:

```
helm install stable/nfs-server-provisioner --name nfs-test --set persistence.enabled=true,persistence.size=100Gi
```
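
That provisioner chart creates a StorageClass (named nfs unless overridden) that the Solace chart can then reference, for example:

```yaml
storage:
  persistent: true
  useStorageClass: nfs   # StorageClass created by nfs-server-provisioner; default name assumed
  size: 30Gi
```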

davesargrad commented 5 years ago

Thanks.

I've seen that nfs-server-provisioner before. I already have an NFS server set up (one that I've used with other k8s clients). The server is running at 10.93.98.16 and exports /var/nfsshare.

I am trying to create an NFS PV that the solace PVC will then select.

This is what I have so far:

```yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: solace-pv
  labels:
    app: solace
spec:
  storageClassName: malinfs
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce        # assumed to match the PVC's access mode; the original was cut off here
  nfs:                     # assumed from the export described above
    server: 10.93.98.16
    path: /var/nfsshare
```

davesargrad commented 5 years ago

(screenshot)

davesargrad commented 5 years ago

Perhaps, since I already have a PV, all I need to do is use:

```yaml
storage:
  persistent: false
  existingVolume: solace-disk
```

I have now created an NFS PV called solace-disk, and I've updated the Solace values.yaml to use an existingVolume.

Is this proper?

(screenshots)

How can I verify that Solace is properly using that NFS mount? In other words, I don't yet see it writing to that shared folder.

(screenshot)

My current values.yaml follows:

```yaml
solace:
  redundancy: false
  size: dev100
cloudProvider: undefined
image:
  repository: solace/solace-pubsub-standard
  tag: latest
  pullPolicy: IfNotPresent
filepaths:
  configmap: "/mnt/disks/solace"
  secrets: "/mnt/disks/secrets"
service:
  internal: false
  type: LoadBalancer
  externalPort:
```

bczoma commented 5 years ago

Yes, that's correct; I was just going to suggest that. To verify, you can either go to your NFS server and check for directories used by Solace, such as jail or diag, or access the Solace container with `kubectl exec -it <pod> bash` and go to /usr/sw; most directories under it should be mounted from the NFS server.
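
Concretely, the check might look like this (pod name is a placeholder):

```
kubectl exec -it <pod> -- bash   # open a shell in the Solace container
ls /usr/sw                       # inside the container: look for jail, diag, etc.
ls /var/nfsshare                 # on the NFS server: the same directories should appear
```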

bczoma commented 5 years ago

Note that this works for a non-HA Solace deployment, but for an HA deployment you need to use a storage class, as the same volume cannot be mounted and used by all the Solace HA instances.
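
A sketch of HA values under that approach, assuming the value names used elsewhere in this thread:

```yaml
solace:
  redundancy: true                # deploys multiple broker instances
storage:
  persistent: true
  useStorageClass: <class-name>   # each instance gets its own dynamically provisioned volume
  size: 30Gi
```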

davesargrad commented 5 years ago

Sounds like I'm close. So:

I get into the Solace pod and go to /usr/sw, but when I touch a file I don't see it on my NFS server (under /var/nfsshare).

(screenshot)

Perhaps I needed my PV to point to /usr/sw, rather than to /var/nfsshare.

(screenshot)

bczoma commented 5 years ago

/usr/sw itself is not mounted. Try to go into its jail subdirectory and do the same.

davesargrad commented 5 years ago

(screenshot)

davesargrad commented 5 years ago

Can you think of why /usr/sw is not mounted? I'm sure the NFS server is exporting /var/nfsshare; I have other k8s pods that use it.

I don't see how the running StatefulSet is referencing the solace-disk PV.

(screenshots)

bczoma commented 5 years ago

I'll need to look at why EmptyDir is reported for data

davesargrad commented 5 years ago

That makes sense.. I wondered the same. Please let me know what you find.

bczoma commented 5 years ago

Can you try the following in values?

```yaml
storage:
  persistent: true
  useStorageClass: malinfs
  size: 10Gi
```
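
If that works, the claim generated by the StatefulSet should go from Pending to Bound:

```
kubectl get pvc   # the data-<release>-solace-0 claim should report Bound
kubectl get pv    # solace-pv should report Bound against that claim
```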

davesargrad commented 5 years ago

You da man!! That seemed to work!

(screenshot)

My values file, for future reference:

(screenshot)

And my PV file:

(screenshot)

I appreciate your awesome support. Thank you!

bczoma commented 5 years ago

Great! Thanks for the input. We are restructuring/updating this quickstart, including enhancing the documentation, and will cover anything that was missing here.

davesargrad commented 5 years ago

Hmmm. Will try to figure this out. My pod is now in CrashLoopBackOff.

(screenshot)

davesargrad commented 5 years ago

Yet my PVC is now bound.

(screenshot)

davesargrad commented 5 years ago

(screenshot)

davesargrad commented 5 years ago

Looks like it's failing the liveness test. This would explain why I couldn't get to port 8080, or to port 31186 (from my main network).

(screenshot)

bczoma commented 5 years ago

There may be multiple reasons, ranging from missing resources to tight security constraints. Try to delete your current deployment, then re-install. When the Solace pod starts running, pipe the logs with `kubectl logs <pod> -f`; there may be a relevant ERROR, with preceding WARN logs, that reveals the reason.
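
In Helm 2 terms, that cycle would be something like this (release, chart, and pod names are placeholders):

```
helm delete --purge <release>                    # remove the current deployment
helm install <chart> --name <release>            # re-install
kubectl logs <pod> -f | grep -E "ERROR|WARN"     # watch for the first relevant errors
```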

davesargrad commented 5 years ago

Hi @bczoma

I'm finally getting back to this. I apologize for not responding sooner. I see the following in the log of the pod:

```

[root@tonga solace]# kubectl logs good-dachshund-solace-0 Host Boot ID: b194612f-28d8-4010-a645-d5dfe490578f Starting VMR Docker Container: Tue Nov 5 13:22:54 UTC 2019 SolOS Version: soltr_9.3.0.22 2019-11-05T13:24:05.327+00:00 <syslog.info> good-dachshund-solace-0 rsyslogd: [origin software="rsyslogd" swVersion="8.1908.0" x-pid="372" x-info="https://www.rsyslog.com"] start 2019-11-05T13:24:06.323+00:00 <local6.info> good-dachshund-solace-0 root[370]: rsyslog startup 2019-11-05T13:24:07.341+00:00 <local0.info> good-dachshund-solace-0 root: EXTERN_SCRIPT INFO: Log redirection enabled, beginning playback of startup log buffer 2019-11-05T13:24:07.352+00:00 <local0.info> good-dachshund-solace-0 root: EXTERN_SCRIPT INFO: Container user 'appuser' is now in 'root' groups 2019-11-05T13:24:07.361+00:00 <local0.info> good-dachshund-solace-0 root: EXTERN_SCRIPT INFO: /usr/sw/var/soltr_9.3.0.22/db/dbBaseline already exists and will not be generated by confd 2019-11-05T13:24:07.376+00:00 <local0.info> good-dachshund-solace-0 root: EXTERN_SCRIPT INFO: repairDatabase.py: processing database (currDbPath: /usr/sw/var/soltr_9.3.0.22/.dbHistory/db.00000876, nextDbPath: /usr/sw/var/soltr_9.3.0.22/.dbHistory/db.00000877) 2019-11-05T13:24:07.389+00:00 <local0.info> good-dachshund-solace-0 root: EXTERN_SCRIPT INFO: Processing baseline /usr/sw/var/soltr_9.3.0.22/.dbHistory/db.00000876/dbBaseline 2019-11-05T13:24:07.398+00:00 <local0.info> good-dachshund-solace-0 root: EXTERN_SCRIPT INFO: Finished playback of log buffer 2019-11-05T13:24:08.020+00:00 <local0.warning> good-dachshund-solace-0 root[414]: /usr/sw ipcCommon.cpp:430 (BASE_IPC - 0x00000000) main(0)@solevent(?) WARN SolOS is not currently up - aborting attempt to start solevent process 2019-11-05T13:24:08.027+00:00 <local0.warning> good-dachshund-solace-0 pam_event[413]: WARN Failed raising event, rc: 2, event SYSTEM_AUTHENTICATION_SESSION_OPENED shell(sudo),<413>,internal,root,root 2019-11-05T13:24:08.654+00:00 <local0.warning> good-dachshund-solace-0 root[416]: /usr/sw ipcCommon.cpp:430 (BASE_IPC - 0x00000000) main(0)@solevent(?) 
WARN SolOS is not currently up - aborting attempt to start solevent process 2019-11-05T13:24:08.660+00:00 <local0.warning> good-dachshund-solace-0 pam_event[413]: WARN Failed raising event, rc: 2, event SYSTEM_AUTHENTICATION_SESSION_CLOSED shell(sudo),<413>,root,root 2019-11-05T13:24:08.670+00:00 <local0.info> good-dachshund-solace-0 root: EXTERN_SCRIPT INFO: Updating dbBaseline with dynamic instance metadata 2019-11-05T13:24:08.979+00:00 <local0.info> good-dachshund-solace-0 root: EXTERN_SCRIPT INFO: Mirroring host timezone 2019-11-05T13:24:08.988+00:00 <local0.info> good-dachshund-solace-0 root: EXTERN_SCRIPT INFO: Generating SSH key ssh-keygen: generating new host keys: RSA1 RSA DSA ECDSA ED25519 2019-11-05T13:24:09.439+00:00 <local0.info> good-dachshund-solace-0 root: EXTERN_SCRIPT INFO: Starting solace process 2019-11-05T13:24:10.226+00:00 <local0.info> good-dachshund-solace-0 root: EXTERN_SCRIPT INFO: Launching solacedaemon: /usr/sw/loads/soltr_9.3.0.22/bin/solacedaemon --vmr -z -f /usr/sw/loads/soltr_9.3.0.22/SolaceStartup.txt -r -1 2019-11-05T13:24:18.028+00:00 <local0.warning> good-dachshund-solace-0 root[1]: /usr/sw main.cpp:739 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Determining platform type: [ OK ] 2019-11-05T13:24:18.069+00:00 <local0.warning> good-dachshund-solace-0 root[1]: /usr/sw main.cpp:739 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Generating license file: [ OK ] 2019-11-05T13:24:18.072+00:00 <local0.warning> good-dachshund-solace-0 root[1]: /usr/sw main.cpp:739 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Running pre-startup checks: [ OK ] 2019-11-05T13:24:25.440+00:00 <local0.err> good-dachshund-solace-0 root[1]: /usr/sw main.cpp:4014 (SOLDAEMON - 0x00000001) main(0)@solacedaemon ERROR ######## System startup initiated (Version 9.3.0.22) ######## 2019-11-05T13:24:25.444+00:00 <local0.warning> good-dachshund-solace-0 root[1]: /usr/sw main.cpp:739 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Monitoring SolOS processes: [ OK ] 2019-11-05T13:24:29.948+00:00 <local0.warning> good-dachshund-solace-0 root[605]: /usr/sw mpliMoMsgService.cpp:2519 (MP - 0x00000000) main(0)@mgmtplane(9) WARN Product-key 'Message VPN 25' ignored due to unsupported platform 2019-11-05T13:24:30.065+00:00 <local0.warning> good-dachshund-solace-0 root[1]: /usr/sw main.cpp:2889 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Inherited dynamic child: /sbin/rsyslogd -n &>/var/log/solace/rsyslogd.log (pid 553) 2019-11-05T13:24:30.065+00:00 <local0.warning> good-dachshund-solace-0 root[1]: /usr/sw main.cpp:2889 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Inherited dynamic child: python /usr/sw/loads/soltr_9.3.0.22/scripts/vmr-solaudit -d (pid 534) 2019-11-05T13:24:30.952+00:00 <local0.err> good-dachshund-solace-0 root[603]: /usr/sw adCpDiskThread.cpp:319 (AD_CP - 0x00000000) main(0)@controlplane(10) ERROR soldisktest: write performance 9 MBps. disk performance does not meet the minimal requirement (20 MBps). 
Quality of service degradation expected 2019-11-05T13:24:32.791+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnMirror.cpp:27 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN AdCmnMirrorIndex::adTotalNumSpoolFiles_ms=50000 2019-11-05T13:24:32.791+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnMirror.cpp:60 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN Platform=Standard, MemSize=2097152KB, ADB=Virtual 2019-11-05T13:24:32.791+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnMirror.cpp:71 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN alllocateMirrorMem: adVirtualUnackedListNumBlocks_ms=210000 adAdbUnackedListNumBlocks_ms=210000 adTotalNumSpoolFiles_ms=50000 mirrorFootprint_ms=124032480 2019-11-05T13:24:32.791+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnMirror.cpp:80 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN Allocate 124032480 bytes for mirror on platform Standard (memSize=2097152KB, Virtual) 2019-11-05T13:24:32.812+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw dpMsgAllocationMgr.cpp:189 (DP_MSG_ALLOC - 0x00000000) main(0)@dataplane(11) WARN Total DpMsg buffers: 50000 2019-11-05T13:24:32.846+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw dpMsgAllocationMgr.cpp:203 (DP_MSG_ALLOC - 0x00000000) main(0)@dataplane(11) WARN Global DpMsg buffers pool: 0x7f1c6b4e4000 2019-11-05T13:24:32.859+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnMirror.cpp:100 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN Create mirror at 0x7f1c718d5040 2019-11-05T13:24:32.862+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnClock.cpp:47 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN AdCmnClock: procSpeed=3392733696 cyclesStart=3501137863652860 cyclesEnd=3501137867353828 timeInUs=1090 2019-11-05T13:24:32.862+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnClock.cpp:51 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN AdCmnClock: timeofdayStart=1572960272.860925 timeofdayEnd=1572960272.862016 2019-11-05T13:24:32.862+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnRfadMgr.cpp:181 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN Creating AdCmnSoftAdb: size=149946368 2019-11-05T13:24:32.862+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnSoftAdb.cpp:84 (ADMANAGER - 0x00000001) main(0)@dataplane(11) WARN AdCmnSoftAdb: constructing 2019-11-05T13:24:32.918+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnSoftAdb.cpp:152 (ADMANAGER - 0x00000001) main(0)@dataplane(11) WARN AdCmnSoftAdb: AdCmnAIORing created at 0x37dde10 2019-11-05T13:24:32.920+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnOperation.cpp:111 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN Enable histogram for operation (SA:MateLinkLatency) 2019-11-05T13:24:32.920+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnOperation.cpp:111 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN Enable histogram for operation (SA:NetworkLatency) 2019-11-05T13:24:32.920+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnOperation.cpp:111 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN Enable histogram for operation (SA:JournalMateWrite) 2019-11-05T13:24:32.920+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnSoftAdb.cpp:158 (ADMANAGER - 0x00000001) main(0)@dataplane(11) WARN 
AdCmnSoftAdb: AdCmnSoftMateLink created at 0x7f1c5efbe010 2019-11-05T13:24:32.921+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:365 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN AdCmnDiskTrans: MsgStoreStartAddr=0x076499e0 TrailerStartAddr=0x08eff000 adbBottomReservedStartAddr=0x08effc00 2019-11-05T13:24:32.921+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnOperation.cpp:111 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN Enable histogram for operation (SA:BackingStoreDiskThroughput) 2019-11-05T13:24:32.921+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnOperation.cpp:111 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN Enable histogram for operation (SA:JournalDiskThroughput) 2019-11-05T13:24:32.921+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnOperation.cpp:111 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN Enable histogram for operation (SA:JournalDiskLatency) 2019-11-05T13:24:32.952+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:115 (ADMANAGER - 0x00000001) main(0)@dataplane(11) WARN Created file /usr/sw/internalSpool/softAdb/.diskTest size 4096 fd 83 2019-11-05T13:24:32.955+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:518 (ADMANAGER - 0x00000001) main(0)@dataplane(11) WARN Found journal size is 230490112 2019-11-05T13:24:32.955+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:525 (ADMANAGER - 0x00000001) main(0)@dataplane(11) WARN Reduce journal size by 29163520 to 201326592 2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL 'ASSERT: (journalChunkSize_m & (journalChunkSize_m - 1)) == 0' 2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL Stack trace: 18 stack frames obtained ... 
2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL /usr/sw/loads/currentload/lib64/libcommon.so(comUtils::getBacktrace(char*, int, bool)+0x34) [0x7f1c89a38074] 2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL /usr/sw/loads/currentload/lib64/liblogbase.so(CommonLogging::getBacktrace(char*, int)+0x1e) [0x7f1c828e5bfe] 2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL /usr/sw/loads/currentload/lib64/liblogbase.so(LoggingSyslogMsg::LoggingSyslogMsg(char const*, void const*, void const*, void const*, void const*, char const*, int, logSubSystem_t, logEvent_t, logLevel_t, bool)+0x194) [0x7f1c828e8644] 2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL /usr/sw/loads/currentload/lib64/libcommon.so(Logging::log(logSubSystem_t, logEvent_t, logLevel_t, unsigned int, char const*, char const*, int, void const*, void const*, void const*, void const*, bool, CommonLogging::exitReason_t, int)+0x169) [0x7f1c89a717a9] 2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL /usr/sw/loads/currentload/lib64/liblogbase.so(CommonLogging::getHandleLogEvent(logSubSystem_t, logEvent_t, logLevel_t, char const*, char const*, int, void const*, void const*, void const*, void const*, bool, CommonLogging::exitReason_t, int)+0x1c2) [0x7f1c828e6172] 2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL /usr/sw/loads/currentload/lib64/liblogbase.so(CommonLogging::logAssertN(logSubSystem_t, logEvent_t, logLevel_t, char const*, char const*, int, char const*, char const*, char const*, char const*, unsigned long, unsigned long, unsigned long, unsigned long, unsigned int)+0x2ba) [0x7f1c828e6eaa] 2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL /usr/sw/loads/soltr_9.3.0.22/bin/dataplane(CommonLogging::logAssert(logSubSystem_t, logEvent_t, logLevel_t, char const*, char const*, int)+0x5d) [0x73b12d] 2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL /usr/sw/loads/soltr_9.3.0.22/bin/dataplane(assuredDelivery::AdCmnDiskTrans::AdCmnDiskTrans(assuredDelivery::AdCmnSoftAdb*, assuredDelivery::AdCmnSoftMateLink*, assuredDelivery::AdCmnAIORing*, bool, char const*, char const*, bool)+0x14c4) [0x8160a4] 2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL /usr/sw/loads/soltr_9.3.0.22/bin/dataplane(assuredDelivery::AdCmnSoftAdb::AdCmnSoftAdb(unsigned int, bool, char const*)+0x39c) [0x76d64c] 2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL /usr/sw/loads/soltr_9.3.0.22/bin/dataplane(assuredDelivery::AdCmnRfadMgr::initRfad()+0x47f) 
[0x7af76f] 2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL /usr/sw/loads/soltr_9.3.0.22/bin/dataplane(assuredDelivery::AdCmnRfadMgr::init()+0x42) [0x7afe22] 2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL /usr/sw/loads/soltr_9.3.0.22/bin/dataplane(assuredDelivery::AdCmnRfadApi::init(DpMsgAllocationConsumer&)+0x20) [0x7def40] 2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL /usr/sw/loads/soltr_9.3.0.22/bin/dataplane(assuredDelivery::AdContext::initialize(comStartup::restartReason_t, CmdLineOptions&, void*, void*, void*, void*)+0x413) [0x748043] 2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL /usr/sw/loads/currentload/lib64/libcommon.so(ProcessMgr::doMain()+0x104) [0x7f1c89a827b4] 2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL /usr/sw/loads/currentload/lib64/libcommon.so(ProcessMgr::main()+0xe) [0x7f1c89a0f92e] 2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL /usr/sw/loads/soltr_9.3.0.22/bin/dataplane(main+0x1a5) [0x6bb855] 2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f1c7c5a9495] 2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL /usr/sw/loads/soltr_9.3.0.22/bin/dataplane(unknown) [0x738fb9] 2019-11-05T13:24:32.957+00:00 <local0.err> good-dachshund-solace-0 root[602]: /usr/sw adContext.cpp:142 (ADMANAGER - 0x00000000) main(0)@dataplane(11) ERROR Caught unexpected signal: signum=6 (Aborted) 2019-11-05T13:24:32.957+00:00 <local0.err> good-dachshund-solace-0 root[1]: /usr/sw main.cpp:2169 (SOLDAEMON - 0x00000000) main(0)@solacedaemon ERROR Child process dying, PID: 602, command: '/usr/sw/loads/soltr_9.3.0.22/bin/dataplane -h 80', status: program termination in progress due to signal 'Aborted', core dump is being produced. Cycle count: 3501138190289443 2019-11-05T13:24:32.957+00:00 <local0.err> good-dachshund-solace-0 root[1]: /usr/sw main.cpp:1708 (SOLDAEMON - 0x00000001) main(0)@solacedaemon ERROR ######## System shutdown initiated: error detected, reboot requested ######## Unable to raise event; rc(would block) 2019-11-05T13:24:33.082+00:00 <local0.warning> good-dachshund-solace-0 root[714]: /usr/sw main.cpp:359 (SOLEVENT - 0x00000000) SolEventThread(2)@solevent(23) WARN Unable to raise event; rc(would block) 2019-11-05T13:24:33.192+00:00 <local0.info> good-dachshund-solace-0 root[717]: /usr/sw/loads/soltr_9.3.0.22/scripts/commonLogging.py:76 WARN Running vmr-solredswitch 2019-11-05T13:24:34.217+00:00 <local0.warning> good-dachshund-solace-0 root[1]: /usr/sw main.cpp:2996 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Terminating CLI processes ... 
2019-11-05T13:24:37.218+00:00 <local0.warning> good-dachshund-solace-0 root[1]: /usr/sw main.cpp:3003 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Terminating children ... 2019-11-05T13:24:37.360+00:00 <local0.info> good-dachshund-solace-0 root[534]: /usr/sw/loads/soltr_9.3.0.22/scripts/commonLogging.py:76 WARN Received signal 15, terminating 2019-11-05T13:24:37.378+00:00 <local0.warning> good-dachshund-solace-0 root[1]: /usr/sw Generated_commonReturnCodes.cpp:135 (BASE - 0x00000000) main(0)@solacedaemon WARN Unknown exit value 143, defaulting it to 'fail'. 2019-11-05T13:24:37.378+00:00 <local0.warning> good-dachshund-solace-0 root[1]: /usr/sw Generated_commonReturnCodes.cpp:135 (BASE - 0x00000000) main(0)@solacedaemon WARN Unknown exit value 143, defaulting it to 'fail'.

```

The first errors I see are:

```
ERROR ######## System startup initiated (Version 9.3.0.22) ######## 2019-11-05T13:24:25.444+00:00 <local0.warning> good-dachshund-solace-0 root[1]: /usr/sw main.cpp:739 (SOLDAEMON - 0x00000000) main(0)@solacedaemon
```

which doesn't really seem to be an error, and

```
ERROR soldisktest: write performance 9 MBps. disk performance does not meet the minimal requirement (20 MBps). Quality of service degradation expected 2019-11-05T13:24:32.791+00:00 <local0.warning> good-dachshund-solace-0 root[602]: /usr/sw
```

which also seems to be more of a warning than an error.

The first fatal message seems to be:

```
FATAL 'ASSERT: (journalChunkSize_m & (journalChunkSize_m - 1)) == 0' 2019-11-05T13:24:32.956+00:00 <local0.alert> good-dachshund-solace-0 root[602]: /usr/sw adCmnDiskTrans.cpp:531 (ADMANAGER - 0x00000001) main(0)@dataplane(11)
```

davesargrad commented 5 years ago

Hi @bczoma

Please let me know if there are other experiments I should run that would help you understand what I am seeing. I do think that the FATAL assert message about journalChunkSize (from the ADMANAGER) ... should provide very distinctive input to any of your Solace software experts.

In fact the message is coming from line 531 of the adCmnDiskTrans.cpp code, and the asserted expression (journalChunkSize_m & (journalChunkSize_m - 1)) == 0 is a power-of-two check ... I'd think that it would be relatively easy to map this low-level assertion to the high-level condition that causes it.

bczoma commented 5 years ago

Hi David, looking into it. What kind of K8S platform are you using?

davesargrad commented 5 years ago

Not sure what you mean by kind of platform. This is version 1.16 of the k8s platform. Solace works fine until I configure it to use that NFS persistent volume rather than disk local to the pod's node.

The NFS persistent volume is properly mounted. In fact, you can see below that Solace creates its disk infrastructure there.

(screenshot)

The other mount points that you see there are used successfully by other k8s-enabled pods.

One thing to note is that the mount point only has about 9.7 GB of disk available. If Solace is trying to grab a minimum of 10 GB of disk, this will fail. I will try to free up some disk and see if this remedies the issue.

I just freed up more than 20 GB of space on the NFS share backing the PV that Solace uses via its PVC. I'll let you know if this helps.

davesargrad commented 5 years ago

OK, I've given the share much more space and recreated the Solace StatefulSet.

(screenshot)

bczoma commented 5 years ago

I meant to ask about the k8s provider. Regarding the issue, can you also try specifying storage.nfs: true, as in https://github.com/SolaceProducts/solace-kubernetes-quickstart/blob/master/solace/values-examples/prod1k-persist-ha-nfs.yaml#L65?

davesargrad commented 5 years ago

I've installed K8S per the procedures documented here: https://kubernetes.io/.

I have tried to allocate more NFS space to Solace; I see the same failure. Solace uses that PV and begins to bootstrap properly.

The liveness test still fails because the Solace admin portal is never exposed on port 8080.

Looking at the detailed pod log, I see a FATAL message that didn't jump out before.

```
FATAL file /usr/sw/internalSpool/softAdb/backingStore actual size(22417408) != expected size(149946368) 2019-11-05T18:30:00.802+00:00 <local0.err> cantankerous-sloth-solace-0 root[592]: /usr/sw adContext.cpp:142 (ADMANAGER - 0x00000000) main(0)@dataplane(11)
```

Full log below:

```

[root@tonga pv]# kubectl logs cantankerous-sloth-solace-0 -f Host Boot ID: b194612f-28d8-4010-a645-d5dfe490578f Starting VMR Docker Container: Tue Nov 5 18:28:20 UTC 2019 SolOS Version: soltr_9.3.0.22 2019-11-05T18:29:31.618+00:00 <syslog.info> cantankerous-sloth-solace-0 rsyslogd: [origin software="rsyslogd" swVersion="8.1908.0" x-pid="363" x-info="https://www.rsyslog.com"] start 2019-11-05T18:29:32.615+00:00 <local6.info> cantankerous-sloth-solace-0 root[361]: rsyslog startup 2019-11-05T18:29:33.632+00:00 <local0.info> cantankerous-sloth-solace-0 root: EXTERN_SCRIPT INFO: Log redirection enabled, beginning playback of startup log buffer 2019-11-05T18:29:33.642+00:00 <local0.info> cantankerous-sloth-solace-0 root: EXTERN_SCRIPT INFO: Container user 'appuser' is now in 'root' groups 2019-11-05T18:29:33.653+00:00 <local0.info> cantankerous-sloth-solace-0 root: EXTERN_SCRIPT INFO: /usr/sw/var/soltr_9.3.0.22/db/dbBaseline already exists and will not be generated by confd 2019-11-05T18:29:33.669+00:00 <local0.info> cantankerous-sloth-solace-0 root: EXTERN_SCRIPT INFO: repairDatabase.py: processing database (currDbPath: /usr/sw/var/soltr_9.3.0.22/.dbHistory/db.00000001, nextDbPath: /usr/sw/var/soltr_9.3.0.22/.dbHistory/db.00000002) 2019-11-05T18:29:33.691+00:00 <local0.info> cantankerous-sloth-solace-0 root: EXTERN_SCRIPT INFO: Processing baseline /usr/sw/var/soltr_9.3.0.22/.dbHistory/db.00000001/dbBaseline 2019-11-05T18:29:33.704+00:00 <local0.info> cantankerous-sloth-solace-0 root: EXTERN_SCRIPT INFO: Finished playback of log buffer 2019-11-05T18:29:34.320+00:00 <local0.warning> cantankerous-sloth-solace-0 root[405]: /usr/sw ipcCommon.cpp:430 (BASE_IPC - 0x00000000) main(0)@solevent(?) WARN SolOS is not currently up - aborting attempt to start solevent process 2019-11-05T18:29:34.327+00:00 <local0.warning> cantankerous-sloth-solace-0 pam_event[404]: WARN Failed raising event, rc: 2, event SYSTEM_AUTHENTICATION_SESSION_OPENED shell(sudo),<404>,internal,root,root 2019-11-05T18:29:34.952+00:00 <local0.warning> cantankerous-sloth-solace-0 root[407]: /usr/sw ipcCommon.cpp:430 (BASE_IPC - 0x00000000) main(0)@solevent(?) 
WARN SolOS is not currently up - aborting attempt to start solevent process 2019-11-05T18:29:34.959+00:00 <local0.warning> cantankerous-sloth-solace-0 pam_event[404]: WARN Failed raising event, rc: 2, event SYSTEM_AUTHENTICATION_SESSION_CLOSED shell(sudo),<404>,root,root 2019-11-05T18:29:34.971+00:00 <local0.info> cantankerous-sloth-solace-0 root: EXTERN_SCRIPT INFO: Updating dbBaseline with dynamic instance metadata 2019-11-05T18:29:35.424+00:00 <local0.info> cantankerous-sloth-solace-0 root: EXTERN_SCRIPT INFO: Mirroring host timezone 2019-11-05T18:29:35.434+00:00 <local0.info> cantankerous-sloth-solace-0 root: EXTERN_SCRIPT INFO: Generating SSH key ssh-keygen: generating new host keys: RSA1 RSA DSA ECDSA ED25519 2019-11-05T18:29:35.737+00:00 <local0.info> cantankerous-sloth-solace-0 root: EXTERN_SCRIPT INFO: Starting solace process 2019-11-05T18:29:36.687+00:00 <local0.info> cantankerous-sloth-solace-0 root: EXTERN_SCRIPT INFO: Launching solacedaemon: /usr/sw/loads/soltr_9.3.0.22/bin/solacedaemon --vmr -z -f /usr/sw/loads/soltr_9.3.0.22/SolaceStartup.txt -r -1 2019-11-05T18:29:44.444+00:00 <local0.warning> cantankerous-sloth-solace-0 root[1]: /usr/sw main.cpp:739 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Determining platform type: [ OK ] 2019-11-05T18:29:44.485+00:00 <local0.warning> cantankerous-sloth-solace-0 root[1]: /usr/sw main.cpp:739 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Generating license file: [ OK ] 2019-11-05T18:29:44.489+00:00 <local0.warning> cantankerous-sloth-solace-0 root[1]: /usr/sw main.cpp:739 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Running pre-startup checks: [ OK ] 2019-11-05T18:29:51.886+00:00 <local0.err> cantankerous-sloth-solace-0 root[1]: /usr/sw main.cpp:4014 (SOLDAEMON - 0x00000001) main(0)@solacedaemon ERROR ######## System startup initiated (Version 9.3.0.22) ######## 2019-11-05T18:29:51.891+00:00 <local0.warning> cantankerous-sloth-solace-0 root[1]: /usr/sw main.cpp:739 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Monitoring SolOS processes: [ OK ] 2019-11-05T18:29:54.292+00:00 <local0.warning> cantankerous-sloth-solace-0 root[595]: /usr/sw mpliMoMsgService.cpp:2519 (MP - 0x00000000) main(0)@mgmtplane(9) WARN Product-key 'Message VPN 25' ignored due to unsupported platform 2019-11-05T18:29:55.768+00:00 <local0.warning> cantankerous-sloth-solace-0 root[1]: /usr/sw main.cpp:2889 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Inherited dynamic child: /sbin/rsyslogd -n &>/var/log/solace/rsyslogd.log (pid 550) 2019-11-05T18:29:55.769+00:00 <local0.warning> cantankerous-sloth-solace-0 root[1]: /usr/sw main.cpp:2889 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Inherited dynamic child: python /usr/sw/loads/soltr_9.3.0.22/scripts/vmr-solaudit -d (pid 524) 2019-11-05T18:29:57.756+00:00 <local0.err> cantankerous-sloth-solace-0 root[593]: /usr/sw adCpDiskThread.cpp:319 (AD_CP - 0x00000000) main(0)@controlplane(10) ERROR soldisktest: write performance 9 MBps. disk performance does not meet the minimal requirement (20 MBps). 
Quality of service degradation expected 2019-11-05T18:30:00.587+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnMirror.cpp:27 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN AdCmnMirrorIndex::adTotalNumSpoolFiles_ms=50000 2019-11-05T18:30:00.587+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnMirror.cpp:60 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN Platform=Standard, MemSize=2097152KB, ADB=Virtual 2019-11-05T18:30:00.587+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnMirror.cpp:71 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN alllocateMirrorMem: adVirtualUnackedListNumBlocks_ms=210000 adAdbUnackedListNumBlocks_ms=210000 adTotalNumSpoolFiles_ms=50000 mirrorFootprint_ms=124032480 2019-11-05T18:30:00.587+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnMirror.cpp:80 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN Allocate 124032480 bytes for mirror on platform Standard (memSize=2097152KB, Virtual) 2019-11-05T18:30:00.606+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw dpMsgAllocationMgr.cpp:189 (DP_MSG_ALLOC - 0x00000000) main(0)@dataplane(11) WARN Total DpMsg buffers: 50000 2019-11-05T18:30:00.644+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw dpMsgAllocationMgr.cpp:203 (DP_MSG_ALLOC - 0x00000000) main(0)@dataplane(11) WARN Global DpMsg buffers pool: 0x7f16298f9000 2019-11-05T18:30:00.659+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnMirror.cpp:100 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN Create mirror at 0x7f162fcea040 2019-11-05T18:30:00.661+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnClock.cpp:47 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN AdCmnClock: procSpeed=3392533248 cyclesStart=3563310516597552 cyclesEnd=3563310520222486 timeInUs=1068 2019-11-05T18:30:00.661+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnClock.cpp:51 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN AdCmnClock: timeofdayStart=1572978600.660207 timeofdayEnd=1572978600.661275 2019-11-05T18:30:00.661+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnRfadMgr.cpp:181 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN Creating AdCmnSoftAdb: size=149946368 2019-11-05T18:30:00.661+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnSoftAdb.cpp:84 (ADMANAGER - 0x00000001) main(0)@dataplane(11) WARN AdCmnSoftAdb: constructing 2019-11-05T18:30:00.713+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnSoftAdb.cpp:152 (ADMANAGER - 0x00000001) main(0)@dataplane(11) WARN AdCmnSoftAdb: AdCmnAIORing created at 0x2c65e10 2019-11-05T18:30:00.715+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnOperation.cpp:111 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN Enable histogram for operation (SA:MateLinkLatency) 2019-11-05T18:30:00.715+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnOperation.cpp:111 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN Enable histogram for operation (SA:NetworkLatency) 2019-11-05T18:30:00.715+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnOperation.cpp:111 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN Enable histogram for operation (SA:JournalMateWrite) 2019-11-05T18:30:00.715+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw 
adCmnSoftAdb.cpp:158 (ADMANAGER - 0x00000001) main(0)@dataplane(11) WARN AdCmnSoftAdb: AdCmnSoftMateLink created at 0x7f161d3d3010 2019-11-05T18:30:00.716+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnDiskTrans.cpp:365 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN AdCmnDiskTrans: MsgStoreStartAddr=0x076499e0 TrailerStartAddr=0x08eff000 adbBottomReservedStartAddr=0x08effc00 2019-11-05T18:30:00.716+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnOperation.cpp:111 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN Enable histogram for operation (SA:BackingStoreDiskThroughput) 2019-11-05T18:30:00.716+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnOperation.cpp:111 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN Enable histogram for operation (SA:JournalDiskThroughput) 2019-11-05T18:30:00.716+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnOperation.cpp:111 (ADMANAGER - 0x00000000) main(0)@dataplane(11) WARN Enable histogram for operation (SA:JournalDiskLatency) 2019-11-05T18:30:00.792+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnDiskTrans.cpp:115 (ADMANAGER - 0x00000001) main(0)@dataplane(11) WARN Created file /usr/sw/internalSpool/softAdb/.diskTest size 4096 fd 83 2019-11-05T18:30:00.796+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnDiskTrans.cpp:518 (ADMANAGER - 0x00000001) main(0)@dataplane(11) WARN Found journal size is 23199744 2019-11-05T18:30:00.796+00:00 <local0.warning> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnDiskTrans.cpp:525 (ADMANAGER - 0x00000001) main(0)@dataplane(11) WARN Reduce journal size by 23199744 to 0 2019-11-05T18:30:00.802+00:00 <local0.alert> cantankerous-sloth-solace-0 root[592]: /usr/sw adCmnDiskTrans.cpp:150 (ADMANAGER - 0x00000001) main(0)@dataplane(11) FATAL file /usr/sw/internalSpool/softAdb/backingStore actual size(22417408) != expected size(149946368) 2019-11-05T18:30:00.802+00:00 <local0.err> cantankerous-sloth-solace-0 root[592]: /usr/sw adContext.cpp:142 (ADMANAGER - 0x00000000) main(0)@dataplane(11) ERROR Caught unexpected signal: signum=6 (Aborted) 2019-11-05T18:30:00.802+00:00 <local0.err> cantankerous-sloth-solace-0 root[1]: /usr/sw main.cpp:2169 (SOLDAEMON - 0x00000000) main(0)@solacedaemon ERROR Child process dying, PID: 592, command: '/usr/sw/loads/soltr_9.3.0.22/bin/dataplane -h 80', status: program termination in progress due to signal 'Aborted', core dump is being produced. Cycle count: 3563310998683142 2019-11-05T18:30:00.802+00:00 <local0.err> cantankerous-sloth-solace-0 root[1]: /usr/sw main.cpp:1702 (SOLDAEMON - 0x00000001) main(0)@solacedaemon ERROR ######## System shutdown initiated: error detected, reboot requested. Reason: RAM Upscaling is not supported without prior ADB export. ######## Unable to raise event; rc(would block) 2019-11-05T18:30:00.895+00:00 <local0.warning> cantankerous-sloth-solace-0 root[712]: /usr/sw main.cpp:359 (SOLEVENT - 0x00000000) SolEventThread(2)@solevent(23) WARN Unable to raise event; rc(would block) 2019-11-05T18:30:01.106+00:00 <local0.info> cantankerous-sloth-solace-0 root[715]: /usr/sw/loads/soltr_9.3.0.22/scripts/commonLogging.py:76 WARN Running vmr-solredswitch 2019-11-05T18:30:02.187+00:00 <local0.warning> cantankerous-sloth-solace-0 root[1]: /usr/sw main.cpp:2996 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Terminating CLI processes ... 
2019-11-05T18:30:05.189+00:00 <local0.warning> cantankerous-sloth-solace-0 root[1]: /usr/sw main.cpp:3003 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Terminating children ... 2019-11-05T18:30:05.284+00:00 <local0.info> cantankerous-sloth-solace-0 root[524]: /usr/sw/loads/soltr_9.3.0.22/scripts/commonLogging.py:76 WARN Received signal 15, terminating 2019-11-05T18:30:05.292+00:00 <local0.warning> cantankerous-sloth-solace-0 root[1]: /usr/sw Generated_commonReturnCodes.cpp:135 (BASE - 0x00000000) main(0)@solacedaemon WARN Unknown exit value 143, defaulting it to 'fail'. 2019-11-05T18:30:05.292+00:00 <local0.warning> cantankerous-sloth-solace-0 root[1]: /usr/sw Generated_commonReturnCodes.cpp:135 (BASE - 0x00000000) main(0)@solacedaemon WARN Unknown exit value 143, defaulting it to 'fail'. 2019-11-05T18:30:25.298+00:00 <local0.warning> cantankerous-sloth-solace-0 root[1]: /usr/sw main.cpp:3042 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Child process will not die, command = /sbin/rsyslogd -n &>/var/log/solace/rsyslogd.log, PID = 550 2019-11-05T18:30:25.298+00:00 <local0.warning> cantankerous-sloth-solace-0 root[1]: /usr/sw main.cpp:3008 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Children terminated 2019-11-05T18:30:25.649+00:00 <local0.info> cantankerous-sloth-solace-0 root[809]: /usr/sw/loads/soltr_9.3.0.22/scripts/commonLogging.py:76 WARN Running vmr-solredswitch 2019-11-05T18:30:25.670+00:00 <local0.warning> cantankerous-sloth-solace-0 root[1]: /usr/sw Generated_commonReturnCodes.cpp:135 (BASE - 0x00000000) main(0)@solacedaemon WARN Unknown exit value 1, defaulting it to 'fail'. 2019-11-05T18:30:25.670+00:00 <local0.warning> cantankerous-sloth-solace-0 root[1]: /usr/sw Generated_commonReturnCodes.cpp:135 (BASE - 0x00000000) main(0)@solacedaemon WARN Unknown exit value 1, defaulting it to 'fail'. 2019-11-05T18:30:25.670+00:00 <local0.warning> cantankerous-sloth-solace-0 root[1]: /usr/sw main.cpp:1080 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Child terminated with failure status: command: 'pkill -P $PPID dataplane-linux' PID: 811 rc: fail status: 256 sigRxd: 0 2019-11-05T18:30:25.745+00:00 <local0.warning> cantankerous-sloth-solace-0 root[1]: /usr/sw main.cpp:3512 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Syncing filesystem before shutdown ... 2019-11-05T18:30:25.810+00:00 <local0.warning> cantankerous-sloth-solace-0 root[1]: /usr/sw main.cpp:3517 (SOLDAEMON - 0x00000000) main(0)@solacedaemon WARN Shutting down router 2019-11-05T18:30:25.810+00:00 <local0.err> cantankerous-sloth-solace-0 root[1]: /usr/sw main.cpp:3496 (SOLDAEMON - 0x00000001) main(0)@solacedaemon ERROR ######## System shutdown complete (Version 9.3.0.22) ######## [root@tonga pv]#

```

davesargrad commented 5 years ago

My values.yaml:

```
[root@tonga solace]# cat values.yaml
solace:
  redundancy: false
  size: dev100
cloudProvider: undefined
image:
  repository: solace/solace-pubsub-standard
  tag: latest
  pullPolicy: IfNotPresent
filepaths:
  configmap: "/mnt/disks/solace"
  secrets: "/mnt/disks/secrets"
service:
  internal: false
  type: LoadBalancer
  externalPort:
```

davesargrad commented 5 years ago

The YAML I use to create the PV:

(screenshot)

davesargrad commented 5 years ago

(screenshot)

bczoma commented 5 years ago

Let's try the following in values.yaml for NFS:

```yaml
storage:
  persistent: true
  nfs: true
  useStorageClass: malinfs
  size: 5Gi
```
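
Applied, for example, with (chart path and release name are placeholders):

```
helm install <chart> --name <release> -f values.yaml
```
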
davesargrad commented 5 years ago

Wow, that seemed to work! I'll log in and see if this instance persists.

Why does Solace need to know that this is an NFS PV? I'd think that would be transparent to the Solace pod.

(screenshot)

davesargrad commented 5 years ago

So far so good...

(screenshot)

bczoma commented 5 years ago

There are planned improvements to using NFS; currently NFS is slow storage for PubSub+, and in the meantime this setting triggers a workaround.

davesargrad commented 5 years ago

I see. So Solace PubSub+ uses this parameter to work around the warnings we saw?

I can easily live with this; I am only using NFS as a development tool. Ultimately I'd use different shared storage. I just wanted to make sure I understood what I was seeing.

Thank you for your active help. Your support is excellent.

I will continue to post thoughts and concerns here that I hope will help you to beef up your documentation and perhaps even the product.

At this point I think I can move ahead with some additional experiments.

bczoma commented 5 years ago

Awesome, thanks Dave!