kaparora closed this issue 6 years ago
Attached Trident logs: trident-logs-all.log
Having the same issue; was this ever resolved?
Today we got Trident running with the iSCSI (ontap-san) driver. Everything works fine, from installation to provisioning to mounting and consuming storage.
We added NFS as a backend to Trident and used it for a MySQL deployment. MySQL fails with the NFS backend, just as etcd did. Here are the logs:
=> sourcing 20-validate-variables.sh ...
=> sourcing 25-validate-replication-variables.sh ...
=> sourcing 30-base-config.sh ...
---> 08:41:11 Processing basic MySQL configuration files ...
=> sourcing 60-replication-config.sh ...
=> sourcing 70-s2i-config.sh ...
---> 08:41:11 Processing additional arbitrary MySQL configuration provided by s2i ...
=> sourcing 40-paas.cnf ...
=> sourcing 50-my-tuning.cnf ...
---> 08:41:11 Initializing database ...
---> 08:41:11 Running mysqld --initialize-insecure ...
2018-05-18T08:41:11.628403Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2018-05-18T08:41:11.629989Z 0 [Warning] Duplicate ignore-db-dir directory name 'lost+found' found in the config file(s). Ignoring the duplicate.
2018-05-18T08:41:11.630674Z 0 [ERROR] --initialize specified but the data directory has files in it. Aborting.
2018-05-18T08:41:11.630700Z 0 [ERROR] Aborting
NFS provisioning and mounting is fine.
I tried mounting the NFS volume directly on a worker node and writing to it, and that works.
This may have something to do with OpenShift user permissions inside the pod, but I have no clue. Any inputs are appreciated.
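The mysqld error above ("--initialize specified but the data directory has files in it") can be reproduced without MySQL at all: a freshly provisioned volume often surfaces a lost+found (or .snapshot) directory at its mount root, which makes the datadir non-empty. A minimal sketch of the emptiness check mysqld effectively performs, using a temporary directory to stand in for the mounted volume (paths are illustrative):

```shell
# Simulate a freshly mounted PV whose root contains lost+found
# (temporary dir stands in for /var/lib/mysql/data inside the pod).
DATADIR=$(mktemp -d)
mkdir "$DATADIR/lost+found"

# mysqld --initialize aborts if the data directory has any entries:
if [ -n "$(ls -A "$DATADIR")" ]; then
  echo "data directory not empty: $(ls -A "$DATADIR")"
fi
```

Running `ls -A` against the datadir inside the failing pod is a quick way to confirm whether this is the cause.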
We're definitely not seeing this in general; our CI tests this exact combination with the same versions. In cases like these there is usually a configuration issue, either on the host or on the storage backend, that's getting in the way. Troubleshooting this over GitHub would likely require a great deal of back and forth, so my suggestion would be to open up a case so that we can work through it live.
Thanks @innergy! A support case is already open.
@kapilarora How did you resolve this error on the initial install:
DEBU Invoking tunneled command: oc exec trident-cdd5fc7b4-ls8h4 -n netapp2 -c trident-main --tridentctl
DEBU REST interface not yet up, waiting.
Having an issue with the Trident 18.04 install on OpenShift 3.7 as well.
@kapilarora,
Any chance you can check the latency between your OpenShift nodes and the data LIF(s)? Just encountered a situation where extreme latency (> 200ms) was causing etcd to (apparently) falsely believe there were locks. Changing to a storage device which is dramatically closer fixed things.
I have no idea at what point the latency might become an issue for etcd, but it would be worth knowing if this could be an issue for you. All of the CI testing is with systems which are a couple ms apart at most, so it's not something we've encountered before.
Andrew
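In practice the round-trip time can be checked with something like ping -c 5 <data-lif-ip> from each OpenShift node. Pulling the average RTT out of ping's summary line is a one-liner; the summary line below is an illustrative sample, not output from this environment:

```shell
# Sample ping summary line (illustrative values, not from this cluster).
# On a live node, capture it with:  ping -c 5 <data-lif-ip> | tail -1
SUMMARY='rtt min/avg/max/mdev = 0.045/210.512/420.980/1.234 ms'

# The average is the 5th '/'-separated field of the summary line.
AVG=$(echo "$SUMMARY" | awk -F'/' '{print $5}')
echo "average RTT to data LIF: ${AVG} ms"
# → average RTT to data LIF: 210.512 ms
```

Anything in the hundreds of milliseconds, as in this sample, would be far outside what the CI systems see.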
Trident is able to serve both backends, NFS and iSCSI. PostgreSQL runs fine with NFS; we are still having issues with MySQL. This is a configuration issue, but I don't think we can solve it at the Trident level, so I am closing this issue for now. I am also not able to recreate it in my lab. The customer support case has also been closed.
Today, after some troubleshooting, we figured out that the default OpenShift template has mountPath /var/lib/mysql/data. We changed it to /var/lib/mysql after looking at this issue: https://github.com/docker-library/mysql/issues/69
And MySQL is now running in the OpenShift cluster with ONTAP NFS.
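The fix amounts to a one-line change in the template's volumeMount: mounting the PV one level up means the volume's lost+found directory sits outside the datadir that mysqld initializes. A minimal sketch of the relevant fragment, assuming the standard mysql-persistent template (container and volume names are illustrative):

```yaml
# Fragment of the MySQL DeploymentConfig (illustrative names).
# The default template mounts the PV at /var/lib/mysql/data, so any
# lost+found dir at the volume root lands inside the datadir and
# breaks 'mysqld --initialize'.
containers:
- name: mysql
  volumeMounts:
  - name: mysql-data            # hypothetical volume name
    mountPath: /var/lib/mysql   # was: /var/lib/mysql/data
```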
I'm seeing this too with OpenShift Origin 3.9 and iSCSI with an ONTAP simulator that was working on earlier deployments.
@kapilarora I have an env to reproduce it
I hit the same error today with OpenShift 3.9 using NFS against ONTAP (cDOT) 9.1:
FATA Install failed; PVC trident was not bound after 120000000000 seconds
Any idea?
@rushins I had success with the newest Trident beta release on Origin 3.9. I was using iSCSI though, so YMMV. What I have found is that it works best on the first go. If you have an existing install that failed, you must clean up on the FAS by deleting the volume and LUN (for iSCSI) before trying again.
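The cleanup step would look something along these lines on the ONTAP side (SVM, volume, and LUN names are hypothetical; verify the real names with lun show and volume show before deleting anything):

```
# ONTAP CLI sketch -- all names are hypothetical
lun offline    -vserver svm1 -path /vol/trident_pvc_vol/lun0
lun delete     -vserver svm1 -path /vol/trident_pvc_vol/lun0
volume offline -vserver svm1 -volume trident_pvc_vol
volume delete  -vserver svm1 -volume trident_pvc_vol
```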
@rushins, @japplewhite Make sure that something else doesn't have a pending PVC when creating Trident. In the original 18.04 there was a bug: the trident PV was missing a piece of metadata that would prevent it from being bound by another PVC. This is/was particularly an issue with OpenShift Enterprise, which deploys the Ansible Service Broker (a.k.a. ASB) by default.
If, after starting the Trident install, you do an oc get pvc --all-namespaces and you see a PVC which is bound to the trident PV, that is a good indicator.
This was fixed in 18.07 beta 1.
Andrew
Thanks Andrew. Yes, you are right, 18.04 seems to have a bug with PVC binding. I followed your suggestion to use 18.07 beta 1 and it worked without any major issue in OpenShift Container Platform as a storage class; I was able to create a PV and bind it to a PVC.
Thanks.
Hi John, I tried with iSCSI and it didn't work; I hit the bug in build 18.04 that Andrew described. 18.07 beta 1 worked for all NAS and SAN traffic (NFS, iSCSI).
Anyway, thanks for your suggestion.
We followed the installation steps described at https://netapp-trident.readthedocs.io/en/stable-v18.04/kubernetes/deploying.html#download-extract-the-installer
We started the installation on the master.
While the container is running we got the following output inside the container.
The events regarding the namespace are: