Closed AakritiTalwar12 closed 4 years ago
Thanks for tracking the issue. Could you share the logs from both the server and the client during the connectivity attempt? @AakritiTalwar12
Please find the screenshots of the logs below.
@AakritiTalwar12 can you share the server script? The logs show an undefined gateway name; I guess there is some error in the hst flag.
@AakritiTalwar12 I also checked the gateway health; there is no superconnection.
In my discussion w/ Gayatri, it seems the superconn does exist in this vlan deployment.
In the VLAN use case, we need to ensure the IP/port relevant to the target machine is available. Please verify that the address 3.34.218.30:1612 is reachable from within the host where the server agent is deployed. This is confirmed reachable by Gayatri.
Secondly, please share the gateway log file that shows the broken connectivity, for further investigation.
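For anyone repeating the reachability check on a host without telnet, here is a minimal sketch in Python. The function name `can_connect` and the timeout value are assumptions for illustration; it only confirms that a TCP handshake completes and says nothing about whether the Oracle listener accepts the session.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Example (run on the server agent host; address as quoted in the comment above):
# print(can_connect("3.34.218.30", 1612))
```

Running it from the server agent host rather than from a workstation matters here, since the question is whether that specific host can reach the target.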
I ran both telnet and ping from the CCL server and got the result below.
Vinay, Gayatri, and I had a quick session over the connectivity. Our observations are as follows:
1) telnet 3.34.218.30 1621
connection was established successfully.
2) The connection between agents was established successfully with a dedicated session# and a #binding. However, the connection between the server and the resource (oracle db) was terminated by the server agent after idling for 2-3 mins.
3) The gateway received the FIN (EOF signal) from the server, and closed the wsocket connection between the server/client agents.
4) The client agent received the FIN (EOF) from the gateway, hence it terminated the TCP connection to the requester (nifi processor).
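To illustrate what "received the FIN (EOF)" looks like from the receiving side of a TCP connection, here is a small local sketch. It is not the EC agent or gateway code; the helper name `read_until_eof` and the socket pair standing in for the agents are assumptions for the demo. The point is that once the peer closes, `recv()` returns an empty bytes object, which is the cue each hop uses to close its own side.

```python
import socket


def read_until_eof(conn: socket.socket, bufsize: int = 4096) -> bytes:
    """Collect data until the peer closes; an empty recv() result signals EOF (FIN)."""
    chunks = []
    while True:
        data = conn.recv(bufsize)
        if not data:  # empty read: the peer has closed its side
            break
        chunks.append(data)
    return b"".join(chunks)


# Local demonstration with a connected socket pair (no network involved).
upstream, downstream = socket.socketpair()
upstream.sendall(b"partial result")
upstream.close()  # stands in for the server side closing the idle connection
received = read_until_eof(downstream)
downstream.close()
print(received)
```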
Recommended steps:
1) validate the transaction in the processor to determine why it takes more than 2-3 min to process, leading to the eventual timeout.
2) create a "working" copy of a NiFi flow that successfully uses the EC VLAN plugin for the Oracle SCAN db. This should include the same Oracle driver and server configuration/settings.
3) create an EC group with a new pair of agent ids.
4) test the VLAN connectivity with the new group, ids, and the copied flow. Identify the differences towards a potential fix.
We will sync up again in the evening IST. @Gayatri212 @vrp0000 @AakritiTalwar12
We tried using a dedicated EC agent. Below are the screenshots of the errors received.
1) Client agent
2) Server agent
3) Gateway logs
4) Error on the processor
5) Processor configuration of a similar setup that works fine using VLAN
After a quick call over the latest logs, we understand the following steps were taken:
1) replicated an existing NiFi flow that runs the VLAN plugin successfully.
2) used a new pair of group id and agent ids.
Here are some suggestions:
1) upgrade the agent from #209 to #212 to avoid a VLAN bug that could potentially time out the connectivity.
2) For the test, use the same Oracle db instance that has been running successfully with the plugin.
Please find below the screenshot of the server logs for the request.
We had another session. The following settings were confirmed during the session:
1) the agent is #212
2) a new group id and new agent ids were used.
Observation:
1) The requester signaled a timeout to the client agent following a lengthy wait time.
2) The server received a FIN from the client and terminated the resource connection.
3) From the screenshot provided by @AakritiTalwar12, the log indicates that the session was established properly between the client/server agents. The agent(s) appear to be doing their job.
Next steps:
1) verify the status of the Oracle db; ensure the db is up and running.
2) identify the range of IPs and port(s) used for the Oracle db, all relevant to the following server host:
/data/ec/resources/ibs-fdm-test
TC DEV server
s201-2-scan.xxx.xxx.xxx:1621
FDM_RO
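As a starting point for identifying the IP range, a hostname can be resolved to all of its IPv4 addresses from the server agent host. This is a minimal sketch; the function name `resolve_ipv4` is an assumption. Note that it only lists what DNS returns for the name, so any addresses a listener later redirects the client to would still have to come from the db team.

```python
import socket


def resolve_ipv4(host: str, port: int) -> list[str]:
    """Return the sorted, de-duplicated IPv4 addresses that host resolves to."""
    infos = socket.getaddrinfo(
        host, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})


# Example (run on the server agent host, with the real SCAN hostname substituted):
# print(resolve_ipv4("s201-2-scan.xxx.xxx.xxx", 1621))
```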
Update:
The Oracle db is up and running, as direct connectivity from on-prem is working.
The IP and port range has not yet been received from the developer.
We tried to connect to this db using sqlcl through the EC connection, and the following are the logs for it:
root@ip-10-227-84-46:/home/ubuntu# /data/ec/resources/oracle-database-dev/sqlcl/bin/sql password/FDM_RO@s201-2-scan.cloud.ge.com:1621/tlfdmq_taf.cloud.ge.com
SQLcl: Release 18.2 Production on Tue Dec 10 11:22:40 2019
Copyright (c) 1982, 2019, Oracle. All rights reserved.
USER = S3cr3t_W0rd URL = jdbc:oracle:thin:@s201-2-scan.cloud.ge.com:1621/tlfdmq_taf.cloud.ge.com Error Message = IO Error: The Network Adapter could not establish the connection
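Since the error only reports that the network adapter could not establish the connection, it can help to pull the host and port out of the JDBC URL in the log and test that endpoint on its own. Below is a minimal sketch that handles only the simple `jdbc:oracle:thin:@host:port/service` form seen above; the names `ThinUrl` and `parse_thin_url` are assumptions, and TNS descriptor-style URLs are deliberately not covered.

```python
import re
from typing import NamedTuple


class ThinUrl(NamedTuple):
    host: str
    port: int
    service: str


# Matches jdbc:oracle:thin:@host:port/service, with an optional "//" after "@".
_THIN_URL = re.compile(
    r"^jdbc:oracle:thin:@(?://)?(?P<host>[^:/]+):(?P<port>\d+)/(?P<service>.+)$"
)


def parse_thin_url(url: str) -> ThinUrl:
    """Split a simple Oracle thin JDBC URL into host, port, and service name."""
    match = _THIN_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not a host:port/service thin URL: {url!r}")
    return ThinUrl(match["host"], int(match["port"]), match["service"])


parsed = parse_thin_url(
    "jdbc:oracle:thin:@s201-2-scan.cloud.ge.com:1621/tlfdmq_taf.cloud.ge.com"
)
print(parsed.host, parsed.port, parsed.service)
```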
Thanks for the update. On item #3, can we observe the connectivity with SQLcl? If SQLcl is deployed on the machine where the server agent is deployed (no EC connectivity), what is the outcome? @Gayatri212 @AakritiTalwar12 @Kshitij
Per our latest session (Simran, Kshitij, Gayatri, Chia), here is a summary of our observations:
1) The SQLcl command is not properly set up on the machine where the server agent is deployed. Kshitij will share the screenshot/logs.
2) We were able to identify several flows that are working with the VLAN plugin for OracleDB, e.g. smartshop, TCT for GRC/RPLM, etc.
3) Although we have a status of the target OracleDB, so far we still cannot prove the connectivity between the host (alpcclappdvn01.corporate.ge.com, on which the server agent is deployed) and the actual oracledb in a working environment. The challenge is that the team has no control over the network topology relevant to the oracledb, hence it is uncertain whether it will work with the server agent.
Todo:
1) Reach out to the team who owns the oracledb and its network settings (load balancer, IP range, etc.) to find out the actual configuration/compatibility.
2) Test the direct connectivity between the server agent host and the target oracledb.
3) Verify the setting/configuration/connection string in the NiFi flow of this PoC.
Please find below the error for sqlcl running on the server agent:
Update -
Observations -
Suggestion -
- There is no issue with oracle db and the default port is 1621 only - confirmed by db team.
The DB is deployed via a custom environment where the network settings are not visible to this troubleshooting. Any observation of the direct access between the server agent host and the Oracle DB inst @kshitij @502762963 @Gayatri212 @AakritiTalwar12
Update -
Any observation of the direct access between the server agent host and the Oracle DB inst
On this, we were able to connect to this Oracle DB instance from the VM where the EC server is running. Please find the logs below.
Observation: To make this connection work, we need to do some TNS_ADMIN-specific configuration on the VM.
Next step: @Simran will connect with the Oracle contact again to get the exact configuration done on the EC client side as well, in order to connect to the Oracle db instance.
Closing the issue as 1) the Oracle db is not visible to this use case, 2) there is a lack of clarity from the issue tracker(s), and 3) the PoC has tentatively moved forward without EC. Will re-open if needed.
@503025235 user comments- /*****/ The FDM POC TC issue is resolved now. The actions taken to resolve the issue are as follows:
1) The host IP and port provided were not appropriate, as they were the load balancer IP and port and not the actual IP and port we use to connect to the database directly. This caused the connectivity glitch. The actual IP and port are IP - 3.34.218.26, Port - 1624.
2) The Oracle driver class name was changed to oracle.jdbc.OracleDriver.
3) The Oracle JDBC jar of version 8 was used.
4) The "oracle.jdbc.timezoneAsRegion" property was added with the value "False".
Together, these changes resolved the issue.
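For reference, the resolved settings can be collected in one place. The sketch below is illustrative only: the function name, the dictionary keys, and the `<service-name>` placeholder are assumptions and not NiFi property names, and the service name used after the fix is not stated in this thread. How the `oracle.jdbc.timezoneAsRegion` property is passed depends on the tool in use.

```python
def build_connection_settings(host: str, port: int, service: str) -> dict:
    """Assemble the thin JDBC URL and the driver settings named in the resolution."""
    return {
        # Direct database endpoint, not the load balancer address.
        "url": f"jdbc:oracle:thin:@{host}:{port}/{service}",
        "driver_class": "oracle.jdbc.OracleDriver",
        # Property and value as reported in the comment above.
        "properties": {"oracle.jdbc.timezoneAsRegion": "False"},
    }


# "<service-name>" is a placeholder; substitute the real service name.
settings = build_connection_settings("3.34.218.26", 1624, "<service-name>")
print(settings["url"])
```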
We are trying to connect to an on-prem Oracle DB using a common setup used for many Oracle connections. While connecting, it gives an "EOF while reading from the gateway" error and a closed network connection.
Steps taken so far:
As seen so far, the target server is rejecting the connection.