EC-Release / sdk

The Agent SDK

FDM POC EC connection Issue #25

Closed AakritiTalwar12 closed 4 years ago

AakritiTalwar12 commented 4 years ago

We are trying to connect to an on-prem Oracle DB using a common setup that is used for many Oracle connections. While connecting, it gives an "EOF while reading from the gateway" error and the network connection is closed.

Steps taken so far:

  1. Checked the VLAN plugin; it is working fine.
  2. The target IPs are being added to the local loopback.

From what we have seen so far, the target server is rejecting the connection.

ayasuda2OO3 commented 4 years ago

Thanks for tracking the issue. Could you share the logs from both the server and the client during the connection attempt? @AakritiTalwar12

AakritiTalwar12 commented 4 years ago

Please find the screenshots of the logs below: Screenshot (9), Screenshot (8), Screenshot (7)

Gayatri212 commented 4 years ago

@AakritiTalwar12 can you share the server script? The logs are showing an undefined gateway name, so I guess there is some error in the hst flag.

Gayatri212 commented 4 years ago

@AakritiTalwar12 I also checked the gateway health; there is no superconnection.

ayasuda2OO3 commented 4 years ago

In my discussion w/ Gayatri, it seems the superconn does exist in this vlan deployment.

In the use case with the vlan, we need to ensure the ip/port relevant to the target machine is available. Please verify that the address 3.34.218.30:1612 is reachable from within the host where the server agent is deployed. This is confirmed reachable by Gayatri.

Secondly, please share the gateway log file that shows the broken connectivity, for further investigation.

vrp0000 commented 4 years ago

I did both telnet and ping from the CCL server and got the results below:

  1. ping 3.34.218.30 Got response
  2. telnet 3.34.218.30 1621 Got connection refused
  3. telnet 3.34.218.30 Got connection refused

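A minimal Python sketch of the same check as the telnet commands above, for hosts where telnet is not installed. It is only an illustration: `probe_tcp` is a hypothetical helper name, and the 3-second timeout is an assumed default. A "refused" result typically means the host answered but nothing accepted the connection on that port, which is different from a timeout where no answer came back at all.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connect and classify the result.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer actively rejected the connection (no listener on the port).
        return "refused"
    except socket.timeout:
        # No answer within the timeout, e.g. packets dropped on the path.
        return "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        return f"error: {exc}"
```

For example, `probe_tcp("3.34.218.30", 1621)` run from the server agent host would correspond to item 2 above. Note that `telnet 3.34.218.30` without a port, as in item 3, goes to the default telnet port 23 rather than the database port.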
ayasuda2OO3 commented 4 years ago

Vinay, Gayatri, and I had a quick session on the connectivity. Our observations are as follows:

  1. telnet 3.34.218.30 1621 connected successfully.
  2. The connection between the agents was established successfully with a dedicated session# and a #binding. However, the connection between the server and the resource (Oracle db) was terminated by the server agent after idling for 2-3 mins.
  3. The gateway received the FIN (EOF signal) from the server and closed the websocket connection between the server/client agents.
  4. The client agent received the FIN (EOF) from the gateway, hence it terminated the TCP connection to the requester (nifi processor).
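The teardown sequence in these observations can be modeled with a short Python sketch. This is only a toy model of the order in which each hop sees the close, not the agent's actual implementation; the hop names and the `teardown_order` helper are illustrative assumptions.

```python
# Hops in the order traffic flows from the requester to the resource,
# as described in the observations. Names are illustrative only.
CHAIN = ["requester", "client agent", "gateway", "server agent", "resource"]


def teardown_order(origin: str, chain=CHAIN) -> dict:
    """Return the hops that see the close on each side of the hop that
    started it, nearest hop first."""
    i = chain.index(origin)
    return {
        "toward_requester": chain[:i][::-1],  # walk back to the requester
        "toward_resource": chain[i + 1:],     # walk forward to the resource
    }


# The server agent closes after idling: the gateway, then the client agent,
# then the requester see the EOF.
print(teardown_order("server agent"))
```

Under this model, a close that starts at the requester instead (for example a processor timeout) would travel the other way and end with the server agent dropping the resource connection.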

Recommended steps:

  1. Validate the transaction in the processor to find out why it takes more than 2-3 min to process, leading to the eventual timeout.
  2. Create a "working" copy of a nifi flow that successfully uses the EC VLAN plugin for the Oracle SCAN db. This should include the same Oracle driver/server configuration/settings.
  3. Create an EC group with a new pair of agent ids.
  4. Test the VLAN connectivity with the new group, ids, and the copied flow. Identify the differences towards a potential fix.

We will sync up again in the evening IST. @Gayatri212 @vrp0000 @AakritiTalwar12

vrp0000 commented 4 years ago

We tried using a dedicated EC agent. Below are the screenshots of the errors received.

  1. Client agent (image)
  2. Server agent (image)
  3. Gateway logs (image)
  4. Error on the processor (image)
  5. Processor configuration of a similar setup which is working fine using VLAN (image)

ayasuda2OO3 commented 4 years ago

After a quick call over the latest logs, we understand the following steps were taken:

  1. Replicated an existing nifi flow which runs the VLAN plugin successfully.
  2. Used a new pair of group id and agent ids.

Here are some suggestions:

  1. Upgrade the agent from #209 to #212 to avoid a VLAN bug which could potentially time out the connectivity.
  2. For the test, use the same Oracle db instance which has been running successfully with the plugin.

AakritiTalwar12 commented 4 years ago

Please find below the screenshot of the server logs for the request.

image

ayasuda2OO3 commented 4 years ago

We had another session. The settings below were confirmed during the session:

  1. The agent is #212.
  2. A new group id and new agent ids were used.

Observations:

  1. The requester signaled a timeout to the client agent following a lengthy wait time.
  2. The server received a FIN from the client and terminated the resource connection.
  3. From the screenshot provided by @AakritiTalwar12, the log indicates that the session was established properly between the client/server agents. The agents appear to be doing their job.

Next steps:

  1. Verify the status of the Oracle db; ensure the db is up and running.
  2. Identify the range of IPs and port(s) used for the Oracle db, all relevant to the following server host:

/data/ec/resources/ibs-fdm-test
TC DEV server
s201-2-scan.xxx.xxx.xxx:1621
FDM_RO

Gayatri212 commented 4 years ago

Update-

  1. The Oracle db is up and running, as direct connectivity from on-prem is working.

  2. The IP and port range has not yet been received from the developer.

  3. We tried to connect to this db using sqlcl through the EC connection; the logs are below.

    
    root@ip-10-227-84-46:/home/ubuntu# /data/ec/resources/oracle-database-dev/sqlcl/bin/sql password/FDM_RO@s201-2-scan.cloud.ge.com:1621/tlfdmq_taf.cloud.ge.com

    SQLcl: Release 18.2 Production on Tue Dec 10 11:22:40 2019

    Copyright (c) 1982, 2019, Oracle. All rights reserved.

    USER = S3cr3t_W0rd
    URL = jdbc:oracle:thin:@s201-2-scan.cloud.ge.com:1621/tlfdmq_taf.cloud.ge.com
    Error Message = IO Error: The Network Adapter could not establish the connection
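To double-check which host and port a connect string like the one above actually targets, here is a small Python sketch that splits a `user/password@host:port/service` string into its parts. `parse_ezconnect` and `EzConnect` are hypothetical names for illustration, not part of SQLcl or EC; the sketch assumes the simple form used in this thread and falls back to Oracle's default listener port 1521 when no port is given.

```python
from typing import NamedTuple


class EzConnect(NamedTuple):
    user: str
    password: str
    host: str
    port: int
    service: str


def parse_ezconnect(conn: str) -> EzConnect:
    """Split 'user/password@host:port/service' into its parts.

    The text before the first '/' in the credentials is treated as the
    user name, the rest as the password (empty if omitted).
    """
    creds, _, target = conn.rpartition("@")
    user, _, password = creds.partition("/")
    hostport, _, service = target.partition("/")
    host, _, port = hostport.partition(":")
    # 1521 is Oracle's default listener port when none is specified.
    return EzConnect(user, password, host, int(port) if port else 1521, service)


parts = parse_ezconnect("FDM_RO@s201-2-scan.cloud.ge.com:1621/tlfdmq_taf.cloud.ge.com")
print(parts.host, parts.port, parts.service)
```

The host and port extracted this way are the values to test for reachability from the machine that runs the command.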

ayasuda2OO3 commented 4 years ago

Thanks for the update. On item #3, can we observe the connectivity with SQLcl when it is deployed on the machine where the server agent is deployed (no EC connectivity)? What is the outcome? @Gayatri212 @AakritiTalwar12 @Kshitij

ayasuda2OO3 commented 4 years ago

Per our latest session (Simran, Kshitij, Gayatri, Chia), here is a summary of our observations:

  1. The SQLcl command is not properly set up on the machine where the server agent is deployed. Kshitij will share the screenshot/logs.
  2. We were able to identify several flows that are working with the VLAN plugin for OracleDB, e.g. smartshop, TCT for GRC/RPLM, etc.
  3. Although we have a status for the target OracleDB, so far we still cannot prove connectivity between the host (alpcclappdvn01.corporate.ge.com, where the server agent is deployed) and the actual OracleDB in a working environment. The challenge is that the team has no control over the network topology relevant to the OracleDB, hence it is uncertain whether it will work with the server agent.

Todo:

  1. Reach out to the team who owns the OracleDB and its network settings (LBer, IP range, etc.) to find out the actual configuration/compatibility.
  2. Test the direct connectivity between the server agent host and the target OracleDB.
  3. Verify the settings/configuration/connection string in the nifi flow of this PoC.

502762963 commented 4 years ago

Please find below the error from sqlcl running on the server agent host:

    /u03/shared/Sarah/sqlcl/bin # sql FDM_RO@s201-2-scan.cloud.ge.com:1621/tlfdmq_taf.cloud.ge.com
    SQLcl: Release 19.2.1 Production on Tue Dec 10 13:49:06 2019
    Copyright (c) 1982, 2019, Oracle. All rights reserved.
    Password? (**********?) ***********
    ERROR: IO Error: System Property oracle.net.tns_admin was empty.

Gayatri212 commented 4 years ago

Update -

  1. There is no issue with the Oracle db, and the default port is 1621 only, as confirmed by the db team.
  2. We found one flow which uses the VLAN setup to connect to an Oracle db and works fine, so we compared this setup with the working one.
  3. Since the other setup works fine with the same VLAN connection, this indicates that there is no issue with the EC setup.

Observations -

Suggestion -

ayasuda2OO3 commented 4 years ago

  1. There is no issue with oracle db and the default port is 1621 only - confirmed by db team.

The DB is deployed via a custom environment whose network settings are not visible to this troubleshooting. Is there any observation of direct access between the server agent host and the Oracle DB instance? @kshitij @502762963 @Gayatri212 @AakritiTalwar12

Gayatri212 commented 4 years ago

Update -

Any observation of the direct access between the server agent host and the Oracle DB instance

On this, we were able to connect to this Oracle DB instance from the VM where the EC server is running. Please find the logs below.

D0FE85E8

ayasuda2OO3 commented 4 years ago

Closing the issue because 1) the Oracle db is not visible to this use case; 2) there is a lack of clarity from the issue tracker(s); 3) the PoC has tentatively moved forward without EC. Will re-open if needed.

ayasuda2OO3 commented 4 years ago

@503025235 user comments- /*****/ The FDM POC TC issue is resolved now. The actions taken to resolve the issue are as follows:

  1. The host IP and port provided were not appropriate, as they were the load balancer IP and port and not the actual IP and port used to connect to the database directly. This caused the connectivity glitch. The actual IP and port are IP 3.34.218.26, port 1624.
  2. The Oracle driver class name was changed to oracle.jdbc.OracleDriver.
  3. The Oracle JDBC jar of version 8 was used.
  4. The "oracle.jdbc.timezoneAsRegion" property was added with the value "False".

Together, these changes resolved the issue.
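As a rough illustration of the resolved settings, the Python sketch below gathers them into one structure. The dictionary keys and the `build_oracle_settings` helper are hypothetical and are not NiFi or EC property names. The service name is an assumption carried over from the earlier sqlcl command, and the URL follows the `jdbc:oracle:thin:@host:port/service` form seen in the sqlcl output.

```python
def build_oracle_settings(host: str, port: int, service: str) -> dict:
    """Collect the connection settings described in the resolution.

    Key names are illustrative only; map them to whatever the
    connection pool or processor configuration actually expects.
    """
    return {
        # Direct database address, not the load balancer address.
        "url": f"jdbc:oracle:thin:@{host}:{port}/{service}",
        "driver_class": "oracle.jdbc.OracleDriver",
        "connection_properties": {"oracle.jdbc.timezoneAsRegion": "False"},
    }


# Service name assumed to be the one used in the earlier sqlcl test.
settings = build_oracle_settings("3.34.218.26", 1624, "tlfdmq_taf.cloud.ge.com")
print(settings["url"])
```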

image