CESNET / netopeer2

NETCONF toolset
BSD 3-Clause "New" or "Revised" License
300 stars 189 forks source link

edit-data takes longer to execute from netopeer2-cli #1644

Open nagraj-vs opened 1 month ago

nagraj-vs commented 1 month ago

Hi, i have connected to netopeer2-server from netopeer2-cli via connect --ssh on a Linux x86_64 machine. edit-data takes longer to complete around 178ms,

when i checked the verbose output, ssh_socket_unbuffered_write call looks to be taking more time. I dont know if its an issue with the libssh or openssh or what could be the reason here?

below is the output from CLI.

edit-data --datastore running --config=/tmp/t.xml --defop merge nc DEBUG: Sending message:

714

[2024/08/29 05:26:37.304941, 3] ssh_packet_need_rekey: rekey: [data_rekey_needed=0, out_blocks=1681, in_blocks=49799] [2024/08/29 05:26:37.305018, 3] ssh_socket_unbuffered_write: Enabling POLLOUT for socket [2024/08/29 05:26:37.305029, 3] packet_send2: packet: wrote [type=94, len=24, padding_size=8, comp=15, payload=15] [2024/08/29 05:26:37.305036, 3] channel_write_common: ssh_channel_write wrote 6 bytes nc DEBUG: Sending message:

ds:runningmerge [2024/08/29 05:26:37.305068, 3] ssh_packet_need_rekey: rekey: [data_rekey_needed=0, out_blocks=1771, in_blocks=49888] [2024/08/29 05:26:37.305085, 3] packet_send2: packet: wrote [type=94, len=728, padding_size=4, comp=723, payload=723] [2024/08/29 05:26:37.305093, 3] channel_write_common: ssh_channel_write wrote 714 bytes [2024/08/29 05:26:37.305122, 3] ssh_socket_unbuffered_write: Enabling POLLOUT for socket nc DEBUG: Sending message: ## [2024/08/29 05:26:37.305135, 3] ssh_packet_need_rekey: rekey: [data_rekey_needed=0, out_blocks=1772, in_blocks=49799] [2024/08/29 05:26:37.305158, 3] packet_send2: packet: wrote [type=94, len=24, padding_size=10, comp=13, payload=13] [2024/08/29 05:26:37.305165, 3] channel_write_common: ssh_channel_write wrote 4 bytes **[2024/08/29 05:26:37.305192, 3] ssh_socket_unbuffered_write: Enabling POLLOUT for socket** [2024/08/29 05:26:37.479977, 3] ssh_packet_socket_callback: packet: read type 94 [len=24,padding=9,comp=14,payload=14] [2024/08/29 05:26:37.479995, 3] ssh_packet_process: Dispatching handler for packet type 94 [2024/08/29 05:26:37.480008, 3] channel_rcv_data: Channel receiving 5 bytes data in 0 (local win=889637 remote win=1266644) [2024/08/29 05:26:37.480016, 3] channel_default_bufferize: placing 5 bytes into channel buffer (stdout) [2024/08/29 05:26:37.480022, 3] channel_rcv_data: Channel windows are now (local win=889632 remote win=1266644) [2024/08/29 05:26:37.480028, 3] ssh_packet_need_rekey: rekey: [data_rekey_needed=0, out_blocks=1772, in_blocks=49799] [2024/08/29 05:26:37.480038, 3] ssh_channel_read_timeout: Read (2) buffered : 5 bytes. Window: 889632 [2024/08/29 05:26:37.480045, 3] ssh_channel_read_timeout: Read (1) buffered : 3 bytes. Window: 889632 [2024/08/29 05:26:37.480051, 3] ssh_channel_read_timeout: Read (1) buffered : 2 bytes. Window: 889632 [2024/08/29 05:26:37.480056, 3] ssh_channel_read_timeout: Read (1) buffered : 1 bytes. Window: 889632 [2024/08/29 05:26:37.480065, 3] ssh_channel_read_timeout: Read (92) buffered : 0 bytes. Window: 889632 [2024/08/29 05:26:37.480231, 3] ssh_channel_read_timeout: Read (92) buffered : 0 bytes. Window: 889632 [2024/08/29 05:26:37.480270, 3] ssh_packet_socket_callback: packet: read type 94 [len=112,padding=10,comp=101,payload=101] [2024/08/29 05:26:37.480281, 3] ssh_packet_process: Dispatching handler for packet type 94 [2024/08/29 05:26:37.480288, 3] channel_rcv_data: Channel receiving 92 bytes data in 0 (local win=889632 remote win=1266644) [2024/08/29 05:26:37.480293, 3] channel_default_bufferize: placing 92 bytes into channel buffer (stdout) [2024/08/29 05:26:37.480299, 3] channel_rcv_data: Channel windows are now (local win=889540 remote win=1266644) [2024/08/29 05:26:37.480304, 3] ssh_packet_socket_callback: Processing 44 bytes left in socket buffer [2024/08/29 05:26:37.480313, 3] ssh_packet_socket_callback: packet: read type 94 [len=24,padding=10,comp=13,payload=13] [2024/08/29 05:26:37.480325, 3] ssh_packet_process: Dispatching handler for packet type 94 [2024/08/29 05:26:37.480332, 3] channel_rcv_data: Channel receiving 4 bytes data in 0 (local win=889540 remote win=1266644) [2024/08/29 05:26:37.480338, 3] channel_default_bufferize: placing 4 bytes into channel buffer (stdout) [2024/08/29 05:26:37.480346, 3] channel_rcv_data: Channel windows are now (local win=889536 remote win=1266644) [2024/08/29 05:26:37.480355, 3] ssh_packet_need_rekey: rekey: [data_rekey_needed=0, out_blocks=1772, in_blocks=49812] [2024/08/29 05:26:37.480365, 3] ssh_packet_need_rekey: rekey: [data_rekey_needed=0, out_blocks=1772, in_blocks=49812] [2024/08/29 05:26:37.480373, 3] ssh_channel_read_timeout: Read (2) buffered : 4 bytes. Window: 889536 [2024/08/29 05:26:37.480381, 3] ssh_channel_read_timeout: Read (1) buffered : 2 bytes. Window: 889536 [2024/08/29 05:26:37.480388, 3] ssh_channel_read_timeout: Read (1) buffered : 1 bytes. Window: 889536 nc DEBUG: Received message: OK >
michalvasko commented 1 month ago

I do not understand the issue, you think it takes too long? Compared to some other client? It depends on the exact edit, with lots of data the operation is bound to take longer.

nagraj-vs commented 1 month ago

on a older linux, it takes quite faster , the response from netopeer2-server is seen with in 1ms to 40ms . where as in new linux , cli is taking longer to send data to server, as from above traces in netopeer-cli, i could see that ssh_socket_unbuffered_write is taking longer, which means either the underlying kernel is taking time to send the data to server?

from the trace, it seems the 714 bytes of data is sent from netopeer2-cli to server immediately followed by 4 bytes where the delay is seen. may be that writing 4 bytes to socket is taking long time.

**[2024/08/29 05:26:37.305122, 3] ssh_socket_unbuffered_write: Enabling POLLOUT for socket

[2024/08/29 05:26:37.305165, 3] channel_write_common: ssh_channel_write wrote 4 bytes [2024/08/29 05:26:37.305192, 3] ssh_socket_unbuffered_write: Enabling POLLOUT for socket**

I am trying to analayse, what could be the possible casue for the delay in socket communication, if its happening due to kernel or something else like openssh library ?

michalvasko commented 1 month ago

You are free to experiment further and try to find the cause but I am not able to help or assist you, I do not know anything about this.

nagraj-vs commented 1 month ago

i checked that the delay is caused from below code while starting polling in netopeer2-server code:

netopeer2/src/config.h.in:62:#define NP2SRV_PS_BACKOFF_SLEEP 200

/ listen for incoming requests on active NETCONF sessions / rc = nc_ps_poll(np2srv.nc_ps, NP2SRV_POLL_IO_TIMEOUT, &ncs);

    if ((rc & (NC_PSPOLL_NOSESSIONS | NC_PSPOLL_TIMEOUT | NC_PSPOLL_ERROR)) && !(rc & NC_PSPOLL_SESSION_TERM)) {
        /* if there is no active session, timeout, or an error, rest for a while */
        np_sleep(NP2SRV_PS_BACKOFF_SLEEP);
        continue;
    }

could you please tell me if you know if there is any reason for keeping a sleep of 200 ms during the polling . what are the side effects if we reduce the sleep to minimum say some 50ms or lesser.

michalvasko commented 1 month ago

what are the side effects if we reduce the sleep to minimum say some 50ms

Probably none when I think about it. You can test 50 ms or even 20 ms and let me know if everything works fine. If so, I will test it myself and commit it.

nagraj-vs commented 1 month ago

i changed the NP2SRV_PS_BACKOFF_SLEEP to 50 ms and verified the CPU usage for netopeer2-server in my docker environment and noticed the CPU/memory usage is not increased as per the below output.

image

image

could you please do the necessary changes and confirm if you can commit below changes so that we can apply this patch in our project.

src/config.h.in

/** @brief Timeout for PS structure accessing in

michalvasko commented 1 month ago

Thanks for that but you can also test with less to find the optimal value, it will likely be even less. The sleep has been added a long time ago, probably when it all worked a bit differently because now there is also another "sleep", when nc_ps_poll() waits for events on the sessions, using poll(), which does not use any resources. So the point of this timeout is to let go of the ps lock so that other threads can access it and I would expect 10 ms is more than enough for that.

nagraj-vs commented 1 month ago

I see that the CPU usage is doubled to 1.7% with 10 ms and 20 ms. image

with 30 ms, the CPU usage mostly stays between 0.3 and 0.7 and 1.3 , but sometimes reaches 1.0 and comes back to 0.3/0.7 on an average

image

image

for the optimal performance, either 50ms or 30ms is better. Please check from your tests what is suitable and let me know

michalvasko commented 1 month ago

Alright, thanks, 30 ms it is.

nagraj-vs commented 1 month ago

okay i see that you have committed this change already in devel branch. may I know when you are planning to release the tag. so that I can ask my team to apply this patch.

michalvasko commented 1 month ago

It has already been committed, the commit is referenced above my previous post.

nagraj-vs commented 1 month ago

may I know when you are planning to release the tag for devel branch right , would you also commit this change to master branch.

michalvasko commented 1 month ago

In approximately 2 months.

nokia-mkd commented 1 month ago

Hi @michalvasko , Thank you so much for your support on this. I thought it would be more useful if this value could be passed as an argument while starting the netopeer2-server so that based on system configuration and need.

michalvasko commented 1 month ago

What exactly should affect the length of the back-off sleep? What system configuration?

nokia-mkd commented 1 month ago

If any request is not critical in terms of time, someone might look to keep it 200ms only. Not exactly system configuration as such but need of the use case design. Any such flexibility makes debugging also easier.

michalvasko commented 1 month ago

If any request is not critical in terms of time, someone might look to keep it 200ms only.

If there are no adverse effects of it being 30 ms, I see no reason for someone having it 200 ms. For debugging, you can manually change ti to whatever you want but I am not sure what exactly does this sleep affect regarding debugging.