OpenVidu / openvidu

OpenVidu Platform main repository
https://openvidu.io
Apache License 2.0
1.84k stars 463 forks source link

RPCNotificationService exception with closed ws #459

Open techcouncilbox opened 4 years ago

techcouncilbox commented 4 years ago

Describe the bug Possible Bug is described in https://openvidu.discourse.group/t/rpcnotificationservice-exception-with-closed-ws/623/6

Logs are attached in thread

Sorry for the spam but maybe it is better place to check that.

Expected behavior All rpc calls should be called with opened rpc websocket.

Wrong current behavior RPCNotificationService exception with closed ws when sending messages.

OpenVidu tutorial where to replicate the error

Client device info (if applicable) Chrome 81 with PC (windows, Mac, Unix)

Screenshots

Additional context

techcouncilbox commented 4 years ago

It looks like 2 threads are triying to perform operations and obtain unconsistent info (highlighted in orange when second thread is disconnecting and after that error arises).

image

Maybe concurreny in these variables in KurentoParticipant need to be reviewed :

image

And Autowired variables marked as volatile in RPCNotificationService:

image

And SessionManager:

image

But we don't understand enough the source code to propose a valid solution

viktara commented 4 years ago

We're facing constant session crashes for users with this error on the latest release of openvidu. After a restart everything seems to work ok for a day or so and then users meetings start failing with the above error being thrown on the server. We're restarting daily at the moment. Any pointers on how to debug this would be appreciated.

micaelgallego commented 4 years ago

@viktara can you describe your problem in more detail? Can you share an entire OpenVidu server log?

syam-tbl commented 4 years ago

Hi @micaelgallego , Here is the link to more detail logs https://cloud.techblue.co.uk/s/8PLdi1bP1WllqGb/download Kindly let us know if you need any further information

viktara commented 4 years ago

@micaelgallego we're still experiencing multiple user session crashes each day. The only work around so far is to restart the containers and the problem goes away for a while. any pointers on where to debug would be appreciated. we're digging deeper and profiling at the moment, but are surprised that this issue is not effecting anyone else or in fact your pro customers. are you aware of any other occurrences of this issue or can you give us any further pointers here?

pabloFuente commented 4 years ago

Hi,

We are not experiencig this kind of issues. I can't think of anybody with this same problem. How is your CPU load behavior? If your system is overloaded this kind of issues may appear.

viktara commented 4 years ago

Everything on that front seems fine, no cpu spikes, tons of unused memory. The instance is big and we have a small number of users, so I don't think it can be related to that.

A long shot, but we're going to disable the session garbage collector just in case that is causing an issue.

viktara commented 4 years ago

We have set up a new instance exactly as per the deploymet guide and started load testing using KITE. We get the same issue with the new instance, the session disconnects all users and throws the same error on the server. There is very little load on the system and it seems to happen even with very few users in a conference. We're creating a screencast in case you want to have a look. Could you help us with some pointers of where to look next?

viktara commented 4 years ago

@syam-tbl confirmed that we are not hitting 200 threads, so this issue does not seem linked to #476

viktara commented 4 years ago

@micaelgallego @pabloFuente we upgraded to the latest release and ran load tests again, but the error still occurs

micaelgallego commented 4 years ago

Thank you for reporting. We will take a look to see what happens

pabloFuente commented 4 years ago

Let's sort out a bit the problems described here.

This issue was started by @techcouncilbox refering a post message titled "RPCNotificationService exception with closed ws".

The suggestion made by this user was some kind of concurrency problem, but this doesn't really make sense. This message/exception appears when some client unexpectedly disconnects its websocket from OpenVidu Server. If this happens and OpenVidu is in the middle of notifying users about some event of the session, then it will throw this error. So the error is not a problem per se, it simply informs about the impossibility of sending a message to a closed client.

Then @viktara posted a new message talking about this same error. We discovered a possible bug that was affecting the amount of threads available for Tomcat in OpenVidu Server, but this seems not to be the real problem here according to https://github.com/OpenVidu/openvidu/issues/459#issuecomment-654159627.

So, from this point we need a way of replicating @viktara issue. Please, can you tell us how is your exact deployment? Did you deployed in AWS or On Premises? If it is the last one, release 2.15.0 brings a beautiful command that you can execute in your OpenVidu installation folder (default to /opt/openvidu) to get that information:

./openvidu version

Also, if you are using a particular Cloud Provider to deploy OpenVidu On Premises, tell us which.

After having all of your deployment information, we need a way of replicating the problem using an application. WHen do you get the error? Connecting 5,6,7.... users at a time all of them publishing and subscribing to each other? What is th architecture of your sessions?

Regards.

viktara commented 4 years ago

@pabloFuente Yeah - it seems that the error is as you say and doesn't give us much information about why the video sessions die. As for the environment, we first deployed on aws and then tried another set up on digital ocean.

We have run load tests against both 2.14 and 2.15 and see the same behaviour in both and are using the openvidu-call application as our front end. I guess the problem could arise there. @syam-tbl is running the load tests and has more information so I'll leave him to pitch in here too.

syam-tbl commented 4 years ago

@pabloFuente as @viktara mentioned we tested

We were using KITE for running the automated test. https://github.com/CoSMoSoftware/KITE-Sample-Tests/tree/master/KITE-OpenVidu-Test & yes during the test all users are publishing both voice and video

viktara commented 4 years ago

@syam-tbl can you execute ./openvidu version as suggested by @pabloFuente too please?

syam-tbl commented 4 years ago

Sure, here is the output of that one

$ ./openvidu version

Openvidu Information:

  Installation Type: On Premises
  Openvidu Version: 2.15.0
  Openvidu Call Version: 2.15.0

System Information:

  Linux Version:
    - Description:      Ubuntu 18.04.3 LTS
  Docker Version: 19.03.1
  Docker Compose Version: 1.26.0

Installation Information:

  Installation Folder: /opt/openvidu
  Installation Folder Tree:

         |-docker-compose.override.yml
         |-openvidu
         |-docker-compose.yml
         |-.env
         |-readme.md
         |-owncert
         | |-certificate.key
         | |-certificate.cert

Docker Running Services:

  CONTAINER ID        IMAGE                                 COMMAND                  CREATED             STATUS                  PORTS                    NAMES
        fc9e99d30dc5        openvidu/openvidu-coturn:1.0.0        "/bin/sh -c /usr/loc_"   47 hours ago        Up 47 hours                                      openvidu_coturn_1
        c711e9417269        redis:5.0.7                           "docker-entrypoint.s_"   47 hours ago        Up 47 hours                                      openvidu_redis_1
        a6aac117e87b        openvidu/openvidu-call:2.15.0         "docker-entrypoint.s_"   47 hours ago        Up 47 hours             0.0.0.0:5442->5000/tcp   openvidu_app_1
        04b07d9aefc3        openvidu/openvidu-server:2.15.0       "/bin/bash -c 'expor_"   47 hours ago        Up 47 hours                                      openvidu_openvidu-server_1
        ba91866204e2        kurento/kurento-media-server:6.14.0   "/entrypoint.sh"         47 hours ago        Up 47 hours (healthy)                            openvidu_kms_1
        83f26c02a47f        openvidu/openvidu-proxy:2.0.0         "/bin/sh -c /usr/loc_"   47 hours ago        Up 47 hours                                      openvidu_nginx_1