DeviceFarmer / stf

Control and manage Android devices from your browser.
https://devicefarmer.github.io

Parallel `POST /user/devices/{serial}` should succeed only once #620

Closed: jupe closed this issue 3 months ago

jupe commented 1 year ago

What is the issue or idea you have?

Two identical, parallel API calls to `POST /user/devices/{serial}` both succeed. The expectation is that one of them fails with `403: Device is being used or not available`, even when the same account makes both calls.
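One plausible explanation for this symptom is a check-then-act race: both requests see the device as free before either one records the new owner. The minimal Python model below is a hypothetical `NaiveDeviceRegistry`, not STF's Node.js code; it forces that interleaving with a `threading.Barrier` so the outcome is deterministic rather than timing-dependent.

```python
import threading


class NaiveDeviceRegistry:
    """Hypothetical in-memory model (not STF code) of a check-then-act allocation."""

    def __init__(self, pause=None):
        self.owner = {}       # serial -> owner email
        self._pause = pause   # optional hook called between the check and the act

    def allocate(self, serial, user):
        # Check: is the device free?
        if self.owner.get(serial) is not None:
            return 403
        if self._pause:
            self._pause()     # widen the race window for this demo
        # Act: take ownership. Nothing stops a second caller from also getting here.
        self.owner[serial] = user
        return 200


# Both callers must pass the check before either is allowed to act
barrier = threading.Barrier(2)
registry = NaiveDeviceRegistry(pause=barrier.wait)
results = []

def worker():
    results.append(registry.allocate("D0AA002184J82801202", "admin@domain.com"))

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # [200, 200]: both requests "succeed"
```

In the real server the window between the check and the update is narrower, which is why the problem only shows up when requests arrive nearly simultaneously.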

Does it only happen on a specific device? Please run `adb devices -l` and paste the corresponding row.

It happens with any device.

Please provide the steps to reproduce the issue.

Docker image used: `devicefarmer/stf:3.6.4`

Call the API above twice in parallel. The following simple Python script reproduces the issue:

```python
import threading
from os import environ
from time import sleep

from stf_appium_client.StfClient import StfClient

# STF_HOST and STF_TOKEN are read from the environment
client = StfClient(host=environ.get('STF_HOST'))
client.connect(token=environ.get('STF_TOKEN'))
dev_dict = client.get_devices()[0]

def thread_function(dev_dict):
    # Repeatedly allocate, report, and release the same device.
    # With exclusive allocation, only one thread should hold it at any time.
    while True:
        serial = dev_dict["serial"]
        client.allocate({"serial": serial})
        print(f'{threading.get_ident()}: {dev_dict["serial"]}')
        client.release({"serial": serial})
        sleep(0.1)

# Two threads share one client and target the same device
threading.Thread(target=thread_function, args=(dev_dict,)).start()
threading.Thread(target=thread_function, args=(dev_dict,)).start()
```
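The script above loops forever and relies on reading the printed output. A bounded variant can turn it into a pass/fail check: release several allocation attempts at the same instant and count how many succeed. This is a sketch; `count_parallel_successes` is a hypothetical helper, and it assumes the `allocate` callable raises an exception when the server refuses the allocation, which you should verify for the client you use.

```python
import threading


def count_parallel_successes(allocate, serial, attempts=2):
    """Hypothetical helper: call allocate(serial) from `attempts` threads at
    (almost) the same moment and return how many calls did not raise.

    Assumption: allocate() raises when the server refuses (e.g. with a 403).
    """
    start = threading.Barrier(attempts)
    outcomes = []
    outcomes_lock = threading.Lock()

    def one_attempt():
        start.wait()  # release all threads together to maximise overlap
        try:
            allocate(serial)
            ok = True
        except Exception:
            ok = False
        with outcomes_lock:
            outcomes.append(ok)

    threads = [threading.Thread(target=one_attempt) for _ in range(attempts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(outcomes)
```

For example, `count_parallel_successes(lambda s: client.allocate({"serial": s}), serial)` should return 1 once allocation is exclusive; per the report above, the affected version lets both calls succeed. Remember to release the device afterwards.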

What is the expected behavior?

In our case, we run Android tests in parallel in CI. Each test job is encapsulated in its own container, so the STF server is the only shared part of the system. I've developed Python libraries (see list below) that make STF easier to use, but this bug means that parallel runs sometimes take the same device from STF, which leads to test failures.

Do you see errors or warnings in the stf local output? If so, please paste them or the full log here.

It looks like the group channel subscription really does fail under the hood:

```
2022-12-01T10:57:36.872Z IMP/device:plugins:group 588 [D0AA002184J82801202] Now owned by "admin@domain.com"
2022-12-01T10:57:36.874Z INF/device:plugins:group 588 [D0AA002184J82801202] Subscribing to group channel "C/e7cH7XSD2UfOVBzsYVbQ=="
2022-12-01T10:57:36.880Z IMP/device:plugins:group 588 [D0AA002184J82801202] Now owned by "admin@domain.com"
2022-12-01T10:57:36.881Z INF/device:plugins:group 588 [D0AA002184J82801202] Subscribing to group channel "C/e7cH7XSD2UfOVBzsYVbQ=="
Unhandled rejection Error: Cannot create alias "8LcgFsaRZ7AYeJmsN1p7oEpgntU=" for "C/e7cH7XSD2UfOVBzsYVbQ=="; the channel already exists
    at ChannelManager.register (/app/lib/wire/channelmanager.js:22:13)
    at /app/lib/units/device/plugins/group.js:48:20
From previous event:
    at EventEmitter.plugin.join (/app/lib/units/device/plugins/group.js:42:15)
    at /app/lib/units/device/plugins/group.js:119:27
From previous event:
    at Router.<anonymous> (/app/lib/units/device/plugins/group.js:118:12)
    at Router.emit (/app/node_modules/eventemitter3/index.js:118:35)
    at Router.<anonymous> (/app/lib/wire/router.js:36:12)
    at exports.Socket.emit (node:events:527:28)
    at exports.Socket.Socket._emitMessage (/app/node_modules/zeromq/lib/index.js:649:15)
    at exports.Socket.Socket._flushRead (/app/node_modules/zeromq/lib/index.js:660:10)
    at exports.Socket.Socket._flushReads (/app/node_modules/zeromq/lib/index.js:696:15)
    at Immediate.<anonymous> (/app/node_modules/zeromq/lib/index.js:307:12)
    at processImmediate (node:internal/timers:466:21)
```
denis99999 commented 1 year ago

@jupe, yes, you are right. Did you have the same issue with two user accounts? I will take a look at it as soon as possible. In the meantime, there is a workaround, for example: once the device is taken, wait a random amount of time before running `adb connect [REMOTE_DEBUG_ADDRESS]`. If that gives an error or returns the message "already connected to [REMOTE_DEBUG_ADDRESS]", consider that taking the device failed and move on to the next device to see if it is free.
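The workaround suggested above can be sketched as follows. `try_connect` and `adb_connect_looks_exclusive` are hypothetical helpers; they assume `adb` is on the `PATH` and that the address passed in is the remote debug address STF reports for the allocated device. Because the exit code of `adb connect` may not reliably signal a failed connection across adb versions, the output text is checked as well.

```python
import random
import subprocess
import time


def adb_connect_looks_exclusive(output: str, returncode: int) -> bool:
    """Interpret `adb connect` output per the workaround: an error, or an
    "already connected to" message, means the device is probably not ours."""
    if returncode != 0:
        return False
    text = output.lower()
    if "already connected to" in text:
        return False
    # On success adb prints "connected to <address>"
    return "connected to" in text


def try_connect(remote_debug_address: str, max_jitter_s: float = 2.0) -> bool:
    """Hypothetical helper: wait a random time, run `adb connect`, apply the check."""
    time.sleep(random.uniform(0, max_jitter_s))  # random delay to de-synchronise parallel jobs
    proc = subprocess.run(
        ["adb", "connect", remote_debug_address],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return adb_connect_looks_exclusive(proc.stdout + proc.stderr, proc.returncode)
```

If `try_connect` returns `False`, release the device and try the next free one. This only narrows the window for collisions; it does not make allocation atomic.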

jupe commented 1 year ago

We already worked around it in a slightly different way: https://github.com/OpenTMI/stf-appium-python-client/pull/26, which was the smallest effort without breaking my library's API. It would be great if you have time to look at this more deeply and provide a fix!

denis99999 commented 1 year ago

@jupe, it should be fixed by PR #650; let me know if that fits your use case. I did not add a lock to the other POST/DELETE API calls related to the device object because I don't think it is relevant for now.
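From the comment above, #650 appears to add a lock for this POST call; the exact change is in that PR. As an illustration only, the Python model below (a hypothetical `LockedDeviceRegistry`, not STF's Node.js code) shows the idea: holding a per-device lock across both the availability check and the ownership update means only one of two simultaneous requests can succeed, even when both come from the same account.

```python
import threading
from collections import defaultdict


class LockedDeviceRegistry:
    """Hypothetical in-memory model (not STF code): a per-serial lock makes the
    "is it free?" check and the "take it" step a single atomic operation."""

    def __init__(self):
        self.owner = {}                              # serial -> owner email
        self._locks = defaultdict(threading.Lock)    # one lock per device serial
        self._locks_guard = threading.Lock()         # protects the lock table itself

    def _lock_for(self, serial):
        with self._locks_guard:
            return self._locks[serial]

    def allocate(self, serial, user):
        with self._lock_for(serial):
            if self.owner.get(serial) is not None:
                # Refuse even the current owner, as this issue expects
                return 403
            self.owner[serial] = user
            return 200


# Two parallel callers, same device, same account
registry = LockedDeviceRegistry()
start = threading.Barrier(2)
results = []

def worker():
    start.wait()  # start both calls together
    results.append(registry.allocate("D0AA002184J82801202", "admin@domain.com"))

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # [200, 403]
```

An in-process lock like this only serializes requests handled by the same process. If allocations for one device could be handled by several processes, the check and the update would need to be atomic at the shared storage level instead.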

jupe commented 1 year ago

I'll hopefully test this (latest release) in the coming week.