[🐛 Bug]: downloaded drivers are allegedly not executable

joerg1985 commented 1 year ago

What happened?

I have seen in GitActions unit tests failing in the CI with a 'RuntimeException: No driver can be provided for capabilities Capabilities ...', e.g.:

So i created a commit to remove the try/catch bock of isAvailable and replaced it with a loop to better hit the issue, see https://github.com/joerg1985/selenium/commit/829fa6347b579f219795d7abca4cfcdc2d178770

After this i did run the ci pipeline and did see this errors a cause:

Is it possible that concurrently calling DriverFinder.getPath will cause this issue? I guess the Selenium Manager does not ensure that other instances of the Selenium Manager have completed, before passing the driver path back to the DriverFinder.getPath? Or is the executable flag some how set asynchronous to creating the file? This might also be a bug in the JDK and the File.canExecute() is outdated.

This might also happen outside the Selenium GitHub unit tests, so i decided to raise this ticket to start investigations.

How can we reproduce the issue?

Cherry pick https://github.com/joerg1985/selenium/commit/829fa6347b579f219795d7abca4cfcdc2d178770 to reproduce in GitHub Actions.

Relevant log output

See above

Operating System

GitHub Actions

Selenium version

trunk

What are the browser(s) and version(s) where you see this issue?

chrome, firefox

What are the browser driver(s) and version(s) where you see this issue?

latest

Are you using Selenium Grid?

No response

github-actions[bot] commented 1 year ago

@joerg1985, thank you for creating this issue. We will troubleshoot it as soon as we can.

Info for maintainers

Triage this issue by using labels.

If information is missing, add a helpful comment and then I-issue-template label.

If the issue is a question, add the I-question label.

If the issue is valid but there is no time to troubleshoot it, consider adding the help wanted label.

If the issue requires changes or fixes from an external project (e.g., ChromeDriver, GeckoDriver, MSEdgeDriver, W3C), add the applicable G-* label, and it will provide the correct link and auto-close the issue.

After troubleshooting the issue, please add the R-awaiting answer label.

Thank you!

joerg1985 commented 1 year ago

I tried synchronizing the selenium manager calls without success. I also tried to repace the File.canExecute with Files.isExecutable, also no success.

So i hat a look at the javadoc and this might be the effects described in Setting Initial Permissions

titusfortner commented 1 year ago

@diemol Don't the Java tests use the pinned version of browser/drivers rather than Selenium Manager?

It's really hard to know what is happening since we're swallowing the actual error messages. Can we throw the error here just to debug it - https://github.com/SeleniumHQ/selenium/blob/trunk/java/src/org/openqa/selenium/firefox/GeckoDriverInfo.java#L74

joerg1985 commented 1 year ago

@titusfortner i removed the try catch in the commit https://github.com/joerg1985/selenium/commit/829fa6347b579f219795d7abca4cfcdc2d178770 this revealed the ...driver, can not be executed error message.

diemol commented 1 year ago

@titusfortner No, we agreed to use Selenium Manager for tests in Java a while ago.

The isAvailable() is only used in Grid, and in the code we have to prepare the testing environment.

If it is a permissions issue, isn't is something we should look inside Selenium Manager? Does it matter how the binary is invoked?

joerg1985 commented 1 year ago

Look at this run where, i polled until the file gets executable and it is getting executable after some ms.

diemol commented 1 year ago

Would that mean that Selenium Manager answers but the operating system is still completing those file permission tasks?

joerg1985 commented 1 year ago

I think there is no warrenty that changes in file attributes are instantly visible and this might be implementation specific for all the filesystems out there?

joerg1985 commented 1 year ago

We could try to create the file with the correct permissions and not modify them, as far as i understand the rust code :D

https://stackoverflow.com/a/28673836

joerg1985 commented 1 year ago

Just wrote my first lines of rust code to set the permissions while the files are created: https://github.com/joerg1985/selenium/commit/df3ae25105381e5b16cfe3e64b1897f914e73923

Will run the pipeline now several times to see if the issue is gone.

joerg1985 commented 1 year ago

still the same issue: https://github.com/joerg1985/selenium/actions/runs/5693665418/job/15433934188#step:14:194

joerg1985 commented 1 year ago

There is a File.sync_all method, just added it https://github.com/joerg1985/selenium/commit/c052cbccc1702f08d68f30e2a6f5db4985c72e44

CI is running, keep fingers crossed

joerg1985 commented 1 year ago

tried unistd/fn.fsync and unistd/fn.syncfs with no success.

joerg1985 commented 1 year ago

Bazel does run the UnitTests concurrently from different processes, so one process will invoke the download of the selenium manager and another one will pickup the incomplete file without the execution flag.

@titusfortner Do you think it is possible to add some locking to the selenium manager, to avoid picking the incomplete download? Or should i just workaround it for the unit testing? Or should we just ignore these flaky unit tests?

joerg1985 commented 1 year ago

PS: i also noticed Bazel does not show the cause of an exception in the console. this might be helpfull to detect the root cause of other failing tests.

titusfortner commented 1 year ago

This is not the kind of code I'm good at.
I thought there was locking
I thought each process put selenium manager in a different temp directory for this reason
Or are we talking about driver downloads not being completed by one SM before the next tries to use it because it sees something there?
Would it help or hurt this issue to copy the Selenium manager itself to the /.cache/selenium?
Do we want to have a before step in the test suite to load things before they try to be used?

joerg1985 commented 1 year ago

This is not the kind of code I'm good at.

Okay, so we might get others into this.

2. I thought there was locking

I am not aware of the rust code, this must someone else check.

3. I thought each process put selenium manager in a different temp directory for this reason

Yes, but this does not cover this case.

4. Or are we talking about driver downloads not being completed by one SM before the next tries to use it because it sees something there?

Yes, i think this is happening here. One SM is still extracting and another SM picks it up before it is ready (marked executable).

5. Would it help or hurt this issue to copy the Selenium manager itself to the /.cache/selenium?

This would probably make it not better

6. Do we want to have a before step in the test suite to load things before they try to be used?

That's a possible workaround, but others might see this effect too on their CI pipeline.

diemol commented 1 year ago

I need more context on when this is happening.

Does this happen only when we run our tests? Or does this happen when someone runs tests using Selenium as a dependency?

titusfortner commented 1 year ago

It's being reproduced in our tests, but I don't think our tests are using Selenium Manager substantively differently than how others could be using it.

joerg1985 commented 1 year ago

I think this will happen as soon as someone is running tests in parallel on the same system and the selenium manager is downloading something. The first call to the selenium manager will start to download and extract files, while a concurrent call to the selenium manager will pick up the incomplete files and will return the path to the client bingings to use them. In the Selenium CI this seems to happen quite alot when drivers are downloaded at the beginning of the brower / remote tests.

joerg1985 commented 1 year ago

I just noticed the issue again in a recent run, the links above are outdated: https://github.com/SeleniumHQ/selenium/actions/runs/5983650185/job/16234615066#step:14:295

titusfortner commented 1 year ago

Do we know what error we actually get when trying to use a file that has not finished downloading so we can rescue it? We should be checking canExecute or similar before we use it, should we loop on that instead of failing if not true?

We've also implemented this as a singleton, could we toggle a parameter to mark driver downloads in progress and have future method calls block on that?

joerg1985 commented 1 year ago

I think bazel is using multiple processes to run tests in parallel, therefore the singleton does not help to fix

To wait until the driver is executable could be implemented in the selenium manager to have this in one central place. Otherwise it must be implemented for all the client bindings in all languages.

github-actions[bot] commented 10 months ago

This issue is looking for contributors.

Please comment below or reach out to us through our IRC/Slack/Matrix channels if you are interested.

bonigarcia commented 10 months ago

Is this issue still valid? I was reported for an old version of SM, so it may not be certain anymore.

diemol commented 6 months ago

During the May 2024 Selenium Summit, we decided to implement a locking mechanism. It will be low priority until we find a way to reproduce it.

titusfortner commented 5 months ago

Since we don't have a reproducible example for this, but we do for #13511 let's track the fix there.

github-actions[bot] commented 4 months ago

This issue has been automatically locked since there has not been any recent activity since it was closed. Please open a new issue for related bugs.

SeleniumHQ / selenium