Closed adamelhakham closed 6 years ago
@screamerbg We should probably consider this a bug.
The Python library fasteners contains interprocess locks that serialize access among multiple processes for this purpose. We use them in the mbed-ls platform database.
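For anyone unfamiliar with the idea, here is a minimal sketch of an interprocess lock. It does not use fasteners and is not mbed CLI code: it is a stand-in built on the standard library's `fcntl.flock` (Unix only), and the names `interprocess_lock` and `bump_counter` are made up for illustration. It only shows the general shape: every process takes the same lock file before touching the shared resource.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def interprocess_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Blocks until no other process holds the lock on this file.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def bump_counter(counter_path, lock_path):
    """Read-modify-write a counter file (hypothetical shared resource).

    Without the lock, two processes could read the same value and one
    increment would be lost; the cache directory has the same kind of race.
    """
    with interprocess_lock(lock_path):
        try:
            with open(counter_path) as f:
                value = int(f.read() or 0)
        except FileNotFoundError:
            value = 0
        with open(counter_path, "w") as f:
            f.write(str(value + 1))
```

In a tool like mbed CLI the protected section would presumably be the part that populates or reads the shared cache directory, but that is an assumption on my side, not a description of the actual code.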
@adamelhakham Thanks for the awesome and thorough bug report.
@theotherjimmy This is a problem of executing multiple mbed CLI in parallel, not interprocess.
I'll look into a solution for this very soon.
@screamerbg
"This is a problem of executing multiple mbed CLI in parallel, not interprocess."
That sentence is confusing me. Executing multiple CLIs in parallel is exactly what an interprocess lock would protect against. Maybe you misread that as intraprocess?
@screamerbg Is there an update on this issue? We are suffering terrible performance (and large network utilization) due to lack of cache in our automation tasks.
This should get high priority; it will impact our Jenkins jobs quite heavily, too. We can't use the cache feature at all until this is resolved.
@adamelhakham Could you try the f/thread_safety branch on my fork - https://github.com/screamerbg/neo/tree/f/thread_safety ?
@screamerbg Before @adamelhakham tries, could you confirm you managed to reproduce the original problem with his description, and can't reproduce it in the new branch?
@trianglee I can't reproduce it either and that's why I asked @adamelhakham to test it with my fork
@screamerbg I see. Thanks. So let's verify, @adamelhakham.
@screamerbg I ran the script I provided for a few hours and the issue did not appear. With the 1.5 mbed-cli it is reproduced within 10-20 minutes, so it seems that the issue may have been fixed. Next week I will integrate your branch into our CI so that we can test it better. I will keep you posted. Thanks!
@adamelhakham Great! Please let me know so we can plan a patch release with this fix.
@screamerbg
We now encounter the following error sometimes:
[mbed] ERROR: Cache lock file exists with a different pid ("12822" vs "12727")
Do you know why?
Thanks!
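I don't know how the branch implements its lock, so the following is only a guess at the general pattern, not the actual mbed CLI code. One common way to build a lock file is to create it atomically and stamp it with the owner's pid; a message like the one above could then come up whenever a process finds a lock file stamped by someone else. The function names and the error text below are hypothetical.

```python
import os


def acquire_pid_lock(lock_path):
    """Atomically create lock_path containing our pid; return True on success."""
    try:
        # O_EXCL makes creation fail if the file already exists, so only
        # one process can win the race to create it.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True


def release_pid_lock(lock_path):
    """Remove lock_path, but only if it is stamped with our own pid."""
    with open(lock_path) as f:
        owner = f.read().strip()
    if owner != str(os.getpid()):
        raise RuntimeError(
            'lock file exists with a different pid ("%s" vs "%s")'
            % (owner, os.getpid())
        )
    os.remove(lock_path)
```

With a scheme like this, a stale lock file left behind by a killed process, or a release path that runs without a matching acquire, would both show up as a pid mismatch. Whether either applies here is something only the branch's code can answer.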
@screamerbg can you help @adamelhakham with his question?
ARM Internal Ref: MBOTRIAGE-446
There is progress, @screamerbg has made some fixes. We are doing some further testing and will keep you posted
@screamerbg any idea when this issue will be fixed? It is still present in 1.8.0.
@theotherjimmy @ARMmbed/mbed-os-maintainers This is an issue for 5.10 release and setup for client testing.
I was able to reproduce this with parallel-rust. Steps:
1. Run the reproducer in the issue's top comment to generate the dir_{1..8} directories.
2. Make a reproducer.bash with the following contents:
    set -e
    cd $1
    rm -rf mbed-os
    if [ ! -f .mbed ]; then
        mbed new .
    fi
    mbed deploy -vvv
    cd ..
3. Run:
    parallel -v -j8 'bash reproducer.bash {}' ::: $(echo dir_*)
This will force ALL 8 mbed invocations to run at almost exactly the same time. One of the mbed-cli's will fail to cache correctly:
[mbed] WARNING: Unable to cache "/home/jimmy/temp/dir_8/mbed-os" to "/home/jimmy/.mbed/mbed-cache/github.com/ARMmbed/mbed-os"
Running with the changes from #752, I can't get that same line. Amusingly, it's also a bit quicker.
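If you don't have a parallel tool installed, the same "start everything at once" effect can be approximated from Python. This is a rough stand-in for the parallel invocation above, not an exact equivalent; `run_in_dirs` is a name I made up, and it only uses the standard library.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_in_dirs(command, directories, jobs=8):
    """Run `command` (an argv list) once per directory, up to `jobs` at a time.

    Returns a dict mapping each directory to the command's exit code.
    """
    def run_one(directory):
        # Each worker thread just waits on its own child process.
        return subprocess.run(command, cwd=directory).returncode

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        codes = list(pool.map(run_one, directories))
    return dict(zip(directories, codes))
```

For this issue the call would look something like `run_in_dirs(["mbed", "deploy", "-vvv"], ["dir_%d" % i for i in range(1, 9)])`, assuming the dir_1 to dir_8 directories already exist. Any non-zero exit code in the result points at the directory whose deploy failed.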
Hi, We have a Jenkins job (on an Ubuntu 14.04 machine) that runs mbed deploy multiple times in parallel in different directories. In order to speed things up we would like to use the cache feature. However, occasionally, some of the processes run into the following issue:
Currently, in order to avoid this issue, we turn off the cache feature with mbed cache off
Is the cache feature supposed to support parallel usage?
For convenience, I've added a simple shell script that reproduces the issue (it calls mbed deploy in a loop from 8 different directories in parallel). It usually takes 10-20 minutes for one of the subprocesses to produce the error:
Thanks!