Open Ri0n opened 3 years ago
Your approach is fine and should work to avoid, but the problem is a little surprising: the code in question was evolved from Python's copy
module (long ago, it has evolved in other ways since) and it has the same basic construct:
for key, value in x.items():
y[deepcopy(key, memo)] = deepcopy(value, memo)
return y
and this doesn't seem to be causing people to hit this. In the scons case the surprise is that cloning an environment shouldn't be happening in a threaded context.
You'll need to provide a reproducer for us to be able to help/fix this. Please come back to discord or the users mailing list.
Also we'd need more of the stack trace to help.
As I'm sure you read when you opened this issue we prefer to discuss issues in users mailing list, IRC, or discord before anyone files an issue.
Hi guys. Nobody answers on Discord and honestly I hate using mailing lists. So maybe someone else with the same problem will discuss this with you. With regards to stack, there are just entries from our product, nothing interesting. But one of the entries mentions starting a thread. I don't want to expose any stuff which is internal for the company, sorry. Regarding a reproducer, since making a reproducer will take way more time than the fix, it doesn't worth it.
We plan to have a private Scons fork for now, since we ca't wait till it's resolved. Sorry for bothering you with such non-standard issues.
One more thing to mention. The issue was noticed while porting our build system from Python2+some old Scons to Python3+Scons4. So apparently the backward compatibility was broken. Basically it's fine to break it with major releases. But unfortunately for us it may require some significant redesign. Even though I haven't reviewed the problem deeply from our side yet.
That this exposes with Py3 is no surprise: that's when the keys
, values
, items
dictionary methods started returning views by default, as opposed to needing ask for one (e.g. Py2's iteritems
). It's not my call but I don't see any problem with your change.
@Ri0n - you're clearly many timezones out of sync with many of our community. Expecting instant responses on discord is unrealistic.
Also we need a reproducer so that we can add a test to cover this issue. So it IS worth the time.
sure. I'll make a reproducer. Unfortunately it won't be soon.
@Ri0n We've had other reports of this now. It seems in the specific known case, a modification to an env (well, in this case a Clone
of one) was being initiated in an Action function. We don't think that's a great idea - SCons pretty much expects the various created environments to be stable once the build phase kicks off - and at the very least, at that point, you're in multi-threaded mode and thread safety becomes an issue. Is there any chance you have enough information left over to know if that applies to your case as well? And if so, a sketch of the use case, so it's more clear if this kind of late modification really ought to be supported.
In my case we have a huge project and there is a lot of code around Scons splitted up into modules responsible for various project parts or just some common helpers.
On start we load all the modules. Some of them are heavy on initialization, so we load them asynchronously in threads. On initialization the modules add various stuff to the env. Some of them clone the env first and then add its stuff to the clone, others just add to the existing env. Eventually we join all the threads. And after that start building the project.
Honestly SCons was never designed or coded to support initializing the build via multithreaded SConscript logic. And there never has been (thus far) the intent to support such usage.
If you're seeing the dictionary in question being changed while you're cloning it, that should tell you you have unpredictable unguarded writes happening, and could yield very unpredictable repeatability.
There are many very large builds which aren't using multithreading to initialize the logic. Generally SConscript/SConstruct processing time is a small fraction of any build..
How many source files are you dealing with? Are you running non-trivial python logic in your SConscript logic?
Hard to say how many files are compiled with SCons. But there are 16k just c/c++ files in the project. Regarding the Clone() on the initialization it seems theres is just one place in our code and it does't care much about consistency of the cloned env. It roughly does the following (kind of shallow clone)
android = env.Clone()
dict = android.Dictionary()
keys = dict.keys()
for key in keys:
del android[key]
env['android'] = android
It's where it often fails on the first line. Regarding the non-trivial logic it's everywhere there. The framework is designed to be totally crossplatform with regards to boh host and target.
That's kind of an interesting usage, what's the idea? You get a duplicate environment with all the construction variables removed but the methods intact? Or is the pseudo-code description leaving something out?
Nothing left. the code is almost as is. Then the cloned env is populated with some variables afterwards. I'm not really sure why it's needed. The guys who implemented this left the company long ago. From the code the strong necessity is not obvious.
The only related comment in the code I see
# Clone the environment and clear it out. I want the class not the data
Well I start thinking I just have to try to refactor this without any cloning and check if it works :-)
Might be worth a look, yes. If you find this is really what you need to do there's a cheap shortcut (this isn't considered an "official" way, so you didn't hear it from me :-) ):
android = env.Clone()
android._dict = {}
But you're actually paying quite a cost - the Clone
does quite a lot of fiddling, including rerunning tool initialization, to make sure the vars get set up right and then those are just tossed... there may be a better way to handle.
Can you disabled the threaded initialization and see how long it takes? 16k files is not that many... There are much larger builds around.
Also an OverrideEnvironment() may do what you want, but please try disabling the threaded sconscript processing first?
Scons 4.0.1 on Gentoo Linux. I didn't get and answer on Discord, so posting here
My company uses Scons for a huge project, so it's sometimes hard to say what's going on or how Scons was invoked. But quite likely the code above was executed in a thread and I don't know if it's eligible.
I believe the easiest fix is to get a list of keys and not items iterator.
something like this probably