Open dkfellows opened 7 years ago
It was the intention of myself and Andy Goth to merge his work and mine to ensure that there would be a single C API. I think this is the best path forward, as his new API will handle the traces and error conditions properly. I would like to benchmark an 'array for' command written using his API and compare it against what I wrote.
The existing TIP has not been updated. Arguments about array foreach
vs. array for
&etc. have not been started.
For the array for
code I wrote:
array for {k v} arrayname {}
, which is probably too much (and makes the bytecode complicated).array for {k} arrayname {}
is needed.array for {k v} arrayname { ... }
would be an O(N) operation, whereas array for {k} arrayname { set v $arrayname{$k}; ... }
would be O(NlogN).
Yes, it should save a little time, but I don't think the end result is much different from the Tcl code, excepting the overhead of the set call (I could not determine any simpler way to get the value object).
I will try to find some time to reimplement my code using Andy Goth's API and see how that works out.
The C code does:
if (valueVarObj != NULL) {
valueObj = Tcl_ObjGetVar2(interp, arrayNameObj, keyObj,
TCL_LEAVE_ERR_MSG);
if (Tcl_ObjSetVar2(interp, valueVarObj, NULL, valueObj,
TCL_LEAVE_ERR_MSG) == NULL) {
result = TCL_ERROR;
goto done;
}
}
OK, so your implementation is basically a "C" version of the stock foreach key [array names foo]
, and is not walking the internal array structures? I don't think that will satisfy the bounty requirements, because all it's saving is a copy of the names of the array elements into a list, which is an O(N) operation. The O(NlogN) term is still in there, so it's not fundamentally more efficient than the naive Tcl implementation.
As I understand it, the problem is that traces can do all sorts of tricky things that cause trouble. I guess that a more subtle approach would be to use some sort of modification sequence number in the array (changing the structure of arrays) so that iterations could perform a simple comparison to detect the world changing under their feet. We use similar techniques elsewhere (e.g., with bytecode where renaming a command with a compiler increments the compilation epoch counter).
No, that is not correct. No, it is not a stock version of foreach key ....
.
Even after bytecoding, I don't think you are going to get any huge speed gains, but the memory savings are significant.
I didn't mean to imply that it was just a stock version of foreach key [array names foo] { ... }
, what I'm getting at is that the time efficiency of the operation is the same as the stock foreach key [array names foo] { ... }
. That matches the timing results in the other thread where it's more memory efficient but slower than foreach {key val} [array get foo] { ... }
.
Since the loop is running in "C" code it doesn't matter whether the operation itself is bytecoded, the operation lookup is a tiny fraction of the total time.
Right. I think once bytecoded, there will some time savings, but nothing significant. The fetch of the value object from the hash table (which has all the trace checking, etc.) is never going to be any faster.
OK, I don't understand what you're doing then, if you're walking the hash table to get the keys but you don't have access to the values in the same operation.
What we're looking for is an analog to the Speed Tables "$table search" operation, which is O(N) for a straight linear search because it walks the index and has access to the row in the table without an extra lookup on the key.
Hmm, it's been pointed out that the time for Tcl_GetVar2 shouldn't be O(logN) unless there's a serious problem with Tcl hash tables, but there's some kind of superlinear term because I can't see otherwise how the time for your array for
could be higher than the time for foreach {key val} [array get foo] { ... }
.
Does [array get foo]
bypass the trace checking?
Well, as DKF hinted, We can:
Then it could be made much faster, as the low-level hash table operations and fetch of value objects could be used.
The problem is, many people will expect an array for
to fire the read traces. Others won't care.
The expectation from the bounty request was that array for/foreach
would be significantly more efficient than stepping through the array manually in Tcl. It should be significantly faster. Slightly slower than the array get
loop (or even the same time, or only slightly faster) does not meet those expectations and I'm really concerned about that.
Does array get
fire the read traces?
If there are no read traces set for the array, then it should be able to tell that by checking once at the start of the loop, no? It shouldn't be duplicating that check on every row.
It appears that array get
does not fire any read traces.
I will take a look and see what it is doing.
That's a very surprising discovery, and something I will have to keep in mind when debugging Tcl code in the future because I would have assumed array get
would act the same as any other read operation on an array.
No, I was incorrect -- it fires the trace on the individual element. A read trace on the entire array does not fire (don't know if that ever does).
array get
is also using Tcl_GetVar2
.
dict for
has a speed of 5447.601 microseconds in my same benchmark. Can you simply convert to using dictionaries? You could also keep an array
and a dict
in sync in order to get the best of both, though that has memory overhead.
I'll get with karl on this, it's a very surprising result. He may still be interested in the code clean-up advantages of being able to step through the array without copying the keys.
Maybe we need a bounty on speeding up array access in general as a follow-up.
Let's revisit this one after Andy gets back to https://github.com/flightaware/Tcl-bounties/issues/18 ?
Here is a very simple implementation of array foreach
written in pure Tcl:
proc arrayforeach {keyvarName arrayName body} {
upvar 1 $keyvarName key $arrayName array
set s [array startsearch $array]
try {
while {[array anymore array $s]} {
set key [array nextelement array $s]
uplevel 1 $body
}
} finally {
array donesearch array $s
}
}
# Inject foreach into [array] ensemble
namespace ensemble configure ::array -map \
[dict replace [namespace ensemble configure ::array -map] \
foreach [list [namespace which arrayforeach]]]
I don't think it is quite enough (it only digs out the keys), but it shows that people can test out ideas without having to commit to writing C straight off the bat.
There are a number of things that are required to satisfy the
array foreach
bounty. Because there's interest from multiple people over this, I'll lay down what we really need.foreach
, of thearray
ensemble. (Preferably to be part of Tcl 8.7.)Tcl_EvalObjEx
, etc.tcltest
) are required. Should test simple usage, failure modes (such as wrong args, unwriteable iteration variable(s), etc.) and “interesting” edge cases like semantics when the array is written to during the iteration loop.It's not enough to just have an implementation. It must also be tested (so that we can safely develop improved implementations as required) and it must be documented (so other Tclers can learn about it). The TIP is conceptually part of the documentation, used as a form of milestone in our release notes.