Requirements for [array foreach]

dkfellows commented 7 years ago

There are a number of things that are required to satisfy the array foreach bounty. Because there's interest from multiple people over this, I'll lay down what we really need.

[ ] Implementation of a subcommand, foreach, of the array ensemble. (Preferably to be part of Tcl 8.7.)
- Implementation should be in C (as a pure Tcl implementation is unlikely to have the desired memory-use characteristics).
- Implementation may be simple, using Tcl_EvalObjEx, etc.
- Bonus for NRE-aware, double-bonus for bytecode version — only needs to be bytecoded inside procedure-like contexts — but neither of those are required to satisfy the bounty. They can be provided (e.g., by Tcl maintainers) after the bounty is paid out.
[ ] Documentation is required. Can be a plain text paragraph, but should be in the writing style of the subcommands on the array(n) manpage.
[ ] Tests (using tcltest) are required. Should test simple usage, failure modes (such as wrong args, unwriteable iteration variable(s), etc.) and “interesting” edge cases like semantics when the array is written to during the iteration loop.
[ ] TIP required, but trivial. Main things: why is this a good idea (independent of “there's a bounty on it”), and a summary of what are you are doing that will be useful to other Tclers to understand what to expect. (The documentation paragraph is likely to be useful here!)

It's not enough to just have an implementation. It must also be tested (so that we can safely develop improved implementations as required) and it must be documented (so other Tclers can learn about it). The TIP is conceptually part of the documentation, used as a form of milestone in our release notes.

dkfellows commented 7 years ago

Cross reference with these other issues: https://github.com/flightaware/Tcl-bounties/issues/10 https://github.com/flightaware/Tcl-bounties/issues/12 https://github.com/flightaware/Tcl-bounties/issues/15 https://github.com/flightaware/Tcl-bounties/issues/18

dkfellows commented 7 years ago

Cross reference with this existing TIP: TIP #421

bll123 commented 7 years ago

It was the intention of myself and Andy Goth to merge his work and mine to ensure that there would be a single C API. I think this is the best path forward, as his new API will handle the traces and error conditions properly. I would like to benchmark an 'array for' command written using his API and compare it against what I wrote.

The existing TIP has not been updated. Arguments about array foreach vs. array for &etc. have not been started.

For the array for code I wrote:

I attempted to do the bytecode, but was not able to complete it. This effort can be junked.
My implementation is NRE aware.
No documentation has been written.
I implemented a array for {k v} arrayname {}, which is probably too much (and makes the bytecode complicated).
Only array for {k} arrayname {} is needed.
Most of the tests are done. http://core.tcl.tk/tcl/info/bd05353216cc56ff

resuna commented 7 years ago

array for {k v} arrayname { ... } would be an O(N) operation, whereas array for {k} arrayname { set v $arrayname{$k}; ... } would be O(NlogN).

bll123 commented 7 years ago

Yes, it should save a little time, but I don't think the end result is much different from the Tcl code, excepting the overhead of the set call (I could not determine any simpler way to get the value object).

I will try to find some time to reimplement my code using Andy Goth's API and see how that works out.

The C code does:

   if (valueVarObj != NULL) {
        valueObj = Tcl_ObjGetVar2(interp, arrayNameObj, keyObj,
              TCL_LEAVE_ERR_MSG);

       if (Tcl_ObjSetVar2(interp, valueVarObj, NULL, valueObj,
                 TCL_LEAVE_ERR_MSG) == NULL) {
            result = TCL_ERROR;
            goto done;
        }
    }

resuna commented 7 years ago

OK, so your implementation is basically a "C" version of the stock foreach key [array names foo], and is not walking the internal array structures? I don't think that will satisfy the bounty requirements, because all it's saving is a copy of the names of the array elements into a list, which is an O(N) operation. The O(NlogN) term is still in there, so it's not fundamentally more efficient than the naive Tcl implementation.

dkfellows commented 7 years ago

As I understand it, the problem is that traces can do all sorts of tricky things that cause trouble. I guess that a more subtle approach would be to use some sort of modification sequence number in the array (changing the structure of arrays) so that iterations could perform a simple comparison to detect the world changing under their feet. We use similar techniques elsewhere (e.g., with bytecode where renaming a command with a compiler increments the compilation epoch counter).

bll123 commented 7 years ago

No, that is not correct. No, it is not a stock version of foreach key .....

Even after bytecoding, I don't think you are going to get any huge speed gains, but the memory savings are significant.

resuna commented 7 years ago

I didn't mean to imply that it was just a stock version of foreach key [array names foo] { ... }, what I'm getting at is that the time efficiency of the operation is the same as the stock foreach key [array names foo] { ... }. That matches the timing results in the other thread where it's more memory efficient but slower than foreach {key val} [array get foo] { ... }.

Since the loop is running in "C" code it doesn't matter whether the operation itself is bytecoded, the operation lookup is a tiny fraction of the total time.

bll123 commented 7 years ago

Right. I think once bytecoded, there will some time savings, but nothing significant. The fetch of the value object from the hash table (which has all the trace checking, etc.) is never going to be any faster.

resuna commented 7 years ago

OK, I don't understand what you're doing then, if you're walking the hash table to get the keys but you don't have access to the values in the same operation.

What we're looking for is an analog to the Speed Tables "$table search" operation, which is O(N) for a straight linear search because it walks the index and has access to the row in the table without an extra lookup on the key.

resuna commented 7 years ago

Hmm, it's been pointed out that the time for Tcl_GetVar2 shouldn't be O(logN) unless there's a serious problem with Tcl hash tables, but there's some kind of superlinear term because I can't see otherwise how the time for your array for could be higher than the time for foreach {key val} [array get foo] { ... }.

resuna commented 7 years ago

Does [array get foo] bypass the trace checking?

bll123 commented 7 years ago

Well, as DKF hinted, We can:

change arrays to have a global epoch counter each time the array has an insert/delete operation.
not execute any traces on the array.

Then it could be made much faster, as the low-level hash table operations and fetch of value objects could be used.

The problem is, many people will expect an array for to fire the read traces. Others won't care.

resuna commented 7 years ago

The expectation from the bounty request was that array for/foreach would be significantly more efficient than stepping through the array manually in Tcl. It should be significantly faster. Slightly slower than the array get loop (or even the same time, or only slightly faster) does not meet those expectations and I'm really concerned about that.

Does array get fire the read traces?

resuna commented 7 years ago

If there are no read traces set for the array, then it should be able to tell that by checking once at the start of the loop, no? It shouldn't be duplicating that check on every row.

bll123 commented 7 years ago

It appears that array get does not fire any read traces. I will take a look and see what it is doing.

resuna commented 7 years ago

That's a very surprising discovery, and something I will have to keep in mind when debugging Tcl code in the future because I would have assumed array get would act the same as any other read operation on an array.

bll123 commented 7 years ago

No, I was incorrect -- it fires the trace on the individual element. A read trace on the entire array does not fire (don't know if that ever does).

bll123 commented 7 years ago

array get is also using Tcl_GetVar2.

dict for has a speed of 5447.601 microseconds in my same benchmark. Can you simply convert to using dictionaries? You could also keep an array and a dict in sync in order to get the best of both, though that has memory overhead.

resuna commented 7 years ago

I'll get with karl on this, it's a very surprising result. He may still be interested in the code clean-up advantages of being able to step through the array without copying the keys.

resuna commented 7 years ago

Maybe we need a bounty on speeding up array access in general as a follow-up.

resuna commented 7 years ago

Let's revisit this one after Andy gets back to https://github.com/flightaware/Tcl-bounties/issues/18 ?

dkfellows commented 6 years ago

Here is a very simple implementation of array foreach written in pure Tcl:

    proc arrayforeach {keyvarName arrayName body} {
    upvar 1 $keyvarName key $arrayName array
    set s [array startsearch $array]
    try {
        while {[array anymore array $s]} {
        set key [array nextelement array $s]
        uplevel 1 $body
        }
    } finally {
        array donesearch array $s
    }
    }
    # Inject foreach into [array] ensemble
    namespace ensemble configure ::array -map \
    [dict replace [namespace ensemble configure ::array -map] \
         foreach [list [namespace which arrayforeach]]]

I don't think it is quite enough (it only digs out the keys), but it shows that people can test out ideas without having to commit to writing C straight off the bat.

flightaware / Tcl-bounties

Requirements for [array foreach] #27