flightaware / speedtables

Speed tables is a high-performance memory-resident database. The speed table compiler reads a table definition and generates a set of C access routines to create, manipulate and search tables containing millions of rows. Currently oriented towards Tcl.
https://flightaware.github.io/speedtables/
BSD 3-Clause "New" or "Revised" License
66 stars 16 forks source link

search fails to fully unset the data in the data array for each loop #56

Closed nugget closed 8 years ago

nugget commented 8 years ago

search fails to cleanly unset the data array on each iteration of the code block. It only reliably clears out array elements that exist in the underlying speed table, but neglects to clear any array elements that were added during the execution of the code block.

Assuming a table containing a single field "id" as a numeric index, this can be reproduced like this:

$speedtable search -array buf -code {
    if {$buf(id) == 5} { 
        set buf(interloper) "this field should only be in record number 5"
    }
    puts [array get buf]
}

Which runs with this output: id 1 id 2 id 3 id 4 id 5 interloper {this field should only be in record number 5} id 6 interloper {this field should only be in record number 5} id 7 interloper {this field should only be in record number 5} id 8 interloper {this field should only be in record number 5} etc . . .

I expected the buf array to be reset completely each time the code block is evaluated.

resuna commented 8 years ago

My first thought was "yes, this should probably be fixed, but it's not critical since the documentation doesn't make any promises either way". But there might be an issue with nulls not getting cleared ( -array vs -array_with_nulls ) which would definitely be a bug. The related pgtcl bug may have the same problem as well.

resuna commented 8 years ago

No, in speedtables at least it does explicitly clear the array element on array_set.

nugget commented 8 years ago

I've never used -array_set, only -array.

resuna commented 8 years ago

You mean array_with_nulls? array_set is an internal routine ($table_array_set) that implements array. It takes care of clearing nulls.

There doesn't seem to be a straight Tcl C API call to clear an array, so it's not going to be a one-line fix.

This is unrelated to stapi.

nugget commented 8 years ago

Oh, nevermind, I see what you're saying now. Thanks.

resuna commented 8 years ago

Looking at the code in ctable_search.c I'm not seeing anything changed that would fix this. case CTABLE_SEARCH_ACTION_ARRAY is not clearing the array. You sure you can't reproduce?

nugget commented 8 years ago

I can still reproduce. My earlier statement to the contrary was in error.

resuna commented 8 years ago

Tcl_UnsetVar2 is the public API. We can call that before setting it, but should it check if there is already a non-array var with that name and error out? For consistency, using a non-array variable name here should be an error.

Or am I overthinking it? The semantics of unset and array unset in Tcl are a little mixed up. Should ctables even care?

unset a - unsets a whether it's a variable or an array, errors out if a doesn't exist array unset a - only unsets a if it's an array, errors out if a is a var, no result if a doesn't exist

bovine commented 8 years ago

"search" already errors if you supply a name that is already used by a non-array variable, but not until it has found a row and actually tries to set the first element of the array, so there is already an undocumented contract that the name must not already exist, or it must be an array.

So doing the "array unset a" proposal would be the least impact on breaking existing code.

resuna commented 8 years ago

Yes, that's the behavior I would expect, and that's the behavior that would be correct. It's implicit in the semantics of Tcl arrays.

There's no published C API for that, though, and it looks like the code in generic/tclVar.c:4191 (Tcl 8.6 source) uses internal APIs. So we'd have to clear the array at the end of the loop (see attached diff below).

diff.txt

The current semantics leave the last array value intact after the table-search is complete, and there may be code depending on that. This would break that, and surprising as it may seem I wouldn't rule out someone having written code that uses it.

bovine commented 8 years ago

I might propose unsetting the array before each row, and then finally prior to returning after the last row. That would ensure no interlopers present initially in the array.

resuna commented 8 years ago

I'm actually concerned about unsetting it at the end of the row.

I would rather do effectively an "array unset" at the beginning, but I don't see how to do that with just the published Tcl APIs, unless I just stomp on any existing variable with the name. That would be easier: just put

Tcl_UnsetVar2(interp, Tcl_GetString(search->rowVarNameObj), NULL, 0);

at the beginning of the case CTABLE_SEARCH_ACTION_ARRAY_WITH_NULLS/CTABLE_SEARCH_ACTION_ARRAY.

resuna commented 8 years ago

Something like this.

bovine commented 8 years ago

Sure, leaving the last row set might be safer in case anyone was using "break" and expecting it to be left in existence.

Your last example seems reasonable.

lehenbauer commented 8 years ago

Tcl_UnsetVar2 is the way to go. If you want to check the first time to see if there is a variable there, which seems to me to be fairly good, then do a Tcl_GetVar2Ex(NULL, arrayName, NULL, 0) and if it returns non-null then return an error.

Send NULL for interp instead of interp else Tcl will put an error message in the result object and set the errorCode for the normal case when the var doesn't exist.

Hmm it may be harder than this if there is already an array present instead of a var.

If there is an array present I agree with @bovine 's point of ensuring no interlopers present initially in the array.

resuna commented 8 years ago

That's clever, Karl. Roger wilco. On Apr 27, 2016 07:37, "Karl Lehenbauer" notifications@github.com wrote:

Tcl_UnsetVar2 is the way to go. If you really want to check the first time to see if there is a variable there then do a Tcl_GetVar2Ex(interp, arrayName, NULL, 0) and if it returns non-null return TCL_ERROR and since you passed interp instead of null the result object will already have "variable isn't an array" and errorCode will be set to TCL LOOKUP VARNAME x.

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/flightaware/speedtables/issues/56#issuecomment-215068735

resuna commented 8 years ago

[previous comment based on email from Karl's comment, not edited version]

If you don't pass interp, then how does it know the context to look up the variable name?

The snippet I provided should work without clobbering anything in interp, at the cost of calling Tcl internal routines. Or it can clear the error state of the interp.

Definitely agree this should only be done first time through. Actually, it could do it while parsing the arguments.

lehenbauer commented 8 years ago

Busted. You're right. I was thinking about some of the other object-manipulating functions. As long as you don't "or" TCL_LEAVE_ERR_MSG with the flags then you'll be all right on the error message.

resuna commented 8 years ago

Doesn't work, Tcl_GetVar* on an array returns the array... which stringifies to a key-value list:

Tcl_ObjGetVar2 has value 'id brock name {Brock Sampson} home {Venture Compound} show {Venture Bros} alive 1 gender male age 35 coolness 100 _key brock' as string foo is not an array

(I used Tcl_ObjGetVar2 to avoid an unnecessary Tcl_GetString() but that shouldn't make a difference)

resuna commented 8 years ago

I can try setting a dummy element in the array with Tcl_SetVar2Ex.

resuna commented 8 years ago

That seems to work. It's a bit of a kludge, though.

patch2.txt

lehenbauer commented 8 years ago

Yeah that's gross. How about if we just document it as unsetting anything if it's already there and move on with our lives.

resuna commented 8 years ago

The original behavior is also not technically wrong, it's just surprising. We could simply document that.

Pgtcl has the same behavior ( see https://github.com/flightaware/Pgtcl/issues/4 ).

bovine commented 8 years ago

The original behavior was found to have caused an obscure bug that was affecting multicom greatly. We want to prevent similar problems from unintentionally occurring in the future.

resuna commented 8 years ago

OK, I'll do it.

I take it that you're nixing using tcl internals @lehenbauer ?

lehenbauer commented 8 years ago

Yeah just nuke the array/var at the start of the first pass and the end of each pass, I think.

bovine commented 8 years ago

We discussed that it would be better to only nuke it at the start of each pass, and not at the end. That would minimize impact on code that assumed the last row would be retained if you break out.

resuna commented 8 years ago

Committed to branch "issue56" with tests updated. Merge when satisfied.

resuna commented 8 years ago

Merged.

resuna commented 8 years ago

Branch deleted.