Closed theandrew168 closed 4 years ago
Hi Andrew,
Thanks for submitting this PR. Given the way we handle passing state currently (as you mention above), I think this is probably a reasonable way to convey the row count from the Cmd
to the Cursor
.
A couple of quick notes regarding what could be done before merging, in my opinion:
rowcount
is unavailable or can't be determined. This sounds like a good idea to me - it looks like conn->rowcount
should probably be set to -1 on creation and whenever teradata_execute
is called, does this logic sound right (to everyone)?ActivityCount
attribute from the success parcel? I haven't checked on that part of the docs recently, but I will now. Even if it is overkill it might be worth checking the length of the data anyway before decoding.ActivityCount
and hence rowcount
is a uint32_t
, this should be used as the field type and for the temporary variable in teradata_execute
. Of course this prohibits us from using -1 directly in that field as specified above: I would maybe suggest keeping some separate state (on the conn
or otherwise) which indicated to Cmd_rowcount
whether we are in a valid state for a row count or not, and checking that to determine if -1 should be returned. Definitely open to suggestion here, there might be something more straightforward to do.rowcount
will be populated warrants an example as well. If you are interested in adding this please feel free, otherwise I would be happy to update the docs.Curious to hear thoughts from you and others, as there is a bit of nuance here. Thank you again!
I updated the PR with a couple of changes. The first is that I removed the unnecessary argument checking from the Cmd_rowcount function since it doesn't accept any. Second, I updated the type of rowcount
in the TeradataConnection struct and added some thoughts.
I could see reasons for using either long, ssize_t, or int64_t to hold the value since it needs to support up to UINT32_MAX. It would also be nice to allow for -1 as an indicator of an error or that the field hasn't been initialized yet. Please let me know your thoughts on this.
Additionally, I did some testing on when the PclSUCCESS arrives in the scope of a typical execute
query. The pseudo-code I used for this is shown below:
import giraffez
with giraffez.Cmd(host='hostname', username='username', password='password') as cmd:
# Assume that schema.table1 holds M records
results = cmd.execute("select * from scheme.table1")
print(results.rowcount) # Prints M before any records have been fetched
for _ in range(10):
row = next(results)
print(row) # Prints the next row of info as expected
print(results.rowcount) # Continues to print M
# Assume that schema.table2 holds N records
results = cmd.execute("select * from scheme.table2")
print(results.rowcount) # Prints N before any records have been fetched
for _ in range(10):
row = next(results)
print(row) # Prints the next row of info as expected
print(results.rowcount) # Continues to print N
Given these results, it definitely appears that PclSUCCESS arrives (with the correct value) before any records have been fetched and continues to remain valid until a new query gets executed. This leads me to believe that any given call to Cmd.execute()
returns only one PclSUCCESS. If this is the case, then any of the subsequent data fetching operations won't have an effect on the rowcount
that was already set for a given query. There might still be some logic or situations that I'm missing but it seems to be working as expected.
I was working on an experimental branch to also try and implement a TeradataCursor
type (for lack of a better description) that is passed around to keep information such as the command string and the rowcount, and I will try and commit it as is for a reference in a different branch (it is working I believe but can't make a ton of guarantees on it). I made a new type instead of attaching to the TeradataConnection
because while the TeradataConnection
type corresponds to a Teradata session, it is possible in the future to track parcel requests for [limited] asynchronous execution within a single session, and having that already split out would help. I was also trying to improve the messy interface a bit by removing the multiple execute functions. I have experienced the same observations you have made re: PclSUCCESS so far, so I think its safe to say the behavior is as expected.
But yeah, I will try and get that committed to a branch for review, comments, thoughts, etc since it might be helpful to implementing this to have multiple eyes and opinions on it.
I hope things are well! I'm curious if there is anything I can do to help this PR along? Are there any design decisions or other problems yet to be solved? If so, I'd be glad to lend a hand where possible. Thanks!
Did you have a chance to look at the branch ext-cleanup
? It ends up changing a lot of the teradata C extension code and adds a kind of "cursor" object to return things like rowcount, and I wanted to see if you had a chance to check it out and see if the rowcount feature works as expected. If it does then I will just go ahead and accept this PR, and rebase that one, but wanted to get your input first before proceeding.
I went ahead and spent some time looking at the ext-cleanup
branch this morning. I definitely like having an explicit concept of cursors. I did notice one small bug, however. The struct CliSuccessType
defines ActivityCount
as an array of Pclchar
s, which are signed. This is messing up the shift math and resulting in invalid counts. An explicit cast of each element to unsigned char
fixed this issue for me, however.
diff --git a/giraffez/src/teradata.c b/giraffez/src/teradata.c
index 49a6215..87f47dc 100644
--- a/giraffez/src/teradata.c
+++ b/giraffez/src/teradata.c
@@ -329,10 +329,10 @@ PyObject* teradata_handle_parcel_status(TeradataCursor *cursor, const uint32_t p
success = (struct CliSuccessType*)*data;
*data += sizeof(success);
uint32_t count = 0;
- count = success->ActivityCount[0];
- count |= success->ActivityCount[1] << 8;
- count |= success->ActivityCount[2] << 16;
- count |= success->ActivityCount[3] << 24;
+ count = (unsigned char)success->ActivityCount[0];
+ count |= (unsigned char)success->ActivityCount[1] << 8;
+ count |= (unsigned char)success->ActivityCount[2] << 16;
+ count |= (unsigned char)success->ActivityCount[3] << 24;
cursor->rowcount = (int64_t)count;
}
break;
Other than that, I think the branch looks solid. I hope that the addition of the cursor concept doesn't regress any other features of the code but, then again, it isn't that big of a change. This branch also supplants my original PR so merging it seems unnecessary.
Let me know what you think!
Very good catch, I had thought I casted those to unsigned char but clearly did not (also the few tests I had run were with low result numbers that were unaffected by the incorrect shift math). I'll try to merge this soon after some more tests, but if any regressions are introduced that you find just let me know in the issues and we can get them addressed quickly. Through a comedy of errors I lost access to a Teradata server to test with and my access to this repo, but will hopefully be getting access to another Teradata server to continue support of giraffez (and I finally got my repo permissions fixed).
This is my first proof of concept to resolve issue #54. Pulling the value out of the PclSUCCESS parcel was easy enough, but I saw a few different options for propagating the
rowcount
back up to a Cursor object. I initially looked at using the return value from teradata_execute, but that would have required a few more changes to the codebase (as well as changing the existing usage of that function).The changes in this PR involve adding a
rowcount
field to the TeradataConnection struct. Then, if a PclSUCCESS is found while reading parcels, pull out its ActivityCount and throw it into that field. Lastly, a property was added to the Cursor object for reading this value.My primary concern with this implementation is that
rowcount
doesn't seem to really belong in theTeradataConnection
struct. Had there existed some sort ofTeradataCursor
struct, I would have leaned toward putting the value in there instead. However, from what I was able to discern, the passing of database state between Cmd.execute and the Cursor it returns is contained within the conn parameter. Additionally (and I could be wrong), it looks like thisconn
parameter is actually an instance of theCmd
object defined here. Another downside of having the field located inTeradataConnection
is that it'd get overwritten by every subsequent database action.Do you suppose it'd be more ideal to add the
rowcount
field to something like TeradataEncoder since its "context" is more closely related to each individual SQL command? Do you suppose that some completely different scheme that uses the return value from teradata_execute would better fit this problem?Please let me know what you think!