Closed iiLaurens closed 1 year ago
Hello,
Thanks for reporting this. I agree that the time is ignored. I'm, however, not sure how to fix it in a general way. If we just use a Timestamp instead of a Date in line 1157, I think users would have a similar problem if they use a date type in their database.
This happens when we use string comparisons in the "fallback comparison" starting in l. 1151. So a possible workaround might be that you make your own srcdateparser and make it return timestamps of the same type as your PEP249 module (which one do you use BTW?).
Best regards, Christian Thomsen
You are right this only occurs in the fallback comparison. After further inspection I noticed there is a deeper issue that puts me in this 'fallback comparison' section of the codebase. The deeper issue is that return types from the other
object representing the previous line can return inconsistent types. This happened for me when when using a sqlite3 database with a datetime column for the toatt
and fromatt
columns. Instead or return datetime types it would return string objects (or vice versa, I don't remember exactly anymore). This caused the toatt
values for the row
and other
objects to be of different types, pushing me to the fallback comparison you talked about.
Trying to get the types for toatt
and fromatt
consistent between the source input and the existing database wasn't actually easy for sqlite3. One issue I ran into is that sometimes a datetime typed object is passed to the default srcdateparser, which expects a string. I solved this by creating a custom srcdateparser like this:
def robust_datetime_parser(ymdhmsstr):
if isinstance(ymdhmsstr, datetime.datetime):
return ymdhmsstr # Already a datetime object
return pygrametl.ymdhmsparser(ymdhmsstr)
Ideally you would do some strict typechecking (and perhaps type hinting so that users of this library know what kind of types to provide) to prevent type mismatches going into the library and coming out from the variety of database engines. But including the above little patch to the default srcdateparser as seen above helps solve some of the annoying symptoms when we are already dealing with datetime objects and not with strings.
I submitted a merge request to allow users to provide native date(time) objects as inputs. This helped me get around the issue that I was experiencing.
Handled by PR #54
For type 2 SCD tables, when the toatt and fromatt columns are datetime objects (so YYYY-mm-dd HH:MM:ss), the time ignored when checking for equality of the srcdateatt and fromatt of the previous row. This could lead to different versions being created even if the srcdateatt and fromatt were the same.
Specifically, I am referring to line 1157 to 1160.
Shouldn't the logic include for the possibility of having datetime columns?