BuzzCutNorman / tap-mssql

Singer Tap for MS SQL built with Meltano Singer SDK.
MIT License
2 stars 9 forks source link

bug: incremental sync does not work #24

Closed davert0 closed 1 year ago

davert0 commented 1 year ago

Hello, I got an error "SystemError: <class 'pyodbc.Error'> returned a result with a set of errors". After a bit investigation, I found out that the problem is the wrong query. I continued the debugging and found that the cause is in singer sdk

image

When I replaced line 192 with just sqlalchemy.text("ID >= 3540505") , everything worked. There is my config:

image
BuzzCutNorman commented 1 year ago

@davert0 thank you for using this tap and raising this issue. This will definitely be useful once I start working on the Incremental loads. Currently, I am working on getting Full loads and typing where I want them to be before I start working on the incremental loads. I try and give that warning up front.

tap-mssql is a Singer tap for mssql. !!! Warning !!! work in progress. It works ok 😐 for full loads.

Please let me know if I need to make changes to the wording, highlight it differently, or add it to the Meltano Hub documentation.

I will update this case as soon as that works starts. My hopeful estimate is in two weeks but don't hold me to that. Again, thank you. 🙏

BuzzCutNorman commented 1 year ago

I have a workload setup that gives me the exact same error. I also found that I get a similar error when running the workload using the pymssql driver. The error can be reproduced when invoking tap-mssql via the following command meltano invoke tap-mssql.

Off we go on a coding adventure. 😃🎉

BuzzCutNorman commented 1 year ago

First thing I learned in my adventure is that the replication_key_col is type sqlalchemy.Column so if you use it in a text bind it fails. Everything works when repliation_key_col.name which is a string of the column name is used in bindparams.

I have a clearer understanding of how the tap configuration parameter start_date works. If your replication key column is of data type date or datetime and there are no previous bookmarks, then that is the value returned by self.get_starting_replication_key_value(context). If there is no value or default value for start_date then you get None.

Now I guess this leads me the daring adventurer a couple of things to ponder. 🤔

  1. Should I continue to use the sqlalchemy.text method when I have a valid Column and a variable to match it with.
  2. When there is no bookmark or start_date should I continue to with None or Null as a value in my where clause or should I get the minimum value of replication key column and uses that as my starting point.
BuzzCutNorman commented 1 year ago

I think I am going to head in the following direction:

1.Should I continue to use the sqlalchemy.text method when I have a valid Column and a variable to match it with.

I am going to bring complete contents of get_records into the mssqlStream.get_records and remove the use of sqlalchemy.text. Recently I also added the code from RESTStream.get_records() that calls the post_process() function so I could encode binary data types and resolve date data from triggering JSON Schema checker errors on the target side. This code will also need to be incorporated as well. I think both changes could be submitted as SDK additions.

  1. When there is no bookmark or start_date should I continue to with None or Null as a value in my where clause or should I get the minimum value of replication key column and uses that as my starting point.

Looking at get_records more this is moot point. If start_val is None then a where clause is not added.