Open · michelgalle opened this issue 1 year ago
I confirm that by using row_number(), as @michelgalle stated, the problem goes away, at least from a UI standpoint.
Thanks for the report, and the investigation!
It seems like the 2 rows with the same timestamp are happening because the frontend is actually requesting schema discovery twice. This isn't intended, but it shouldn't cause the backend to fail like this. Indeed, as you note, the combination of actor_catalog_id, actor_id, and created_at isn't guaranteed to be unique.
Using row_number() seems like a good option. We can also include the id column, which is intended to be unique. I'll send a PR with those changes.
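For illustration, here is a plain-SQL sketch of that approach; it is not Airbyte's actual jOOQ query, and it uses only the table and column names mentioned in this thread. row_number() with id as a tie-breaker keeps exactly one row per source even when two fetch events share a created_at, whereas rank() would return both:

```sql
-- Sketch only: most recent fetch event per source, tolerant of duplicate timestamps.
-- row_number() assigns a unique position within each actor_id partition,
-- and id breaks ties between events with identical created_at values.
SELECT actor_catalog_id, actor_id, created_at
FROM (
    SELECT actor_catalog_id,
           actor_id,
           created_at,
           row_number() OVER (
               PARTITION BY actor_id
               ORDER BY created_at DESC, id DESC
           ) AS rn
    FROM actor_catalog_fetch_event
) AS latest
WHERE rn = 1;
```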
@mfsiega-airbyte I found that both getMostRecentActorCatalogFetchEventForSource and getMostRecentActorCatalogFetchEventForSources need to be updated, as they both fail when there are duplicate records.
For getMostRecentActorCatalogFetchEventForSource I had to include the ID as part of the groupBy (as stated in the error I got). After both methods were patched, I could set up a connection successfully.
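If the error in question was Postgres's rule that every selected, non-aggregated column must appear in the GROUP BY clause, a generic illustration of it would look like the sketch below (this is not Airbyte's actual query; later comments in this thread suggest the eventual fix was different):

```sql
-- Selecting id without grouping by it fails with
-- "column actor_catalog_fetch_event.id must appear in the GROUP BY clause
--  or be used in an aggregate function":
-- SELECT actor_id, id, max(created_at) FROM actor_catalog_fetch_event GROUP BY actor_id;
--
-- Adding id to the GROUP BY list satisfies the rule (shown purely to illustrate
-- the error message, not as the semantics of the real query):
SELECT actor_id, id, max(created_at) AS latest_fetch
FROM actor_catalog_fetch_event
GROUP BY actor_id, id;
```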
@hugozap thanks for the pointer! I put up a PR for getMostRecentActorCatalogFetchEventForSources since that one was a bit clearer.
For getMostRecentActorCatalogFetchEventForSource I'm not totally following - I don't see in the code that it's doing a groupBy. Am I misunderstanding? (If you have your patch in a fork/branch somewhere, feel free to point me there instead?)
@mfsiega-airbyte Sorry, I was not clear. With getMostRecentActorCatalogFetchEventForSource the problem is that it will also fail if duplicate records were written. I know this because I first patched getMostRecentActorCatalogFetchEventForSources locally, and the UI worked, but the moment I tried to set up a new connection I got an error again. The cause of the error is that the limit(1) query fails because it won't know which record to choose.
I fixed it locally by adding ACTOR_CATALOG_FETCH_EVENT.id.desc() to the ORDER BY to break the tie, so limit(1) works again.
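In plain SQL terms, the effect of that jOOQ change is roughly the following (a sketch, not the literal query): the extra sort key makes the LIMIT 1 deterministic when two events share a created_at.

```sql
-- id DESC breaks the tie between events with identical created_at values,
-- so exactly one "most recent" row is returned for the source.
-- :source_id is a placeholder for the source's actor id.
SELECT *
FROM actor_catalog_fetch_event
WHERE actor_id = :source_id
ORDER BY created_at DESC, id DESC
LIMIT 1;
```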
To test that it fails, you would have to simulate the current write bug by inserting a duplicate record (same actor id and created_at).
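One hypothetical way to simulate that write bug directly in Postgres is sketched below. Only the columns discussed in this thread are listed, id is assumed to be a UUID, and the real table may have additional NOT NULL columns that would need to be copied as well.

```sql
-- Duplicate the latest fetch event for a source with a fresh id but the same
-- actor_id and created_at. gen_random_uuid() assumes Postgres 13+ (or pgcrypto).
INSERT INTO actor_catalog_fetch_event (id, actor_catalog_id, actor_id, created_at)
SELECT gen_random_uuid(), actor_catalog_id, actor_id, created_at
FROM actor_catalog_fetch_event
WHERE actor_id = :source_id
ORDER BY created_at DESC
LIMIT 1;
```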
Environment
Current Behavior
After setting up a connector (unfortunately I don't have the logs, nor do I remember which connector it was), I started getting the following error in the UI:
Expected Behavior
The UI behaves as expected.
Logs
The only logs I can get are the ones posted in a Slack thread.
Steps to Reproduce
There are 2 rows in the actor catalog fetch event table with the same timestamp; I'm not sure why that happened. I already did some troubleshooting here and know the reason for the error: since there are 2 rows with the same timestamp, the query should be using row_number() instead of rank(). Since I am new to Airbyte, I am not able to say whether the code that inserts into the table should be fixed so we do not have 2 rows with the same timestamp for the same actor, or whether simply fixing the query I mentioned above is enough.
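To make the difference concrete, here is a small illustrative query (not Airbyte's actual code) showing how the two window functions treat duplicate timestamps:

```sql
-- With two fetch events sharing the same created_at for one actor_id:
--   rank()       -> 1, 1  (a "rnk = 1" filter matches both rows)
--   row_number() -> 1, 2  (an "rn = 1" filter matches exactly one row)
SELECT id,
       actor_id,
       created_at,
       rank()       OVER (PARTITION BY actor_id ORDER BY created_at DESC) AS rnk,
       row_number() OVER (PARTITION BY actor_id ORDER BY created_at DESC) AS rn
FROM actor_catalog_fetch_event;
```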
Are you willing to submit a PR?
No