Closed alex-gron closed 11 months ago
Still seeing this issue in this job
When I trigger the reset modal by turning off the stream on the Replication tab:
- I still have raw data in salesforce.airbyte_raw_Opportunity
- This is surprising to me. Is this a bug?
I believe this was a choice to not delete data when disabling a stream. It left the user an option to keep the data while no longer syncing.
Part of the issue that I am running into is that I need to actually reset this Stream and get a full resync and when I click the Reset Stream there are 0 records processed
Still seeing this issue in this job
When I ran the above reset job, I got the following outcome:
Sorry for the slow reply - Move is finally back on Platform work in Q4 so we can take a closer look here.
Reading through the thread,
@davinchia There is an internal thread in #dev-platform-compose where @malikdiarra and @nataliekwong are discussing, and have the latest information on when this does/doesn't work. I'll tag you in it.
Update: this turned out to be a destination issue. See https://airbytehq-team.slack.com/archives/C03C4AVJWG4/p1697823971406669
It looks like the destination bug is now resolved. However, I tested this again with this connection and saw that the state was not cleared and the tables persisted after the reset succeeded. Here's the Loom (no audio) https://www.loom.com/share/08e91d5ac64d497e9b0a83130ab5b5db?sid=e388ce59-fd9b-4ce7-a47e-d03d72fdb5bc
Steps to reproduce:
Observed behavior:
fyi: @jdpgrailsdev
@davinchia @nataliekwong Has the customer given us permission to run the reset? I would like to run one to see exactly what happens in the logs, as the most recent reset on that connection shows that a sync was executed. That would explain why the state appears to not have been reset. I do not see any reference in the logs to the empty source, which we should see if it actually ran the reset. Instead, it looks like we are just running a sync.
@jdpgrailsdev I think if you're referring to this one, free to do anything on this workspace! https://cloud.airbyte.com/workspaces/56812376-bba8-4eb6-a6a1-df64ea889a5d/connections (it's an internal one).
There's an external customer (Preply) who has not given us permission yet, but I can ask if you were referencing that one.
https://cloud.airbyte.com/workspaces/56812376-bba8-4eb6-a6a1-df64ea889a5d/connections
@nataliekwong I was specifically talking about the one in your Loom recording, which I now realize is in our internal space. Thanks!
@davinchia @nataliekwong Something appears to be broken with resets. I did a local test using the same source configuration from @nataliekwong 's internal example above (Stripe source) sending to the /dev/null source. I can see the reset API being called and the code identifying the streams to reset. However, the worker never runs the reset. It runs a check, followed by a real replication, which is why the state never appears to be reset. My suspicion is something got removed/changed in the Temporal workflow code. I don't see how this is isolated to just our internal workspace or these sources/destinations, so I suspect this is a much broader issue. I am going to add some more debugging locally to see if I can determine what is actually happening.
Thanks @jdpgrailsdev!! And just as an FYI: we're working on backfilling streams on the Compose side as a new feature this cycle. The backfill is done by doing a per-stream reset. We were doing testing, and found similarly that Postgres succeeded with the per-stream reset, whereas Hubspot kept failing. We were a bit puzzled as to why, and finally connected this morning that it likely is dependent on the same issue we see here. We're going to wait until we resolve this issue before we pick that project back up.
cc @flutra-osmani
@nataliekwong That would confirm my suspicion that it is related to the backfill logic added to the ReplicationActivityImpl
, which is the bit in the sync workflow that actually kicks off the orchestrator.
Update: after testing both in the internal workspace and locally using the same source connector and configuration, I can confirm that the rest is happening. The reset removes the stream from the state table. However, the UI does not show the updated state unless you refresh the connection page. At that point, the UI displays an error message stating that the state JSON is invalid (because it is null/not present). The UI should probably be modified to handle the "no state" case better and/or note that you may need to refresh the page to see the most recent state after a reset or sync.
Another update: I believe that I have tracked this down. There is a feature flag in the ReplicationActivityImpl
chooses between calling the replicate
and replicateV2
method. Only the replicateV2
method updates the configured catalog to have the correct destinationSyncMode
in the configured catalog used by the destination as part of the reset. For whatever reason, a full reset appears to have the correct destinationSyncMode
despite not being modified by the ReplicationActivityImpl
. The "Reset this stream" option does not, which is why the staging table in the BigQuery destination does not get cleared out. I cannot tell if this has always existed or whatever introduced the replicateV2
caused this (or something else). In any event, there appears to be a lot of variance between where and how the configured catalog is modified (or not modified) as part of a reset. The feature flag in question is platform.remove-large-sync-inputs
and checks on the workspace ID for inclusion (it's actual an inverted check, so if off it uses replicate, if on, it uses replicateV2). Git blame seems to indicate a change made by
@Michael Siega
back in September, but again, I am unable to determine if this is when this started or if it pre-existed. The fact that the full reset works even though it also goes through the replicate
method that does not modified the configured catalog seems to indicate a bigger, pre-existing issue.
@jdpgrailsdev I created this original issue in June, so I agree that it doesn't seem like this is new behavior.
@jdpgrailsdev "replicatev2" path is not yet enabled. I just enabled it for 1% now. It was also enabled for 2 workspaces Natalie was testing on the last couple of days. This new path is tied to the "large sync input" flag. What it does is fetch the catalog and state when needed, instead of relying on the sync input from the upstream workflow.
After a lot of additional digging, I have found the bug. The UI sets the namespace
on the individual stream to reset to a blank string when the namespace of the stream is null
:
"resetConfig": {
"streamsToReset": [
{
"name": "checkout_sessions",
"namespace": ""
}
]
}
When the configured catalog is loaded by the backend, missing namespaces
appear as Java null
objects, not empty strings. The code that updates the configured catalog to prepare it for reset uses Java equality on the StreamDescriptor
representation of the stream:
public boolean equals(Object other) {
if (other == this) {
return true;
} else if (!(other instanceof StreamDescriptor)) {
return false;
} else {
StreamDescriptor rhs = (StreamDescriptor)other;
return (this.name == rhs.name || this.name != null && this.name.equals(rhs.name)) && (this.namespace == rhs.namespace || this.namespace != null && this.namespace.equals(rhs.namespace)) && (this.additionalProperties == rhs.additionalProperties || this.additionalProperties != null && this.additionalProperties.equals(rhs.additionalProperties));
}
}
Because the UI passes in an empty string, the equality check fails (null
!= ""), and therefore the catalog is not updated to have the correct destinationSyncMode
. Streams that have a non-null/non-blank namespace would not have this problem. Full resets do not have this problem because all streams are fetched on the backend when the reset request is made, therefore having the correct namespace for the Java equality check.
After discussing this with @chandlerprall, the approach will be to mark the namespace as optional in the ConnectionStream
domain object in the API spec and stop passing the namespace
parameter from the UI if it is blank.
PR resolving the stream namespace issue has been merged. I'm leaving this issue open until it can be confirmed as fixed.
I've tested and verified with two API connections, and this WORKS! 🙌 🙌
What I saw:
@nataliekwong May I close this ticket now that we have confirmed the fix?
Yes! Please go ahead.
Hi @jdpgrailsdev @chandlerprall i am facing this issue in my oss version 0.50.44
. Is this suppose to be fixed ? Can you please let me know if i should be upgrading the version?
I am using a Salesforce connection to BigQuery on Airbyte Cloud
When I manually trigger the reset from the Status page
When I trigger the reset modal by turning off the stream on the Replication tab: