airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

Source Braintree: incremental mode only replicates 50 records per day #32197

Open keurcien opened 10 months ago

keurcien commented 10 months ago

Connector Name

source-braintree

Connector Version

0.2.0

At what step did the error happen?

During the sync

Relevant information

Since updating the Braintree connector to 0.2.0, we have noticed that only 50 records are replicated per day on the Transaction stream (in incremental mode). In our case, the Braintree connection is triggered only once a day. Rolling back to 0.1.5 restored the previous behaviour and replicated all Braintree transactions.

[Screenshot, 2023-11-06 at 11:37:45]

Relevant log output

No response


natikgadzhi commented 2 months ago

@keurcien, welp, sorry about the half a year delay ;-(

Could you try again and tell me if this is still an issue? If so, @ChristoGrab and I would work with @AGPapa and our community devs to see if we can get this fixed soon.

AGPapa commented 1 month ago

I'm experiencing this issue too.

It looks like it was introduced in this PR that re-implemented the Braintree connector using "low-code". https://github.com/airbytehq/airbyte/pull/29200

This is how the old code worked:

# Excerpt: the search goes through the Braintree SDK gateway.
def get_items(self, start_date: datetime):
    return self._gateway.customer.search(braintree.CustomerSearch.created_at >= start_date)

# Excerpt from the calling method: every item in the returned collection is converted.
items = self.get_items(start_date)
result = []
for item in items:
    item = self.get_json_from_resource(item)
    item = self.model(**item)
    result.append(item.dict(exclude_unset=True))
return result

That _gateway.customer.search function returns a ResourceCollection object that can be iterated over to get all results.
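To illustrate why iterating over such a collection yields every result, here is a minimal sketch of a lazily paged collection. It is a toy stand-in and not the braintree SDK's ResourceCollection: the class name, the `fetch_page` callback, and the batch size of 50 are all assumptions made for illustration.

```python
from typing import Callable, Iterator, List, Sequence


class PagedCollection:
    """Toy lazily paged result set (hypothetical, not the braintree SDK class)."""

    def __init__(
        self,
        ids: Sequence[str],
        page_size: int,
        fetch_page: Callable[[Sequence[str]], List[dict]],
    ):
        self._ids = list(ids)          # IDs of every matching record
        self._page_size = page_size    # how many records one request may return
        self._fetch_page = fetch_page  # performs one request for a batch of IDs

    def __iter__(self) -> Iterator[dict]:
        # One request per batch; the caller just sees a flat stream of records.
        for start in range(0, len(self._ids), self._page_size):
            batch = self._ids[start:start + self._page_size]
            yield from self._fetch_page(batch)


calls: List[int] = []


def fake_fetch(batch_ids: Sequence[str]) -> List[dict]:
    # Stand-in for an HTTP call; records the size of each requested batch.
    calls.append(len(batch_ids))
    return [{"id": i} for i in batch_ids]


all_ids = [f"cust_{n}" for n in range(120)]
records = list(PagedCollection(all_ids, page_size=50, fetch_page=fake_fetch))
```

In this sketch, a plain `for` loop over the collection triggers three requests behind the scenes, which is the kind of behaviour the old code relied on.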

However, the new code works differently. It makes one HTTP call and then iterates over the results of that single response:

data = XmlUtil.dict_from_xml(response.text)["customers"]
customers = self._extract_as_array(data, "customer")
return [Customer(**self._get_json_from_resource(BCustomer(None, customer))).dict(exclude_unset=True) for customer in customers]

(It uses the HttpRequester with NoPagination.)

It seems like Braintree limits the results for one HTTP call to 50 records, so each sync only gets 50 rows.
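The symptom can be reproduced with a small simulation. This is only a sketch under stated assumptions: `fake_search` is a hypothetical endpoint with an offset parameter, and the cap of 50 comes from the behaviour observed above, not from Braintree documentation. The real Braintree search API may page differently.

```python
from typing import List

PAGE_CAP = 50  # assumed per-response cap, based on the behaviour observed above


def fake_search(records: List[dict], offset: int = 0) -> List[dict]:
    """Hypothetical endpoint: returns at most PAGE_CAP records from offset."""
    return records[offset:offset + PAGE_CAP]


def read_without_pagination(records: List[dict]) -> List[dict]:
    # One request, no follow-up: anything past the cap is silently dropped.
    return fake_search(records)


def read_with_pagination(records: List[dict]) -> List[dict]:
    # Keep requesting until a page comes back shorter than the cap.
    out: List[dict] = []
    offset = 0
    while True:
        page = fake_search(records, offset)
        out.extend(page)
        if len(page) < PAGE_CAP:
            break
        offset += len(page)
    return out


transactions = [{"id": n} for n in range(180)]
single = read_without_pagination(transactions)
paged = read_with_pagination(transactions)
```

With 180 matching transactions, the single-call reader stops at the cap while the paginated reader collects everything, which matches the "50 records per sync" pattern reported in this issue.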

I'm trying to figure out how to fix this without completely reverting to the old code.

keurcien commented 1 month ago

> @keurcien, welp, sorry about the half a year delay ;-(
>
> Could you try again and tell me if this is still an issue? If so, @ChristoGrab and I would work with @AGPapa and our community devs to see if we can get this fixed soon.

Hi, I haven't reset the connection in a long time; it has been running incrementally since this issue was opened, and we backfilled the older transactions with a separate dump. I would happily test changes made to the connector if needed.