I believe the best way to handle this is to:
There is a similar problem addressed in the CSV write action of the CSV component and I believe that this is a generic problem which should be solved at the platform level as opposed to the component level.
@jhorbulyk good idea with the batching component, however the goal of the Salesforce Batch API is to work with GBs of data, so a batch of such size is highly impractical (think about the pass-through content too) and would blow up all components with the standard memory allocation.
I believe we should use the same tactics as in the CSV component: accumulate the data elsewhere via streaming (e.g. steward) and then, based on time or "number of records", trigger de-streaming of the data from that location into the Salesforce Batch API.
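For illustration only, here is a rough sketch of that tactic; the storage client, its methods and the Bulk API helper below are hypothetical names, not existing APIs:

```js
// Hypothetical sketch: accumulate incoming records into external storage
// (e.g. steward) instead of memory, and flush them into the Salesforce
// Bulk API once a size or time threshold is reached. All helper modules
// below are assumed, not real APIs.
const storage = require('./hypothetical-steward-client');
const bulkApi = require('./hypothetical-salesforce-bulk-client');

const MAX_RECORDS = 10000;          // flush after this many records...
const MAX_AGE_MS = 5 * 60 * 1000;   // ...or after 5 minutes

async function onIncomingMessage(msg) {
  // Stream the record into external storage instead of keeping it in memory
  const batch = await storage.appendToOpenBatch(msg.body);

  if (batch.recordCount >= MAX_RECORDS || Date.now() - batch.openedAt >= MAX_AGE_MS) {
    // De-stream the accumulated data and hand it to the Bulk API as one job
    const recordStream = await storage.readBatch(batch.id);
    await bulkApi.submitJob('Account', 'insert', recordStream);
    await storage.closeBatch(batch.id);
  }
}
```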
I also believe that putting an entire batch in an AMQP message passed through RabbitMQ would be infeasible and that the engineering solution would be different. However, I am suggesting something like the following:
An action can be configured (in `component.json`) to either receive a single message or a batch of messages. If a batch of messages is selected, then the following will happen:
Consider the following flow:

`Component A` -> `Component B`

where `Component B` wants to write a batch. A developer and/or integrator would be able to configure (in either the `component.json` or in the UI) for `Component B` either or both of the max batch size and the max time between batches.
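A hypothetical sketch of what such a configuration could look like in `component.json`; the `batch` block and its field names are assumptions, not an existing platform feature:

```json
{
  "actions": {
    "batchWrite": {
      "title": "Batch Write",
      "main": "./lib/actions/batchWrite.js",
      "batch": {
        "enabled": true,
        "maxBatchSize": 1000,
        "maxTimeBetweenBatchesMs": 60000
      }
    }
  }
}
```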
I suggest that the above flow would implicitly be constructed as:

`Component A` -> `Mapper A->B` -> `Batcher` -> `Component B Batch Action`
A message in the above flow would have the following lifecycle:
Component A
I suppose it would also make sense if there was some background process that monitored open batches and closed them when the time since the last batch published > max time between batches.
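A minimal sketch of such a watchdog, assuming a simple in-memory registry of open batches (all names below are hypothetical; "publishing" stands in for handing the collected messages to the batch action):

```js
// Hypothetical background watchdog that closes batches which have been idle
// longer than the configured max time between batches.
const MAX_TIME_BETWEEN_BATCHES_MS = 60 * 1000;

// batchId -> { lastPublishedAt: number, messages: object[] }
const openBatches = new Map();

function addToBatch(batchId, message) {
  const batch = openBatches.get(batchId) || { lastPublishedAt: Date.now(), messages: [] };
  batch.messages.push(message);
  batch.lastPublishedAt = Date.now();
  openBatches.set(batchId, batch);
}

function closeBatch(batchId, publish) {
  const batch = openBatches.get(batchId);
  openBatches.delete(batchId);
  publish(batch.messages); // hand the whole batch to the batch action
}

// Watchdog: close any batch that has been idle for too long
setInterval(() => {
  const now = Date.now();
  for (const [batchId, batch] of openBatches) {
    if (now - batch.lastPublishedAt > MAX_TIME_BETWEEN_BATCHES_MS) {
      closeBatch(batchId, (messages) => console.log('flushing', batchId, messages.length));
    }
  }
}, 1000);
```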
Interesting idea @jhorbulyk, especially the way you suggest handling batches in sailor; however, there are the following drawbacks:

- As you have noted, in your sample `Batcher` has no metadata of its own but should pass through metadata from the next component to the mapper; we don't have this concept right now and can't easily do it (in user space).
- The idea of creating reusable pieces/components is at the very core of the e.io value; however, when creating a `batcher` component as suggested above, the actual batch semantics will be lost. For example, if the underlying component does not know that it works on a batch, it may not benefit from the 3rd-party batch API (e.g. the Salesforce Batch API); think about retries in case of failures or the ack/nack semantics of consuming incoming messages.
Based on the discussion above, I don't believe we can encapsulate batching functionality as part of a dedicated component (at least at the moment); therefore we should build reusable batching functionality on a different level (e.g. library level) and reuse it in the batch-oriented actions accordingly.
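To illustrate the library-level approach, here is a sketch of how a batch-oriented action could consume a shared accumulator; the `BatchAccumulator` helper and `writeBatchToSalesforce` are assumptions, only the `process(msg, cfg)` entry point is the usual action shape:

```js
// Hypothetical reusable helper (library level, e.g. shipped alongside sailor)
// consumed directly by a batch-oriented action, so the action stays batch-aware
// and keeps control over the 3rd-party batch API, retries and ack/nack.
const { BatchAccumulator } = require('./hypothetical-batching-library');

const accumulator = new BatchAccumulator({
  maxBatchSize: 1000,
  maxTimeBetweenBatchesMs: 60 * 1000,
});

// Standard elastic.io action entry point
exports.process = async function process(msg, cfg) {
  // add() returns the accumulated records once a size or time threshold is hit
  const batch = await accumulator.add(msg.body);
  if (batch) {
    await writeBatchToSalesforce(batch, cfg); // assumed helper using the Bulk API
  }
};
```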
> I don't believe we can encapsulate batching functionality as part of a dedicated component (at least at the moment); therefore we should build reusable batching functionality on a different level (e.g. library level) and reuse it in the batch-oriented actions accordingly.
@zubairov The more I think about it, the more I agree.
Is this blocked by https://github.com/elasticio/projects/issues/140 ?
As the task is highly complex, it should be investigated first.
The feature has been implemented in the PR https://github.com/elasticio/salesforce-component/pull/85
Salesforce has a Bulk API that we could potentially use in the connector. Here is the enhancement request to discuss the advantages, drawbacks and implementation strategies of its support in elastic.io.
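For context, a minimal sketch of what a Bulk API call could look like from Node.js, assuming the jsforce client library (whether the connector would use jsforce or call the REST Bulk API directly is one of the implementation strategies to discuss here):

```js
// Minimal sketch using the jsforce client (an assumption; the connector may
// end up using a different client or calling the Bulk API over REST directly).
const jsforce = require('jsforce');

function bulkInsertAccounts(credentials, accounts, done) {
  const conn = new jsforce.Connection({ loginUrl: credentials.loginUrl });
  conn.login(credentials.username, credentials.password, (loginErr) => {
    if (loginErr) return done(loginErr);
    // bulk.load() creates a Bulk API job, submits the records as one batch
    // and calls back with one { id, success, errors } entry per record
    conn.bulk.load('Account', 'insert', accounts, done);
  });
}
```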