BookingSync / synced

Keep your BookingSync Application synced with BookingSync
MIT License
3 stars 2 forks source link

Fetch and save pages separately to avoid keeping all data in memory #52

Closed Mandaryn closed 7 years ago

Mandaryn commented 8 years ago

I reach a point when I fetch collections of thousands entries from one endpoint for account. With current code https://github.com/BookingSync/synced/blob/master/lib/synced/strategies/full.rb#L172, this is all now firstly fetched to memory and then saved to database, which is bound to cause troubles.

We need to be able to save each page separately to database, to avoid bloating memory usage sky high (and running out of memory for larger endpoints)

@Azdaroth @wijet @Fredar @p55 @ZenCocoon wdyt?

ZenCocoon commented 8 years ago

I think that's a concern of bookingsync-api auto pagination then https://github.com/BookingSync/bookingsync-api/blob/e82d15815871f6c6017700ad9d5322796c1db61b/lib/bookingsync/api/client.rb#L186

The question is then: shall we run this in a transaction or not. Having a transaction would ensure that we save all or nothing, which helps to keep a sane state, however that would still have the same memory issue.

@Mandaryn What would be your approach to fetch and save in chunk while making sure we don't stay in a middle state which could cause issues with updated_since for example?

Mandaryn commented 8 years ago

i think scoping everything in transaction while dealing with each page separately would allow for better memory usage, each page would be handle separately in ruby and get GC if necessary. Only the transaction would be large. The only possible drawback i see it that the sync transaction would be longer now, (spanning multiple requests to core instead of just single big write), and could cause some deadlock issues more often (but i think transactions in rails don't lock tables for inserts)

ZenCocoon commented 8 years ago

spanning multiple requests to core instead of just single big write

It's already doing multiple requests so that won't be longer or anything, just preparing the transaction in small blocks.

About the deadlocks, I don't really know the impact from one solution to the other here. @Azdaroth do you know more about this?

Azdaroth commented 8 years ago

I wouldn't really go with making transactions longer, there are already problems with that and deadlocks happening quite often. Generally speaking, long transactions are not a good thing. Maybe we could run one transaction for each page separately? And the synced_all_at would be assigned at the end after fetching data from all pages and persisting them.

Azdaroth commented 8 years ago

@ZenCocoon Basically, the longer transaction is, the higher possibility of the deadlock. On the other hand, we could prevent syncing same models at the same time, which would make deadlock not really possible to happen, at least the ones caused by syncing which are 100% issues with synced in bsa-apps I think.

ZenCocoon commented 8 years ago

ok, then 1 transaction per page seems wiser. Little harder to handle errors in between pages but seems most optimized to lower deadlocks and memory usage.

Mandaryn commented 8 years ago

wrapping all in transaction should not be a problem, rails transactions don't lock tables as far as i know. would probably only fail if 2 syncs for the same are done in parallel, but that would fail anyway now

Fredar commented 8 years ago

The 2 syncs for the same scope are exactly the problem now. Sounds like we will need this one https://github.com/BookingSync/synced/issues/29

thilonel commented 7 years ago

76 closed this