Can we speed up the refresher stage by separating refresh and reload into 2 steps?

IATI / refresher

A Python application which has the responsibility of tracking IATI data from around the Web and refreshing the core IATI software's data stores

GNU Affero General Public License v3.0

Can we speed up the refresher stage by separating refresh and reload into 2 steps? #268

Open odscjames opened 1 year ago

odscjames commented 1 year ago

Currently, service_loop() runs:

refresh() in a single thread
THEN reload, multi-threaded
THEN it sleeps for a minute, I guess to avoid hammering the registry API

The problem is that a long reload process prevents refreshed data from coming in.

Can we separate these into 2 service loops and 2 stages?

I don't think one depends on the other?
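A rough sketch of what the split might look like, assuming each stage can simply be called in its own loop. Here `run_loop`, `start_stage`, `demo` and the stand-in `refresh`/`reload` functions are hypothetical names for illustration, not the refresher's actual code, and threads are used only to keep the example self-contained:

```python
import threading
import time


def run_loop(stage, interval_seconds, stop_event):
    """Call stage() repeatedly, pausing between runs until stop_event is set."""
    while not stop_event.is_set():
        stage()
        # wait() returns early once stop_event is set, so shutdown is prompt.
        stop_event.wait(interval_seconds)


def start_stage(stage, interval_seconds, stop_event):
    """Start one stage in its own daemon thread and return the thread."""
    thread = threading.Thread(
        target=run_loop, args=(stage, interval_seconds, stop_event), daemon=True
    )
    thread.start()
    return thread


def demo():
    """Run a fast stand-in refresh next to a slow stand-in reload."""
    counts = {"refresh": 0, "reload": 0}

    def refresh():
        # Hypothetical: stands in for polling the registry.
        counts["refresh"] += 1

    def reload():
        # Hypothetical: stands in for a long batch of downloads.
        time.sleep(0.2)
        counts["reload"] += 1

    stop_event = threading.Event()
    threads = [
        start_stage(refresh, 0.01, stop_event),
        start_stage(reload, 0.01, stop_event),
    ]
    time.sleep(0.5)
    stop_event.set()
    for thread in threads:
        thread.join()
    return counts
```

In this sketch the slow reload no longer holds up refresh, since each loop sleeps on its own schedule. In a real deployment the two loops would more likely be two separate stages or processes rather than two threads in one process.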

akmiller01 commented 1 year ago

They could be separated, but I think we might also need to make sure there couldn't be a race condition where a document is picked up by refresh while reload is in the middle of downloading it.

odscjames commented 1 year ago

Is that a current concern anyway? That is, a race condition between refresh and any of the later stages (e.g. validate, solrize)?

akmiller01 commented 1 year ago

Not to my knowledge. Most steps have a flag or a timestamp that indicates the end of processing, and the subsequent steps wait for that flag or end timestamp before picking the file up. My worry would be that reload can run for so long that a publisher could update a file in the middle of a reload run. So refresh would update the database with a new modified value, and then reload would overwrite it with downloaded, despite never having picked up the newly modified file.
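One possible guard against that overwrite, sketched under assumptions: reload remembers the modified value it started from, and only sets downloaded if that value is still current. The `document` table, its columns, and the function names below are hypothetical (only `modified` and `downloaded` are taken from the comment above), and SQLite is used just to keep the example runnable:

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical document table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE document (id TEXT PRIMARY KEY, modified TEXT, downloaded TEXT)"
    )
    return conn


def refresh_sets_modified(conn, doc_id, modified):
    """Stand-in for refresh: record a new 'modified' and clear 'downloaded'."""
    conn.execute(
        "INSERT OR REPLACE INTO document (id, modified, downloaded) "
        "VALUES (?, ?, NULL)",
        (doc_id, modified),
    )
    conn.commit()


def reload_marks_downloaded(conn, doc_id, modified_seen, downloaded_at):
    """Set 'downloaded' only if 'modified' still equals what reload started with.

    Returns True if the row was updated, False if refresh changed it meanwhile.
    """
    cursor = conn.execute(
        "UPDATE document SET downloaded = ? WHERE id = ? AND modified = ?",
        (downloaded_at, doc_id, modified_seen),
    )
    conn.commit()
    return cursor.rowcount == 1
```

If the publisher updates the file mid-download, the conditional UPDATE matches no rows, so the document stays marked as not downloaded and a later reload can pick up the new version.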