developmentseed / chip-n-scale-queue-arranger

Chip 'n scale: Queue Arranger helps you run machine learning models over satellite imagery at scale
MIT License

Add stream backpressure for asynchronous call resolution. #1

Closed sharkinsspatial closed 5 years ago

sharkinsspatial commented 5 years ago

@wronk If possible, can you run a test on this branch to verify that it handles the memory growth issue you were experiencing? You can use PROMISE_THRESHOLD to specify the maximum number of in-flight SQS messages. You'll have to balance the number of parallel messages against the memory and IO restrictions here to find the optimal number; the default is 500.
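For readers following along, the backpressure idea described above can be sketched roughly as follows. This is not the code on the branch: `createThrottledWriter` and `fakeSend` are hypothetical names, `fakeSend` stands in for the real SQS call, and treating PROMISE_THRESHOLD as a plain constant is an assumption. The writable withholds its write callback once the number of unresolved sends reaches the threshold, which causes the piped readable to pause until a send resolves.

```javascript
const { Readable, Writable } = require('stream');

// Assumed default of 500, taken from the comment above. How the real
// script reads this setting is not shown in the thread.
const PROMISE_THRESHOLD = 500;

// Hypothetical stand-in for an SQS send: resolves on a later tick.
function fakeSend(record) {
  return new Promise((resolve) => setImmediate(() => resolve(record)));
}

// Returns a Writable that keeps at most `threshold` sends in flight.
function createThrottledWriter(send, threshold = PROMISE_THRESHOLD) {
  const stats = { inflight: 0, maxInflight: 0, sent: 0 };
  let heldCallback = null; // write callback withheld while at the threshold
  let onDrained = null; // final callback waiting for the last sends

  const writer = new Writable({
    objectMode: true,
    write(record, _encoding, callback) {
      stats.inflight += 1;
      stats.maxInflight = Math.max(stats.maxInflight, stats.inflight);

      send(record).then(() => {
        stats.inflight -= 1;
        stats.sent += 1;
        // A slot opened up: let the stream hand over the next record.
        if (heldCallback) {
          const release = heldCallback;
          heldCallback = null;
          release();
        }
        // The stream has ended and the last send has resolved.
        if (stats.inflight === 0 && onDrained) {
          const done = onDrained;
          onDrained = null;
          done();
        }
      }, (err) => writer.destroy(err));

      if (stats.inflight < threshold) {
        callback(); // below the threshold: accept more records right away
      } else {
        heldCallback = callback; // at the threshold: exert backpressure
      }
    },
    final(callback) {
      // Do not emit 'finish' until every outstanding send has resolved.
      if (stats.inflight === 0) callback();
      else onDrained = callback;
    },
  });

  return { writer, stats };
}

// Usage: pipe 50 made-up tile records through with at most 5 in flight.
const tiles = Array.from({ length: 50 }, (_, i) => ({ tile: `${i}-0-8` }));
const { writer, stats } = createThrottledWriter(fakeSend, 5);
Readable.from(tiles).pipe(writer).on('finish', () => {
  console.log(`sent ${stats.sent}, peak in flight ${stats.maxInflight}`);
});
```

A lower threshold bounds memory more tightly but leaves less room to overlap network round trips, which is the trade-off mentioned above.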

wronk commented 5 years ago

@sharkinsspatial, thanks for the progress. I'm just getting one odd problem.

I'm testing with 1 million tile indices. The yarn sqs-push upload script runs without issue and says it uploaded all 1 million tiles. However, a small and consistent number of messages are missing when I look at the SQS page in the AWS console. Repeating the upload a couple of times, I only ever see 996,500 available messages.

I didn't deploy the entire CFN pipeline; I just made an SQS queue and tested out this specific functionality. I'm also not experienced with the finer controls on SQS queues, so you might want to check whether the queue config is off and could have led to that error. The queue is on AWS (under the name chip-n-scale-test) if you want to have a look. I can also send you the text file of tile indices if you want to use it.

sharkinsspatial commented 5 years ago

@wronk I added a commit which should fix this situation. This was a bug on my part: the writable stream was correctly exerting backpressure, but I was failing to capture the streamed records the first time the PROMISE_THRESHOLD was exceeded. I also added some better error recording around sendMessageBatch, as it will always resolve with a 200 even if individual messages fail to insert. These individual message failures should now be logged if they ever occur.
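As a rough illustration of that kind of per-message check (not the commit itself): the SQS SendMessageBatch result lists rejected entries under `Failed`, each with `Id`, `Code`, `Message` and `SenderFault`, alongside a `Successful` list. The helper name `collectBatchFailures` and the sample data below are made up for the sketch.

```javascript
// Pair each failed entry in a SendMessageBatch result with the message
// body that was submitted under the same Id, so it can be logged or retried.
function collectBatchFailures(result, entries) {
  const failed = result.Failed || [];
  const byId = new Map(entries.map((entry) => [entry.Id, entry]));
  return failed.map((failure) => ({
    id: failure.Id,
    code: failure.Code,
    message: failure.Message,
    senderFault: failure.SenderFault,
    body: byId.has(failure.Id) ? byId.get(failure.Id).MessageBody : undefined,
  }));
}

// Made-up batch and response for illustration only.
const sampleEntries = [
  { Id: 'tile-1', MessageBody: '1-2-3' },
  { Id: 'tile-2', MessageBody: '4-5-6' },
];
const sampleResult = {
  Successful: [{ Id: 'tile-1', MessageId: 'hypothetical-message-id' }],
  Failed: [
    { Id: 'tile-2', Code: 'InternalError', Message: 'sample failure', SenderFault: false },
  ],
};

const sampleFailures = collectBatchFailures(sampleResult, sampleEntries);
sampleFailures.forEach((failure) => {
  console.error(`message ${failure.id} failed: ${failure.code} (${failure.message})`);
});
```

Comparing the number of `Successful` entries against the batch size is another quick way to notice the kind of shortfall reported earlier in the thread.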

wronk commented 5 years ago

@sharkinsspatial, LGTM. Confirmed that I can upload and see all the messages in the queue.