Closed: avdv closed this issue 5 years ago.
Thank you for trying it out; that is definitely not how it's meant to behave.
Hi,
I tried to pinpoint the problem with LocalStack, and the throttling seems to be the culprit...
I created a queue with 500 messages.
import java.net.URI

import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, Materializer}
import akka.stream.alpakka.sqs.{MessageAction, SqsSourceSettings}
import akka.stream.alpakka.sqs.scaladsl.{SqsAckSink, SqsSource}
import software.amazon.awssdk.services.sqs.SqsAsyncClient

import scala.concurrent.duration._ // needed if the throttle below is uncommented

implicit val system: ActorSystem = ActorSystem()
implicit val mat: Materializer = ActorMaterializer()
implicit val awsSqsClient = SqsAsyncClient
  .builder()
  .endpointOverride(URI.create("http://localhost:4576"))
  .region(REGION) // REGION (a software.amazon.awssdk.regions.Region) defined elsewhere
  .build()
system.registerOnTermination(awsSqsClient.close())

val localstackUrl = "http://localhost:4576/queue/foobar"

SqsSource(
  localstackUrl,
  SqsSourceSettings()
    .withMaxBufferSize(10)
    .withCloseOnEmptyReceive(true)
)
  // !!!
  // .throttle(10, 1.minute)
  .map { msg ⇒
    println(s"received message: ${msg.messageId()}")
    Thread.sleep(1000)
    MessageAction.Ignore(msg)
  }
  .runWith(SqsAckSink(localstackUrl))
  .onComplete {
    case _ ⇒
      println("terminating...")
      system.terminate()
  }(scala.concurrent.ExecutionContext.Implicits.global)
This may be a contrived example, of course, but the stream runs without problems, with exactly 30 messages in flight (30 seconds visibility timeout).
As soon as I uncomment the throttle operation, it seems to immediately demand most/all input from the source:
$ awslocal sqs get-queue-attributes --queue-url "http://localhost:4576/queue/foobar" --attribute-names 'All'
{
    "Attributes": {
        "VisibilityTimeout": "30",
        "DelaySeconds": "0",
        "ReceiveMessageWaitTimeSeconds": "0",
        "ApproximateNumberOfMessages": "0",
        "ApproximateNumberOfMessagesNotVisible": "501",
        "ApproximateNumberOfMessagesDelayed": "0",
        "CreatedTimestamp": "1552998772",
        "LastModifiedTimestamp": "1552998772",
        "QueueArn": "arn:aws:sqs:elasticmq:000000000000:foobar"
    }
}
I'm having a similar issue with the number of in-flight messages and with messages not being deleted. I tried removing the throttling, but it didn't make a difference.
This consumer works fine on M1 but has issues with M3 and RC1.
Here is a gist of the consumer code: https://gist.github.com/martyphee/df632696796e3484205e486422c08450
@ennru Having a similar problem as @martyphee after updating to RC1; everything was OK with M1.
The consumer gets stuck after some time (after ~10-15 hours) and doesn't consume available messages. No errors in the log.
Edit: I can see from the metrics that the consumer stops receiving Empty Receives from SQS when this happens.
Getting the same error, and it seems the Message class is being leaked. Top 5 entries of the object creation histogram from jhat:
| Class | Instance Count | Total Size (bytes) |
|---|---|---|
| class [C | 1376284 | 2068640582 |
| class software.amazon.awssdk.services.sqs.model.Message | 332718 | 18632208 |
| class java.lang.String | 1375675 | 16508100 |
| class [Lakka.dispatch.forkjoin.ForkJoinTask; | 227 | 14880304 |
| class scala.collection.immutable.$colon$colon | 334396 | 5350336 |
In my case, I am doing something similar to what @avdv is doing, with both MessageAction.ignore and MessageAction.delete. I also tried MessageAction.changeVisibility along with MessageAction.delete.
It seems the Message objects are never released (they stay referenced). My stream looks like this:
SqsSource(
  queueUrl,
  // parallelism = maxBufferSize / maxBatchSize = 20 / 10
  SqsSourceSettings()
    .withWaitTime(10.seconds)
    .withMaxBatchSize(10)
    .withMaxBufferSize(20)
)
  .map { msg =>
    val out = Source
      .single(msg)
      .via(messageToLambdaRequest)
      .via(lambdaRequestToLambdaResp)
      .via(lambdaRespToAggregationKeySet)
      .via(workFlow)
      .log("error while consuming events internally.")
      .withAttributes(ActorAttributes.supervisionStrategy(decider))
      .runWith(Sink.seq)

    val reducedResponse = out.map { response =>
      response.foldLeft[Response](OK)((a, b) => if (a == OK && b == OK) OK else NotOK)
    }

    reducedResponse.map { res =>
      if (res == OK) {
        // log.info("finally response is OK hence message action delete is prepared. {}.", msg.messageId())
        delete(msg)
      } else
        ignore(msg)
    }
  }
  .mapAsync(1)(identity)
  .withAttributes(ActorAttributes.supervisionStrategy(decider))
  // For the SQS Sinks and Sources, the sum of all parallelism (Source) and maxInFlight (Sink)
  // must be less than or equal to the thread pool size.
  .log("error log")
  .runWith(SqsAckSink(queueUrl, SqsAckSettings(1)))
I tried this with both 1.0-M3 and 1.0-RC1. Is there a workaround for this?
Since the migration to the AWS SDK in #1450, the SqsSource is eagerly polling and filling its internal buffer.
def inHandler: InHandler = new InHandler {
  override def onPush(): Unit = {
    val messages = grab(in)
    if (messages.nonEmpty) buffer.offer(messages)
    if (settings.closeOnEmptyReceive && buffer.isEmpty) completeStage()
    if (!hasBeenPulled(in)) pull(in)
    if (!buffer.isEmpty && isAvailable(out)) push(out, buffer.poll())
  }
}
No maximum size is set on the buffer, so the stage always pulls again after buffering, regardless of downstream demand. The rate-decoupled graph stages guideline from the Akka Streams documentation should be followed here.
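For illustration, here is a minimal sketch of what a rate-decoupled handler could look like (my own sketch, not the actual fix; it assumes the stage keeps its received batches in a bounded queue, and that buffer, in, out and settings are fields of the surrounding GraphStageLogic):

// Sketch only: stop pulling upstream once the buffered batches already cover
// maxBufferSize messages, so upstream demand is bounded by the buffer space.
def inHandler: InHandler = new InHandler {
  override def onPush(): Unit = {
    val messages = grab(in)
    if (messages.nonEmpty) buffer.offer(messages)
    if (settings.closeOnEmptyReceive && buffer.isEmpty) completeStage()
    // only request the next batch while there is still room for it
    val bufferedMessages = buffer.size * settings.maxBatchSize
    if (bufferedMessages < settings.maxBufferSize && !hasBeenPulled(in)) pull(in)
    if (!buffer.isEmpty && isAvailable(out)) push(out, buffer.poll())
  }
}

The matching outHandler would then have to pull(in) again once elements are emitted and space frees up, so the stage keeps making progress under back-pressure.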
Moreover, I don't get why this custom flow is required, since it is directly followed by:
.mapConcat(identity)
.buffer(settings.maxBufferSize, OverflowStrategy.backpressure)
To be honest, I don't know why I added that stage to the Source in the first place. I remember trying to use takeWhile, but it didn't work out, so I dropped it. I thought that by adding this buffer stage:
.buffer(settings.maxBufferSize, OverflowStrategy.backpressure)
it would not try to pull again and the back-pressure would work as expected, but the pull(in) not being limited by the max buffer size messed it all up. Sorry. =P
And now I see that there's a problem with the approach I took of using pre-defined stages: mapAsync is executed eagerly, so more messages get fetched than the configured buffer size (even with your fix, @RustedBones, though it does stop the non-stop polling).
Also, since mapAsync is stateless, it cannot know how many messages it may poll from SQS based on the current buffer fill; it always fetches according to the maxBatchSize setting. So even if the buffer size is 20 and the stream already holds 19 elements, it will try to fetch 10 more messages, which is not what should happen.
I think the problem comes from the combination of mapAsync + mapConcat(identity) + buffer(settings.maxBufferSize, OverflowStrategy.backpressure). Even with mapAsync(1) it sends more requests than expected. My understanding is: while the buffer is not full, mapAsync keeps being invoked, so more messages are fetched than the maxBufferSize setting allows.
Should I create an issue to discuss this specifically?
I am not sure mapAsync is executed eagerly.
mapAsync will only reach parallelism in-flight calls if downstream asks for parallelism elements; it back-pressures after that point.
For instance, if the parallelism of mapAsync is set to 1000 and downstream asks for 1 element, only 1 async call will be executed. Otherwise the mapAsync stage would not respect the reactive principles.
Edit: I've just checked the MapAsync implementation. There is no eager pull from upstream in the stage's preStart.
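As a side note, one way to check this empirically is to count how many futures mapAsync actually starts while downstream demand is throttled. This is a standalone sketch, not code from this thread; the object name, the numbers and the throttle/take combination are made up for the experiment:

import java.util.concurrent.atomic.AtomicInteger

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

import scala.concurrent.Future
import scala.concurrent.duration._

object MapAsyncDemandCheck extends App {
  implicit val system: ActorSystem = ActorSystem()
  implicit val mat: ActorMaterializer = ActorMaterializer()

  val started = new AtomicInteger(0)

  Source(1 to 10000)
    .mapAsync(1000) { i =>
      started.incrementAndGet() // count how many async computations were started
      Future.successful(i)
    }
    .throttle(1, 1.second) // downstream demands roughly one element per second
    .take(5)
    .runWith(Sink.ignore)
    .onComplete { _ =>
      println(s"async calls started: ${started.get()}")
      system.terminate()
    }(system.dispatcher)
}

The printed count also includes whatever the fused stages buffer internally, so it shows how much demand mapAsync really propagates upstream rather than a theoretical value.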
In our case, we decoupled the back-pressure using a buffer. It means that at initialization, the buffer stage will ask upstream for maxBufferSize elements. mapAsync will limit the number of parallel calls to settings.maxBufferSize / settings.maxBatchSize, which makes sense: there is no need to fetch more elements than the buffer can store.
In some cases though, we may execute a ReceiveMessageRequest for 10 elements even if we only have 1 slot left in the buffer. This is not really a problem: the 9 remaining elements will be back-pressured in the mapConcat stage. So yes, the maxBufferSize is not super accurate, because some extra elements can be buffered internally in mapConcat, but this does not seem to be an issue to me.
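To make that composition concrete, here is my own rough approximation of the chain described above (not the actual Alpakka source; sqsClient is an SqsAsyncClient, and queueUrl plus a settings object exposing maxBatchSize, maxBufferSize and waitTime are assumed to be in scope):

import akka.stream.OverflowStrategy
import akka.stream.scaladsl.Source
import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest

import scala.collection.JavaConverters._
import scala.compat.java8.FutureConverters._

// Repeat the receive request, keep at most maxBufferSize / maxBatchSize calls
// in flight, flatten the batches, and decouple rates with a back-pressuring buffer.
Source
  .repeat(
    ReceiveMessageRequest
      .builder()
      .queueUrl(queueUrl)
      .maxNumberOfMessages(settings.maxBatchSize)
      .waitTimeSeconds(settings.waitTime.toSeconds.toInt)
      .build()
  )
  .mapAsync(settings.maxBufferSize / settings.maxBatchSize) { request =>
    sqsClient.receiveMessage(request).toScala
  }
  .mapConcat(_.messages().asScala.toList)
  .buffer(settings.maxBufferSize, OverflowStrategy.backpressure)

Once the buffer is full it back-pressures, which eventually stops new ReceiveMessageRequests; that is the decoupling described above.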
Users must pay attention to the maxBufferSize, though. If their SQS stream is slow and the queue is configured with a short VisibilityTimeout, they will process messages whose receipt handles are already invalidated because of the time the messages spent in the buffer. For example, with maxBufferSize = 100, a consumer that handles one message per second, and a 30-second VisibilityTimeout, the last messages in the buffer wait far longer than 30 seconds and can no longer be acknowledged with their original receipt handles.
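A minimal configuration sketch illustrating that caveat (the numbers are illustrative assumptions, not recommendations from this thread):

import akka.stream.alpakka.sqs.SqsSourceSettings

import scala.concurrent.duration._

// Keep the buffer small enough that a consumer processing roughly one message
// per second drains it well within a 30-second VisibilityTimeout.
val settings = SqsSourceSettings()
  .withWaitTime(10.seconds)
  .withMaxBatchSize(10)
  .withMaxBufferSize(20) // ~20 s of work at 1 msg/s, below the 30 s timeout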
Versions used
Alpakka version: 1.0-M3
Akka version: 2.5.20
Amazon SQS library: 2.5.8
Expected Behavior
When starting an SqsSource I would expect the flow to handle a slow consumer, i.e. to only pull new messages when needed.
Actual Behavior
When starting up the flow, it seems to eagerly pull down all messages in the queue (probably multiple times, when the visibility timeout for messages in flight is reached).
Note, we were using https://github.com/s12v/akka-stream-sqs before, but tried to switch to alpakka-sqs.
Relevant logs
Reproducible Test Case
n/a