Open ShlomoNus opened 2 years ago
Are you using the database-backed query store or are you relying on the in-memory one?
Yes, we are using the PostgreSQL-based Daml ledger implementation, release 1.14.0: daml-on-sql-1.14.0.jar, http-json-1.14.0.jar, trigger-service-1.14.0.jar.
Please note that 1.14 is quite dated and I would recommend upgrading to a later version: there's a chance this is already fixed, and if it's not, it's unlikely we're going to backport the fix to 1.14.
If by any chance the issue persists with 1.18 (the latest stable version as of the time of writing), I have a few extra questions that could help us understand how this happens:
Thanks for the quick response, Stefan, we will upgrade. I found out from our team lead that we are not using any database for caching queries, only the in-memory store. Does that make any difference? Thank you in advance.
Hi @ShlomoNus, yes, there is a big difference between using the in-memory store and the db store. I highly recommend using the db store to avoid running into the OOM error: the in-memory store will inevitably grow over time, and the db most likely has much more memory available :wink:
Thanks for getting back to us. I strongly suspect that using the in-memory implementation is the source of the issue, as mentioned by @realvictorprm in the previous message. If you want to keep using it, one approach is to expect the process to blow up in size and use an orchestration system like Kubernetes that manages live replicas (blunt, but it works). Note that this means you lose the cache across restarts and need to recover state from the ledger; depending on your application, it could also mean the application simply keeps crashing if the state it needs exceeds the limits of your JVM's memory.
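For what it's worth, the "blunt, but works" option above can be sketched as a Kubernetes memory limit plus the default restart policy: when the in-memory store outgrows the limit, the container is OOM-killed and restarted. Everything in this fragment (image name, sizes, labels) is a hypothetical placeholder, not something shipped with Daml:

```yaml
# Hypothetical sketch: cap the JSON API container's memory so Kubernetes
# OOM-kills and restarts it when the in-memory query store outgrows the
# limit. Image name and sizes are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: http-json-api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: http-json-api
  template:
    metadata:
      labels:
        app: http-json-api
    spec:
      containers:
        - name: http-json-api
          image: my-registry/http-json:1.18.0   # placeholder image
          resources:
            limits:
              memory: "1Gi"   # container is killed past this point
          # restartPolicy defaults to Always for Deployment pods,
          # so the killed container comes back with an empty cache
```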
The cleaner solution is to use a database-backed query store, which could also give you better performance thanks to the indexing performed by the database (but this is not necessarily guaranteed and heavily depends on your workload).
The query store setup and configuration is described in our documentation (link).
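As a rough sketch, launching the 1.x HTTP JSON API with a PostgreSQL-backed query store looks something like the following. The connection parameters (host, database, credentials) are placeholders, and the exact flags can differ between SDK versions, so please check the query store documentation for the version you run:

```shell
# Hypothetical launch of the HTTP JSON API with a PostgreSQL query store.
# All connection details below are placeholders.
java -jar http-json-1.18.0.jar \
  --ledger-host localhost \
  --ledger-port 6865 \
  --http-port 7575 \
  --query-store-jdbc-config "driver=org.postgresql.Driver,url=jdbc:postgresql://localhost:5432/jsonapi,user=jsonapi,password=secret,createSchema=true"
```

Note that `createSchema=true` is typically only needed on the first run to initialize the tables.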
I'll close the issue for now, but please do let us know if the issue still occurs using the database-backed query store.
Following your suggestions, we have upgraded to version 1.18.0 and configured a PostgreSQL backend as the query store, but we are still having the same problem. I have attached a few photos showing the error logs, the Gauges metrics, and the memory and CPU utilization of the http-json API after the changes we made. Thank you in advance.
Thanks for the detailed report, we'll see if there's something that clearly stands out and can be fixed quickly or prioritize this after the next release if it requires more work.
I just noticed that it looks like you're only giving ~0.5 GB to the process. Are you sure this is enough? Have you tried giving the HTTP JSON API process more memory?
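Concretely, the heap ceiling can be raised with the standard JVM `-Xmx` flag when launching the process. The 2 GiB value below is a placeholder to size against your workload, not a recommendation from the Daml docs:

```shell
# Hypothetical example: give the JSON API JVM a 2 GiB max heap
# instead of the default (ledger/port values are placeholders).
java -Xmx2g -jar http-json-1.18.0.jar \
  --ledger-host localhost \
  --ledger-port 6865 \
  --http-port 7575
```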
(I said "I noticed" but I should really point out that @da-tanabe did, thanks 🙇🏻)
@umachc @shiranA90
Please see the following response from @stefanobaghino-da:
"In order for memory to be reclaimed a garbage collection event must be triggered. The JVM does not trigger garbage collection events all the time (or possibly ever) if the memory reserved to the process is sufficient. If you monitor the JVM process, the memory usage will simply expand until the maximum amount is reached.
If you measure the usage from within the JVM, you should see a completely normal "sawtooth" shaped memory usage pattern in which the memory fills up until the maximum is reached and then drops after a garbage collection event. If you want to put my claim to the test, just keep running your test for an extended period of time."
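A minimal way to measure "from within the JVM", as the quoted response suggests, is to sample the heap via `Runtime`. This sketch is not part of the JSON API itself; sampled repeatedly over time, it is where the sawtooth pattern becomes visible (usage climbs until a GC event drops it):

```java
// Minimal sketch: sample current heap usage from inside a JVM process.
public class HeapSample {
    // Currently used heap in bytes: total allocated minus free.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long used = usedHeapBytes();
        long max = Runtime.getRuntime().maxMemory();
        System.out.printf("heap used: %d MiB of %d MiB max%n",
                used / (1024 * 1024), max / (1024 * 1024));
    }
}
```

Alternatively, the JDK's `jstat -gcutil <pid> 5000` samples heap occupancy and GC activity of a running process every five seconds without touching the application.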
Support Tip: If you could post your text as text, not as text-in-a-screenshot, that would be helpful. That way we can easily copy and paste it into other information systems as needed, to solve your issue faster.
Affected Daml version
1.14
Bug description
We have been using Daml for about a year, and lately we have been hitting the bug with the HTTP JSON API mentioned in the title. We did not make any changes to the ledger or its traffic, so we find it hard to figure out the reason for the bug. Do you have any idea why this may be happening?
Expected behavior
The HTTP JSON API keeps working as before.
Additional context
We are using AWS cloud platform.