gessnerfl / fake-smtp-server

A simple SMTP Server for Testing purposes. Emails are stored in an in-memory database and rendered in a Web UI
Apache License 2.0

java.lang.OutOfMemoryError: Java heap space #282

Closed matlewan closed 6 months ago

matlewan commented 1 year ago

We have used fake-smtp-server (version 2.1.2) since July 2023. Since then we have hit the Java heap space error twice, so the problem is repeatable. It occurs some time (2-4 weeks) after a restart. I attach the console.log file. A restart makes the problem disappear, but we don't want to crash the microservices that depend on this application every 2-4 weeks. Our configuration:

# https://github.com/gessnerfl/fake-smtp-server
server.port=25002
server.servlet.context-path=/fake-smtp-server
management.server.port=25002
management.endpoint.health.show-details=always 
fakesmtp.bindAddress=127.0.0.1
fakesmtp.port=25001
fakesmtp.persistence.maxNumberEmails=100
# fakesmtp.blockedRecipientAddresses=blocked@example.com, foo@eample.com
fakesmtp.requireTLS=false
fakesmtp.forwardEmails=false

We are not using Docker. It is a plain Java process running on a Linux machine (java -jar ...). Java version: JDK 17.0.6+10

What causes the problem? We ask for a fix, or at least for the process to shut down when this error (Java heap space) occurs. Attachment: console.log

By the way, "Baza danych została zamknięta" is Polish and means "Database has been closed / shut down".

etidahouse commented 1 year ago

Hello @mateuszdev Do you also see an issue with the CPU, similar to this issue? https://github.com/gessnerfl/fake-smtp-server/issues/199

Hello @gessnerfl Is there a potential issue with SMTP connections not being properly closed after sending an email, given the observed increase in CPU usage and memory consumption leading to a Java heap problem during tests?

matlewan commented 1 year ago

I didn't notice it before, but to some extent: yes. Normally our CPU usage is below 5-10%, but at that time it was 20%, so we can observe a significant increase in CPU usage. However, I am not sure whether this is caused by fake-smtp-server because we run a lot of other applications. [image] [image]

gessnerfl commented 1 year ago

@mateuszdev @etidahouse Currently I have very limited time to check this issue in more detail. However, I also assume that there is some resource leak in the application causing these issues.

gessnerfl commented 11 months ago

I created a test setup to reproduce the issue. I run the latest version 2.1.3 on an AWS EC2 instance (t4g.small, ARM64, 2 vCPU, 2GB RAM) with 60% of the memory allocated to the JVM. In addition, I send 10 emails to the server every 10 minutes. This is the server configuration:

server:
  port: 8080
  shutdown: graceful

management:
  server:
    port: 8081
  endpoints:
    web:
      exposure:
        include: '*'

spring:
  profiles:
    active: default

  datasource:
    url: jdbc:h2:mem:mail
    username: admin
    password: Test1234
    driver-class-name: org.h2.Driver

  jpa:
    hibernate:
      ddl-auto: validate

  data:
    web:
      pageable:
        size-parameter: size
        page-parameter: page
        default-page-size: 10
        one-indexed-parameters: false
        max-page-size: 1000

  h2:
    console:
      enabled: true

  mvc:
    hiddenmethod:
      filter:
        enabled: true

  jackson:
    serialization:
      write-dates-as-timestamps: false

springdoc:
  swagger-ui:
    path: /swagger-ui.html

fakesmtp:
  port: 8025
  persistence:
    maxNumberEmails: 100

So far memory stays at a very low level (between 128MB and 220MB) and the CPU is idle all the time. I configured the JVM to create a heap dump on OutOfMemoryErrors. I also installed a monitoring solution to gather near-real-time metrics.

@mateuszdev / @etidahouse Can you please also share more details about your setup (CPU, memory, heap, non-heap memory) and, if possible, provide a heap dump in case of out-of-memory errors?
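
For reference, the heap and non-heap figures requested above can be read from inside a JVM with the standard `java.lang.management` API. The following is a minimal sketch only: the class name `MemoryReport` is hypothetical and not part of fake-smtp-server, and it reports on the JVM it runs in, so it would have to be executed inside the server process (or replaced by an external monitoring tool) to be useful here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of fake-smtp-server.
public final class MemoryReport {

    // Converts bytes to MiB; negative input (meaning "undefined") maps to -1.
    public static long toMiB(long bytes) {
        return bytes < 0 ? -1 : bytes / (1024 * 1024);
    }

    // Builds a one-line summary of heap and non-heap usage of the current JVM.
    public static String describe() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = bean.getHeapMemoryUsage();
        MemoryUsage nonHeap = bean.getNonHeapMemoryUsage();
        return String.format(
                "heap used=%d MiB, heap max=%d MiB, non-heap used=%d MiB",
                toMiB(heap.getUsed()),
                toMiB(heap.getMax()),       // getMax() may be -1 if no limit is defined
                toMiB(nonHeap.getUsed()));
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Logging such a line periodically makes it easier to tell a slow leak (steadily rising heap usage) from a heap that is simply configured too small.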

etidahouse commented 11 months ago

Hello @gessnerfl

I created a project with a test (using the same components we use in our project) https://github.com/etidahouse/fake-smtp-server-cpu-issues

You can launch the container via sh/dev, install the dependencies via yarn and run a simple test that sends an e-mail via yarn test

If I run docker stats in another terminal, I see that the CPU never decreases

[image]

gessnerfl commented 11 months ago

@etidahouse I was able to reproduce the issue based on your provided container. However the root cause is still not identified. The main differences in your example are:

  1. Authentication is enabled
  2. Node is added to the container and the test case is executed from there

I extended my own test setup with authentication but still cannot reproduce the issue there. I will adapt the mail sending to your code to ensure that it is not related to a broken client implementation. I will also try to reproduce it on my local machine without Docker so that I can profile the issue.

gessnerfl commented 11 months ago

@mateuszdev I just released version 2.1.4. It provides a solution to terminate email sessions when the client stops providing data and a QuitCommand is not sent. The fix is more aimed at fixing #199, but could also address this increased memory usage issue. However, since I haven't been able to reproduce your issue so far, I need your feedback and more details in case the issue is not resolved with version 2.1.4.
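
To illustrate the idea of ending a session when the client goes silent without sending QUIT, here is a minimal sketch using a plain socket read timeout. It is not the implementation shipped in version 2.1.4; the class `IdleSessionSketch`, the method `serve`, and the returned reason strings are hypothetical names chosen for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the fake-smtp-server implementation.
public final class IdleSessionSketch {

    // Reads client lines until QUIT, end of stream, or idleTimeoutMillis
    // without data. The socket is closed in every case; the return value
    // names the reason the session ended.
    public static String serve(Socket socket, int idleTimeoutMillis) throws IOException {
        try (socket) {
            // A blocking read now fails with SocketTimeoutException
            // instead of waiting forever for a silent client.
            socket.setSoTimeout(idleTimeoutMillis);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().equalsIgnoreCase("QUIT")) {
                    return "quit";
                }
            }
            return "eof";
        } catch (SocketTimeoutException e) {
            return "idle-timeout";
        }
    }
}
```

Without such a timeout, every client that disconnects uncleanly can leave a session thread and its buffers alive indefinitely, which would match both the CPU and the memory symptoms described in this thread.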

programmer0121 commented 10 months ago

@gessnerfl Hi, sorry for the late response. The dump file is attached. I hope it helps to fix the problem. fake-smtp-server-dump.zip

gessnerfl commented 8 months ago

@programmer0121: I still cannot reproduce the issue, and according to the heap dump the server consumed only about 25MB of heap. Can you please provide more details about how the server is being executed and about the system in general?

gottschd commented 6 months ago

Hello @gessnerfl, hello everyone,

I (we) face similar issues during our small load/performance testing. We send around 15 mails per second to the service and see an OOME after x minutes, depending on the configured heap. The more heap, the longer it takes until the OOME (of course). I have also seen the "Database has been closed / shut down" message.

I tried 512MB, 1024MB and even 3GB as heap config (Xmx).

With the latter, it took around 45 minutes to see the OOME. The EmailRetentionTimer logs how many emails were cleaned up, and then later the OOME happens: [image]

Here are some screenshots of MAT (unfortunately I'm not allowed to share the hprof file): 1) [image] 2) [image] 3) [image]

The behaviour is the same regardless of where the service is running. On my side, it happens in Docker (docker-compose via WSL) as well as when the service is started from an IDE (I'm using IntelliJ). Locally I'm using Windows 11 with OpenJDK Temurin-21.0.1+12.

gessnerfl commented 6 months ago

Hi all,

Thanks @gottschd for sharing the detailed analysis.

It seems the issue is related to the embedded H2 database. I will cross-check the code again, but I guess there is not much I can do to fix the issue.

Best, Florian

gottschd commented 6 months ago

@gessnerfl Is it possible that the foreign key declaration in line 38 of the file db/migration/V1_1_0__initial_table_structure.sql is wrong? I think it must be:

ALTER TABLE email_inline_image ADD FOREIGN KEY (email) REFERENCES email(id) ON DELETE CASCADE;

instead of:

ALTER TABLE email_attachment ADD FOREIGN KEY (email) REFERENCES email(id) ON DELETE CASCADE;

Edit: Yes, much better for my case. With this as a local change, my memory now gets cleaned up. I could have told you much sooner that my emails contain embedded images.

Console log: [image]

VisualVM memory graph: [image]
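
The effect of the missing cascade can be illustrated with a toy in-memory model. This is a sketch with plain Java collections, not H2 or the project's code; the class `CascadeModel` and its methods are hypothetical, and it assumes that child rows without a cascading foreign key are simply left behind when an email is deleted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of parent/child tables, not the real schema handling.
public final class CascadeModel {
    // Child table name -> parent email id of each child row.
    private final Map<String, List<Long>> childRows = new HashMap<>();
    // Child table name -> whether its foreign key has ON DELETE CASCADE.
    private final Map<String, Boolean> cascades = new HashMap<>();

    public void addChildTable(String name, boolean onDeleteCascade) {
        childRows.put(name, new ArrayList<>());
        cascades.put(name, onDeleteCascade);
    }

    public void insertChild(String table, long emailId) {
        childRows.get(table).add(emailId);
    }

    // Deleting an email removes child rows only from cascading tables.
    public void deleteEmail(long id) {
        for (Map.Entry<String, List<Long>> entry : childRows.entrySet()) {
            if (cascades.get(entry.getKey())) {
                entry.getValue().removeIf(parent -> parent == id);
            }
        }
    }

    public int rowCount(String table) {
        return childRows.get(table).size();
    }

    // Stores `emails` emails with one inline image each, deletes them all,
    // and returns how many inline image rows are left behind.
    public static int leakedRows(boolean inlineImageCascade, int emails) {
        CascadeModel db = new CascadeModel();
        db.addChildTable("email_attachment", true);
        db.addChildTable("email_inline_image", inlineImageCascade);
        for (long id = 0; id < emails; id++) {
            db.insertChild("email_inline_image", id);
        }
        for (long id = 0; id < emails; id++) {
            db.deleteEmail(id);
        }
        return db.rowCount("email_inline_image");
    }
}
```

In this model `leakedRows(false, n)` leaves all n image rows in place while `leakedRows(true, n)` leaves none, which mirrors why an in-memory database keeps growing even though the retention job reports deleted emails.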

gessnerfl commented 6 months ago

@gottschd Indeed, this might be the root cause, as inline images might not be deleted. I will provide a fix today.

gessnerfl commented 6 months ago

Version 2.2.1 is now released. It would be great to get feedback on whether this resolves the issue.

gottschd commented 6 months ago

I can confirm that my OOME issue (with embedded images in the emails) is solved with Version 2.2.1 when starting the application locally (via IDE) as well as with the docker image (via docker compose in WSL2).

I only had to configure sufficient -Xmx values appropriate for my load.

Thank you very much.

gessnerfl commented 6 months ago

I'm closing the issue after confirming the successful fix in version 2.2.1