drlight17 / mta-log-parser

A fast Python log parser for the most popular mail transfer agent's log files, with a simple WebUI for analyzing the logs

Performance issue when querying a large log database #17

Closed Thusithaaw closed 1 week ago

Thusithaaw commented 2 months ago

I installed the docker image on a host with 4 CPUs and 16 GB RAM and imported a Postfix mail log file of around 250 MB. Searching the log database takes around 20 minutes for a single query. When I checked CPU utilization, I found that only 1 of the 4 CPUs was at 100%. How can I optimize CPU usage to distribute the load in a large log data environment?

drlight17 commented 2 months ago

Hello! I have some counter questions:

  1. For what period of time is your 250 MB log file?
  2. What are the time periods for your queries?
  3. Are there any filters applied to such long queries?
  4. What is your mail server's average frequency of messages?
  5. Do you use an SSD or HDD on your host with MLP?

For example, I recently tested db queries on a server with ~120 messages per hour for the last 90 days. It took about 30 seconds to query all messages for this period without any additional filters (about 250,000 rows in output). With full-text search (log lines) it took from 5 to 30 seconds depending on the search pattern (about 500 to 15,000 rows in output).
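For a rough sense of that test's scale, the quoted row count follows from simple arithmetic (a quick sketch; the message rate is the approximate average given above):

```python
# Back-of-envelope volume of the test data set described above.
msgs_per_hour = 120            # approximate average load of the test server
hours = 24 * 90                # 90-day query window
total_rows = msgs_per_hour * hours
print(total_rows)              # 259200, i.e. the "about 250,000 rows in output"
```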

Thusithaaw commented 2 months ago

Hi

Please find the answers to your questions below.

  1. For what period of time is your 250 MB log file? The log covers 2 days of mail.
  2. What are the time periods for your queries? Each search query covers 1 day.
  3. Are there any filters applied to such long queries? No.
  4. What is your mail server's average frequency of messages? About 4,000 messages per hour on average.
  5. Do you use an SSD or HDD on your host with MLP? vSAN with 10k RPM HDDs.

Thanks
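Taking the figures from the answers above together with the maintainer's test numbers, the same kind of back-of-envelope sketch puts the two workloads side by side (message rates and file size are the values quoted in this thread; the rest is rough arithmetic):

```python
# Compare the reporter's workload with the maintainer's test setup.
op_msgs_per_hour = 4000                    # ~4,000 messages/hour average
op_rows_per_day = op_msgs_per_hour * 24    # ~96,000 rows behind a 1-day query
op_bytes_per_day = 250 * 1024 * 1024 // 2  # ~250 MB of log covers 2 days

test_msgs_per_hour = 120                   # maintainer's test server
load_ratio = op_msgs_per_hour / test_msgs_per_hour
print(op_rows_per_day, op_bytes_per_day, load_ratio)  # 96000 131072000 ~33x
```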

drlight17 commented 2 months ago

Thank you for your answers. As I can see, your mail server is quite heavily loaded; I've never had a chance to test MLP in such conditions. I think the only thing you can do for now is to try an SSD (something like an Intel Optane or another enterprise-grade drive) instead of an HDD for the rethinkdb container. Maybe this will reduce query time.

The amount of RAM only affects the query output array size (in my case, 6 GB of RAM is enough for an output of about 300,000 rows). CPU cores are fully loaded only by custom datetime processing during parsing and importing into the db (I've mentioned this in the example.env file for the MAIL_LOG_TIMESTAMP_CONVERT variable), but that has no effect on how the GUI behaves while a query is running.
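As an aside on that multi-core point, here is a minimal hypothetical sketch (not MLP's actual code) of why datetime conversion can spread across cores while a single database query cannot: each log line is converted independently, so the work parallelizes. The format string and sample log line are assumptions for illustration only:

```python
# Hypothetical illustration (not MLP's actual code): converting syslog
# timestamps is a per-line job, so it parallelizes across CPU cores,
# unlike a single database query that runs on one core.
from datetime import datetime
from multiprocessing import Pool

def convert_line(line: str) -> str:
    # Postfix/syslog timestamps look like "Jan  5 12:34:56" and carry no
    # year, so assume the current one; this is roughly the kind of work a
    # MAIL_LOG_TIMESTAMP_CONVERT-style option has to do per line.
    stamp = datetime.strptime(line[:15], "%b %d %H:%M:%S")
    stamp = stamp.replace(year=datetime.now().year)
    return stamp.isoformat() + line[15:]

if __name__ == "__main__":
    lines = ["Jan  5 12:34:56 mail postfix/smtpd[123]: connect from unknown"] * 8
    with Pool() as pool:               # defaults to one worker per CPU core
        converted = pool.map(convert_line, lines)
    print(converted[0])
```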

drlight17 commented 1 week ago

No activity here. Closed.