TPC-Council / HammerDB

HammerDB Database Load Testing and Benchmarking Tool
http://www.hammerdb.com
GNU General Public License v3.0

HammerDB 4 possibly leaking memory on Windows #227

Closed remi9898 closed 3 years ago

remi9898 commented 3 years ago

While HammerDB is running a TPROC-H single-user test, the HammerDB client process on Windows continues to grow. If the query runs long enough, Windows memory is exhausted and the query fails. When this occurs, HammerDB writes an "out of memory" error to its log file. HammerDB then releases its memory and the next query runs as expected.

This happened solely on my Windows client. The Linux server never came close to running out of memory or disk space, and no error messages were logged by Linux or the DBMS; neither seemed aware of the issue. I don't think the HammerDB virtual user was disconnected from the database, as the next query and all remaining queries ran as expected.

HammerDB seems to consume memory whenever a query is long-running: the longer the query, the more memory is consumed. Just now, on query 10 (18 of 22), which took 1371 seconds to complete, the HammerDB process grew to 3.9 GB. Once the query completed, it dropped back down to 24 MB. No failures in this case, as my laptop has 8 GB of memory.

HammerDB v4.0 on Microsoft Windows [Version 10.0.18363.1441] running TPROC-H scale 300 against a PostgreSQL 13.2 database on Ubuntu 20.04.2.

[Screenshot attached: HammerDB-4.0_leakonWindows]

sm-shaw commented 3 years ago

Many thanks for the issue. This is interesting, as the HammerDB code is identical on Linux and Windows, as is the PostgreSQL interface (pgtcl 2.1.1 in the lib directory), so the first difference will be in the PostgreSQL client library layer.

However, as a first step I would recommend taking a look at issue #164; the related commit is here: https://github.com/TPC-Council/HammerDB/pull/170/commits/a0ebcbd81a0969255b396e9af1e68d5455434a10. This fixes missing indexes on foreign keys in PostgreSQL and provides a significant speed-up on some queries. The fix will be in v4.1, but you can add the indexes from the commit straight into your schema, for example as sketched below.
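For reference, a minimal sketch of what adding one such index looks like through pgtcl. The index name, column, and connection parameters here are illustrative assumptions only; the authoritative list of indexes is in the commit linked above.

```
# Illustrative only -- the authoritative index definitions are in the
# commit linked above. Connection parameters are placeholders.
package require Pgtcl

set conn [pg_connect -conninfo "host=myserver dbname=tpch user=tpch password=tpch"]

# Hypothetical example of a foreign-key index on the TPROC-H schema;
# substitute the actual CREATE INDEX statements from the commit.
set res [pg_exec $conn "CREATE INDEX IF NOT EXISTS orders_custkey_idx ON orders (o_custkey)"]
puts [pg_result $res -status]
pg_result $res -clear
pg_disconnect $conn
```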

If the issue still exists, the next step is to run a problem query in PostgreSQL on both Linux and Windows at the psql command line, without HammerDB. To do this, select verbose output and logging and run HammerDB; this will log each query exactly as it was run. You can then extract the query from the log and run it at the psql prompt on both Linux and Windows. This test eliminates the HammerDB client: if the query still uses a lot of memory on Windows but not on Linux, then it is a PostgreSQL issue; if it doesn't, the next step is to run the exact query from HammerDB in a bespoke script (see the sketch below) to have a reproducible test case and identify where the memory is going. However, first see if adding the indexes helps; this could be expected behaviour, as the work_mem parameter is set per session and would be expected to grow.
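A bespoke script of that kind might look like the following. This is a minimal sketch, assuming the pgtcl 2.1.1 package from the HammerDB lib directory is loadable; the connection parameters and query text are placeholders to be replaced with your own values and the query extracted from the verbose log.

```
# Minimal standalone reproducer -- run the exact query HammerDB logged,
# outside HammerDB, and watch the process memory while it executes.
# Connection parameters and query text are placeholders.
package require Pgtcl

set conn [pg_connect -conninfo "host=myserver dbname=tpch user=tpch password=tpch"]

# Paste the query extracted from the HammerDB verbose log here.
set query "select ... ;"

set res [pg_exec $conn $query]
puts "status: [pg_result $res -status]"
puts "rows:   [pg_result $res -numTuples]"
pg_result $res -clear
pg_disconnect $conn
```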

remi9898 commented 3 years ago

I'm running the test now in a psql session on the same Windows laptop, and it seems to have the same problem: memory is growing at a rate of 1.5 MB/s and passed 1 GB after a few minutes. Still growing.

I already had the PK, FK, and additional indexes before encountering the problem. I suppose a separate test would be to drop the additional indexes and re-run the query.

sm-shaw commented 3 years ago

If it does it at a psql prompt, then this eliminates the HammerDB part of the stack and puts the issue in the PostgreSQL client. However, this could easily be expected behaviour: each session will allocate work_mem for join and sort operations, and the TPROC-H workload has complex queries that can require a large amount of memory. It could be that this is handled differently on Windows and Linux, although so far it does not look like a memory leak in HammerDB.
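To illustrate the per-session nature of work_mem, it can be inspected and changed for a single session and the query re-run; a sketch via pgtcl, where the connection string and the 64MB value are example assumptions:

```
# work_mem applies per session (and per sort/hash operation, so a
# complex plan can use several multiples of it). Values are examples.
package require Pgtcl

set conn [pg_connect -conninfo "host=myserver dbname=tpch user=tpch password=tpch"]

set res [pg_exec $conn "SHOW work_mem"]
puts "work_mem: [pg_result $res -getTuple 0]"   ;# e.g. 4MB by default
pg_result $res -clear

# Raise (or lower) it for this session only, then re-run the query.
set res [pg_exec $conn "SET work_mem = '64MB'"]
pg_result $res -clear
pg_disconnect $conn
```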

remi9898 commented 3 years ago

The issue seems partly due to the large result set being returned to the client and partly due to Google Chrome using too much memory on my Windows client. As you know, each execution of a TPC-H-like query uses different predicate values, resulting in varied result sets. For Q10 at scale 300 the result set is generally between 2.5 GB and 3.5 GB (as verified by sending the query results to a file; see the sketch below). I had one execution which took considerably longer, and my psql process on Windows grew to 6.5 GB of memory. The other part of the problem turned out to be that Google Chrome consumed so much memory (over 3 GB) on my Windows client that psql didn't have enough free memory to receive the Q10 result set. I think you can close this as not a problem.
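For completeness, one way to do that verification through pgtcl rather than at the psql prompt; a sketch with placeholder connection details, query text, and output filename:

```
# Write the full result set to a file and report its size, to confirm
# how large the returned data actually is. Placeholders throughout.
package require Pgtcl

set conn [pg_connect -conninfo "host=myserver dbname=tpch user=tpch password=tpch"]
set res  [pg_exec $conn "select ... ;"]   ;# the logged Q10 text

set out [open "q10_results.txt" w]
set n [pg_result $res -numTuples]
for {set i 0} {$i < $n} {incr i} {
    puts $out [join [pg_result $res -getTuple $i] "\t"]
}
close $out
pg_result $res -clear
pg_disconnect $conn

puts "result file size: [file size q10_results.txt] bytes"
```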

remi9898 commented 3 years ago

It was a bit confusing when HammerDB reported "out of memory", which I took to mean server memory or filesystem storage (the usual suspects). If it is possible to differentiate between the DBMS throwing an out-of-memory error and the client reporting it, that would be a nice-to-have feature.
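One plausible way to make that distinction at the pgtcl layer: errors raised by the server carry a SQLSTATE (53200 for out_of_memory), while errors generated client-side by libpq do not. A hedged sketch, assuming pgtcl 2.x's error-field subcode support; connection details and query text are placeholders:

```
# Distinguish a server-side OOM (SQLSTATE 53200) from a client-side
# one (libpq-generated errors carry no SQLSTATE). Sketch only;
# assumes pgtcl 2.x supports pg_result ... -error subcodes.
package require Pgtcl

proc classify_error {res} {
    set sqlstate [pg_result $res -error sqlstate]
    if {$sqlstate eq "53200"} {
        return "server out of memory"
    } elseif {$sqlstate eq ""} {
        return "client-side error: [pg_result $res -error]"
    } else {
        return "server error $sqlstate: [pg_result $res -error]"
    }
}

set conn [pg_connect -conninfo "host=myserver dbname=tpch"]
set res  [pg_exec $conn "select ... ;"]   ;# placeholder query
if {[pg_result $res -status] ne "PGRES_TUPLES_OK"} {
    puts [classify_error $res]
}
pg_result $res -clear
pg_disconnect $conn
```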