GoSecure / pyrdp

RDP monster-in-the-middle (mitm) and library for Python with the ability to watch connections live or after the fact
https://www.gosecure.net/blog/2020/10/20/announcing-pyrdp-1/
GNU General Public License v3.0
1.5k stars 242 forks source link

PyRDP Convert to MP4: out of memory issues and corrupted video #352

Closed obilodeau closed 3 years ago

obilodeau commented 3 years ago

I can reliably make the pyrdp-convert.py tool go out of memory:

$ time pyrdp-convert.py rdp_replay_20210826_12-18-46_415_Kimberly835337.pyrdp -f mp4
[*] Converting 'rdp_replay_20210826_12-18-46_415_Kimberly835337.pyrdp' to MP4
 45% (241939 of 536390) |######################                           | Elapsed Time: 8:37:30 ETA:   6:00:53Terminated

real    517m44.110s
user    765m28.181s
sys 10m30.343s

My OOM logs say:

Aug 27 21:01:27 barachois earlyoom[1099]: mem avail:  3180 of 31959 MiB ( 9.95%), swap free: 6125 of 6143 MiB (99.69%)
Aug 27 21:01:27 barachois earlyoom[1099]: low memory! at or below SIGTERM limits: mem 10.00%, swap 100.00%
Aug 27 21:01:27 barachois earlyoom[1099]: sending SIGTERM to process 526823 uid 1000 "pyrdp-convert.p": badness 848, VmRSS 10362 MiB
Aug 27 21:01:27 barachois earlyoom[1099]: process exited after 0.1 seconds

So 10GB of RAM for a 1GB replay file:

$ ls -lh rdp_replay_20210826_12-18-46_415_Kimberly835337.pyrdp 
-rw-r--r-- 1 olivier olivier 973M Aug 26 21:18 rdp_replay_20210826_12-18-46_415_Kimberly835337.pyrdp

@alxbl created patches to mitigate these issues. I'll take a look at them.

Additionally, the converted MP4 is unplayable, its moov atom was not written out to disk so the resulting file is useless. I will try to address that problem separately by using a streaming container format instead of a fixed MP4.

obilodeau commented 3 years ago

https://ffmpeg.org/ffmpeg-formats.html#mov_002c-mp4_002c-ismv:

The mov/mp4/ismv muxer supports fragmentation. Normally, a MOV/MP4 file has all the metadata about all packets stored in one location (written at the end of the file, it can be moved to the start for better playback by adding faststart to the movflags, or using the qt-faststart tool). A fragmented file consists of a number of fragments, where packets and metadata about these packets are stored together. Writing a fragmented file has the advantage that the file is decodable even if the writing is interrupted (while a normal MOV/MP4 is undecodable if it is not properly finished), and it requires less memory when writing very long files (since writing normal MOV/MP4 files stores info about every single packet in memory until the file is closed). The downside is that it is less compatible with other applications.

Specifically:

it requires less memory when writing very long files

:open_mouth:

obilodeau commented 3 years ago

After video encoding changes but before I started backporting Alex's patches.

After a couple of minutes, the process started to consume around 800Mb. After 1 hour or so it was still around this use. After 3 hours it started increasing but sometimes it drops significantly.

Here's a screenshot after 4h30m at around 1.8GB RES:

2021-08-30_21-58

obilodeau commented 3 years ago

Memory pressure is increasing. I still think we have a leak in the python code or we need to call the GC manually.

2021-08-30_23-18

obilodeau commented 3 years ago

Houston, we have a leak

2021-08-31_00-26

obilodeau commented 3 years ago

Killed the process. At least we know we fixed the corrupted MP4 issue:

2021-08-31_00-31

Managed to have converted 5 hour of video.

obilodeau commented 3 years ago

I tried vprof but it doesn't generate a full profile if the application doesn't terminate successfully. I used it on a smaller conversion job but the output isn't useful:

2021-08-31_12-05

Then I tried memory_profiler but all it shows is memory decreasing:

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
   134  569.383 MiB -713.645 MiB        1029       @profile
   135                                             def writeFrame(self):
   136  569.383 MiB -413.707 MiB        1029           w = self.stream.width
   137  569.383 MiB -413.707 MiB        1029           h = self.stream.height
   138  569.383 MiB -387.512 MiB        1029           surface = self.imageHandler.screen.scaled(w, h) if self.scale else self.imageHandler.screen.copy()
   139                                         
   140                                                 # Draw the mouse pointer. Render mouse clicks?
   141  569.383 MiB -401.105 MiB        1029           p = QPainter(surface)
   142  569.383 MiB -401.340 MiB        1029           p.setBrush(QColor.fromRgb(255, 255, 0, 180))
   143  569.383 MiB -401.340 MiB        1029           (x, y) = self.mouse
   144  569.383 MiB -401.340 MiB        1029           p.drawEllipse(x, y, 5, 5)
   145  569.383 MiB -401.340 MiB        1029           p.end()
   146                                         
   147                                                 # Output frame.
   148  569.512 MiB -123.395 MiB        1029           frame = av.VideoFrame.from_ndarray(qimage2ndarray.rgb_view(surface))
   149  569.512 MiB -1327.379 MiB        2017           for packet in self.stream.encode(frame):
   150  569.512 MiB -767.219 MiB         988               if self.progress:
   151                                                         self.progress()
   152  569.512 MiB -766.703 MiB         988               self.mp4.mux(packet)

In mprof mode, it's not more useful: 2021-08-31_12-00 mprof

Going to try guppy next.

obilodeau commented 3 years ago

Gumpy is good but complicated:

2021-08-31_14-05 gumpy

obilodeau commented 3 years ago

Tried Pympler without great results. The TK reference browser didn't work. The summary views were very similar to gumpy's and didn't speak for themselves.

Tried fil-profiler. Segfaulted and trying to get a backtrace from gdb annoyed me greatly (args, python dev symbols, etc.) so I let it go.

Looked at valgrind and it felt like an adventure (compile python, using specific tooling to avoid noise in valgrind, etc.) so I skipped.

Tried tracemalloc. Adding instrumentation around the ReplayConverter tight loop is what started giving tangible results (or so I hope they are reliable):

diff --git a/pyrdp/convert/ReplayConverter.py b/pyrdp/convert/ReplayConverter.py
index 0e66ccd..98f3a2b 100644
--- a/pyrdp/convert/ReplayConverter.py
+++ b/pyrdp/convert/ReplayConverter.py
@@ -26,8 +26,23 @@ class ReplayConverter(Converter):
                 print("The input file is already a replay file. Nothing to do.")
                 sys.exit(1)

-            for event, _ in progressbar(replay):
+            import tracemalloc
+            snapshot1 = tracemalloc.take_snapshot()
+            for i, (event, _) in enumerate(replay):
+            #for event, _ in progressbar(replay):
+
                 handler.onPDUReceived(event)

+                if i % 100 == 0:
+                    snapshot2 = tracemalloc.take_snapshot()
+                    top_stats = snapshot2.compare_to(snapshot1, 'lineno')
+
+                    print("[ 100 events later ... Top 3 memory differences ]")
+                    #for stat in top_stats[:10]:
+                    for stat in top_stats[:3]:
+                        print(stat)
+                    snapshot1 = snapshot2
+
+
             print(f"\n[+] Succesfully wrote '{outputPath}'")
             handler.cleanup()

and then calling the conversion using this shim:

#!/usr/bin/python3

from pathlib import Path
import tracemalloc

from pyrdp.convert.ReplayConverter import ReplayConverter

tracemalloc.start()

converter = ReplayConverter(Path('/home/olivier/Documents/gosecure/src/pyrdp/pyrdp_output/replays/rdp_replay_20210826_12-15-33_512_Stephen215343.pyrdp'),
                            '', 'mp4')
converter.process()

snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')

print("[ Top 10 ]")
for stat in top_stats[:10]:
    print(stat)

With the above we get a top 3 changes to memory consumption in the event processing tight loop every 100 events. This gave the following trace (with many many lines omitted for brievety):

[ 100 events later ... Top 3 memory differences: ]
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/ui/qt.py:62: size=7511 KiB (+1945 KiB), count=1894 (+496), average=4061 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/parser/rdp/orders/secondary.py:213: size=3753 KiB (+1458 KiB), count=947 (+248), average=4058 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/player/BaseEventHandler.py:234: size=47.6 KiB (+47.6 KiB), count=1 (+1), average=47.6 KiB
[ 100 events later ... Top 3 memory differences: ]
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/ui/qt.py:62: size=10.2 MiB (+2896 KiB), count=2628 (+734), average=4055 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/parser/rdp/orders/secondary.py:213: size=4963 KiB (+1210 KiB), count=1314 (+367), average=3868 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/player/BaseEventHandler.py:236: size=56.6 KiB (+56.6 KiB), count=1 (+1), average=56.6 KiB

Line 62 of qt.py is the bytearrray creation for the QImage buffer before it is being processed by the rle C extension:

            buf = bytearray(width * height * 2)    # <-- this is line 62
            rle.bitmap_decompress(buf, width, height, data, 2)
            image = QImage(buf, width, height, QImage.Format_RGB16)

Why would this part of the code keep growing? If buf is out-scoped, it should be garbage collected. Unless there's a reference to it that is held.

This rle C extension is a part of the code that has been there forever, that has bugs that we worked around in Python and that didn't get any amount of scrutiny. It felt like this was the culprit. To make sure I commented it out. I knew this would yield to an empty video but would memory consumption change?

It did:

[ 100 events later ... Top 3 memory differences: ]
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/convert/MP4EventHandler.py:150: size=61.3 KiB (+4256 B), count=998 (+67), average=63 B
/usr/lib/python3.9/tracemalloc.py:132: size=3648 B (+3648 B), count=1 (+1), average=3648 B
/usr/lib/python3.9/tracemalloc.py:136: size=0 B (-3648 B), count=0 (-1)
[ 100 events later ... Top 3 memory differences: ]
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/convert/MP4EventHandler.py:150: size=62.7 KiB (+1344 B), count=1017 (+19), average=63 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/ui/qt.py:64: size=12.2 KiB (+1344 B), count=223 (+24), average=56 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/convert/MP4EventHandler.py:149: size=32.1 KiB (+704 B), count=514 (+10), average=64 B

Qt is no longer a top memory changer and there are no signs of uncontrolled memory growth.

I read the documentation on Python C Extensions and realized that we need to manage the lifetime of the buffer variables passed to the rle.bitmap_decompress() function:

However, when a Py_buffer structure gets filled, the underlying buffer is locked so that the caller can subsequently use the buffer even inside a Py_BEGIN_ALLOW_THREADS block without the risk of mutable data being resized or destroyed. As a result, you have to call PyBuffer_Release() after you have finished processing the data (or in any early abort case).

-- https://docs.python.org/3/c-api/arg.html#c.PyArg_ParseTuple

Adding PyBufferRelease() on the memory addresses of input and output led to the corrupted video issue. The memory increase disappers but visible artifacts appear.

Bad:

2021-09-01_11-25 with rle mem fix - corrupted

Good:

2021-09-01_11-25 without rle mem fix - good

I thought using a passed buffer was not pythonic so I wrote an alternate implementation of the bitmap_decompress function that would return a managed bytearray and that would consider input (data) immutable. Here is the patch:

diff --git a/ext/rle.c b/ext/rle.c
index 17e3226..35dbdb2 100644
--- a/ext/rle.c
+++ b/ext/rle.c
@@ -24,6 +24,7 @@
 /* indent is confused by this file */
 /* *INDENT-OFF* */

+#define PY_SSIZE_T_CLEAN
 #include <Python.h>

 /* Specific rename for RDPY integration */
@@ -918,6 +919,41 @@ bitmap_decompress(uint8 * output, int width, int height, uint8* input, int size,

 /* *INDENT-ON* */

+static PyObject*
+bitmap_decompress_wrapper_ng(PyObject* self, PyObject* args)
+{
+       unsigned char* input;
+       Py_ssize_t input_len;
+       int width = 0, height = 0, bpp = 0;
+       unsigned char* dest;
+       int dest_size;
+       PyObject *result;
+
+       if (!PyArg_ParseTuple(args, "y#iii", &input, &input_len, &width, &height, &bpp))
+               return NULL;
+
+       dest_size = width * height * bpp;
+       dest = PyMem_Malloc (dest_size);
+       if (dest == NULL)
+       {
+               return PyErr_NoMemory();
+       }
+
+       if(bitmap_decompress((uint8*)dest, width, height, (uint8*)input, input_len, bpp) == False) {
+               return NULL;
+       }
+
+       result = PyByteArray_FromStringAndSize(dest, dest_size);
+
+       PyMem_Free (dest);
+       if (result == NULL)
+       {
+               return PyErr_NoMemory ();
+       }
+
+       return result;
+}
+
 static PyObject*
 bitmap_decompress_wrapper(PyObject* self, PyObject* args)
 {
@@ -936,6 +972,7 @@ bitmap_decompress_wrapper(PyObject* self, PyObject* args)
 static PyMethodDef rle_methods[] =
 {
      {"bitmap_decompress", bitmap_decompress_wrapper, METH_VARARGS, "decompress bitmap from microsoft rle algorithm."},
+     {"bitmap_decompress_ng", bitmap_decompress_wrapper_ng, METH_VARARGS, "decompress bitmap from microsoft rle algorithm."},
      {NULL, NULL, 0, NULL}
 };

Remember that everytime you make a change to ext/rle.c you need to recompile it with: pip3 install -U -e '.[full]'. This alternate method doesn't work and leads to the exact same corruption as when I added the PyBuffer_Release().

This is where I go crazy: I added code that ran bitmap_decompress and bitmap_decompress_ng and checked whether the two resulting buffers were equal. They were always equal! Patch:

diff --git a/pyrdp/ui/qt.py b/pyrdp/ui/qt.py
index 05e07c4..64bc367 100644
--- a/pyrdp/ui/qt.py
+++ b/pyrdp/ui/qt.py
@@ -60,8 +60,15 @@ def RDPBitmapToQtImage(width: int, height: int, bitsPerPixel: int, isCompressed:
     elif bitsPerPixel == 16:
         if isCompressed:
             buf = bytearray(width * height * 2)
+            # original
             rle.bitmap_decompress(buf, width, height, data, 2)
-            image = QImage(buf, width, height, QImage.Format_RGB16)
+            #image = QImage(buf, width, height, QImage.Format_RGB16)
+            # new
+            buf2 = rle.bitmap_decompress_ng(data, width, height, 2)
+            if buf != buf2:
+                # never triggered
+                import ipdb; ipdb.set_trace()
+            image = QImage(buf2, width, height, QImage.Format_RGB16)
         else:
             image = QImage(data, width, height, QImage.Format_RGB16).transformed(QMatrix(1.0, 0.0, 0.0, -1.0, 0.0, 0.0))

See the resulting video. I can share the replay file privately if there's interest.

https://user-images.githubusercontent.com/546325/131692190-7f078508-f43e-4868-a5de-879e6ef230d4.mp4

The eventual release of the buffer by python's GC seem to corrupt the GDI drawing cache. But how can this happen given that it's a new object (QImage) that is returned for caching!? I also tried to make an explicit full copy of the bytearray and this didn't work either. I'm out of options.

Maybe the rle code is right and my changes introduced a small mistake. The leak isn't a leak since it's required by the GDI cache and the leak would be in the GDI's cache logic.

obilodeau commented 3 years ago

The buffer must remain valid throughout the life of the QImage and all copies that have not been modified or otherwise detached from the original buffer. The image does not delete the buffer at destruction. You can provide a function pointer cleanupFunction along with an extra pointer cleanupInfo that will be called when the last copy is destroyed.

from: https://doc.qt.io/qtforpython-5/PySide2/QtGui/QImage.html

Tried to get the cleanupFunction working, it never worked. I don't think it's real for python. All I managed to do is segfault the interpreter. The documentation seems auto-generated from the C++ library so that would be an artifact of that.

I added a reference to the buffer to the Bitmap cache. Reverted the code to use the previous working rle function. Video renders are ok but memory usage is still problematic.

After looking for known QImage memory leaks for a while, I decided to do a minimal test:

import rle
import tracemalloc

from PySide2.QtGui import QImage

tracemalloc.start()
snapshot1 = tracemalloc.take_snapshot()

width, height = (800, 600)
for i in range(1000):

    buf = bytearray(width * height * 2)
    #rle.bitmap_decompress(buf, width, height, bytearray(120), 2)
    #image = QImage.fromData(buf, QImage.Format_RGB16)
    image = QImage(buf, width, height, QImage.Format_RGB16)

    if i % 100 == 0:
        snapshot2 = tracemalloc.take_snapshot()
        top_stats = snapshot2.compare_to(snapshot1, 'lineno')

        print("[ 100 events later ... Top 3 memory differences ]")
        for stat in top_stats[:3]:
            print(stat)
        snapshot1 = snapshot2

Without the rle call, the memory doesn't grow between the snapshots. However, with the rle call it does. We are back to a problem in rle.

Without rle:

[ 100 events later ... Top 3 memory differences ]
/usr/lib/python3.9/tracemalloc.py:135: size=0 B (-100 B), count=0 (-2)
/usr/lib/python3.9/tracemalloc.py:193: size=4224 B (+96 B), count=88 (+2), average=48 B
/usr/lib/python3.9/tracemalloc.py:125: size=2088 B (+72 B), count=29 (+1), average=72 B
[ 100 events later ... Top 3 memory differences ]
/usr/lib/python3.9/tracemalloc.py:135: size=200 B (+200 B), count=4 (+4), average=50 B
/usr/lib/python3.9/tracemalloc.py:115: size=2400 B (+80 B), count=30 (+1), average=80 B
/usr/lib/python3.9/tracemalloc.py:125: size=2016 B (-72 B), count=28 (-1), average=72 B
[ 100 events later ... Top 3 memory differences ]
/usr/lib/python3.9/tracemalloc.py:135: size=100 B (-100 B), count=2 (-2), average=50 B
/usr/lib/python3.9/tracemalloc.py:558: size=1416 B (-96 B), count=28 (-2), average=51 B
/usr/lib/python3.9/tracemalloc.py:126: size=56 B (+56 B), count=2 (+2), average=28 B

With rle:

[ 100 events later ... Top 3 memory differences ]
/home/olivier/gosecure/src/pyrdp/bin/qimage.py:18: size=642 MiB (+91.6 MiB), count=1402 (+200), average=469 KiB
/home/olivier/gosecure/src/pyrdp/bin/qimage.py:19: size=121 KiB (+17.3 KiB), count=1404 (+201), average=88 B
/usr/lib/python3.9/tracemalloc.py:558: size=1448 B (-96 B), count=29 (-2), average=50 B
[ 100 events later ... Top 3 memory differences ]
/home/olivier/gosecure/src/pyrdp/bin/qimage.py:18: size=733 MiB (+91.6 MiB), count=1602 (+200), average=469 KiB
/home/olivier/gosecure/src/pyrdp/bin/qimage.py:19: size=139 KiB (+17.2 KiB), count=1603 (+199), average=88 B
/usr/lib/python3.9/tracemalloc.py:126: size=196 B (+112 B), count=7 (+4), average=28 B
[ 100 events later ... Top 3 memory differences ]
/home/olivier/gosecure/src/pyrdp/bin/qimage.py:18: size=825 MiB (+91.6 MiB), count=1802 (+200), average=469 KiB
/home/olivier/gosecure/src/pyrdp/bin/qimage.py:19: size=156 KiB (+17.3 KiB), count=1804 (+201), average=88 B
/usr/lib/python3.9/tracemalloc.py:125: size=2232 B (+144 B), count=31 (+2), average=72 B
obilodeau commented 3 years ago

With the reference to the buffer in the BitmapCache, I tried again the rle.bitmap_decompress_ng hoping that the hold on that reference would prevent the corruption issues. It did! Videos are no longer corrupted!

Now, back to the initial memory issue, in the BitmapCache update I also added an explicit cache eviction on key updates. I'm not sure this is required. Let's benchmark.

Converting Stephen replay with BitmapCache (QImage, buffer), old rle code and explicit eviction on duplication:

[ Top 10 ]
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/ui/qt.py:68: size=46.6 MiB, count=11910, average=4102 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/parser/rdp/orders/secondary.py:213: size=17.5 MiB, count=5955, average=3085 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/convert/MP4EventHandler.py:148: size=90.8 KiB, count=1449, average=64 B
/usr/lib/python3.9/tracemalloc.py:558: size=74.7 KiB, count=1430, average=54 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/player/gdi/cache.py:71: size=59.3 KiB, count=426, average=142 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/convert/MP4EventHandler.py:147: size=45.9 KiB, count=731, average=64 B
/usr/lib/python3.9/tracemalloc.py:193: size=45.6 KiB, count=973, average=48 B
/usr/lib/python3.9/tracemalloc.py:115: size=38.0 KiB, count=487, average=80 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/ui/qt.py:70: size=27.3 KiB, count=499, average=56 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/parser/rdp/fastpath.py:410: size=15.9 KiB, count=1, average=15.9 KiB

Converting Stephen replay with BitmapCache (QImage, buffer), new rle code and explicit eviction on duplication:

[ Top 10 ]
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/ui/qt.py:72: size=4055 KiB, count=1005, average=4131 B
/usr/lib/python3.9/tracemalloc.py:558: size=67.1 KiB, count=1283, average=54 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/player/gdi/cache.py:71: size=59.2 KiB, count=425, average=143 B
/usr/lib/python3.9/tracemalloc.py:193: size=46.0 KiB, count=982, average=48 B
/usr/lib/python3.9/tracemalloc.py:115: size=37.9 KiB, count=485, average=80 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/convert/MP4EventHandler.py:148: size=27.4 KiB, count=436, average=64 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/ui/qt.py:73: size=27.3 KiB, count=499, average=56 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/parser/rdp/fastpath.py:410: size=15.9 KiB, count=1, average=15.9 KiB
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/convert/MP4EventHandler.py:147: size=13.8 KiB, count=219, average=65 B
/usr/lib/python3.9/abc.py:123: size=9766 B, count=97, average=101 B

A drastic reduction on top 2 consumers. A 15x reduction in memory allocations for this conversion job!

Converting Stephen replay with BitmapCache (QImage, buffer), new rle code and without the explicit eviction on duplication:

[ Top 10 ]
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/ui/qt.py:72: size=4055 KiB, count=1005, average=4131 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/convert/MP4EventHandler.py:148: size=90.8 KiB, count=1449, average=64 B
/usr/lib/python3.9/tracemalloc.py:558: size=74.3 KiB, count=1421, average=54 B
/usr/lib/python3.9/tracemalloc.py:193: size=45.9 KiB, count=980, average=48 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/convert/MP4EventHandler.py:147: size=45.9 KiB, count=731, average=64 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/player/gdi/cache.py:71: size=41.2 KiB, count=425, average=99 B
/usr/lib/python3.9/tracemalloc.py:115: size=37.7 KiB, count=483, average=80 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/ui/qt.py:73: size=27.3 KiB, count=499, average=56 B
/home/olivier/Documents/gosecure/src/pyrdp/pyrdp/parser/rdp/fastpath.py:410: size=15.9 KiB, count=1, average=15.9 KiB
/usr/lib/python3.9/abc.py:123: size=9759 B, count=97, average=101 B

Python does its job. Explicit eviction is not required. Dropping that code.

obilodeau commented 3 years ago

The code is in #353. I'm testing with a jumbo replay now.

obilodeau commented 3 years ago

After 3h50m of processing, RES memory is still only at 685M. I think we plugged the leak.

2021-09-01_21-07

However, there seems to be a regression in the CPU consumption...

obilodeau commented 3 years ago

A couple of minutes later, processing picked up and ETA looked better:

2021-09-01_21-23_1

obilodeau commented 3 years ago

After 6 hours, the pace of processing events picked up. Memory usage is still low. Partial video viewable for at least 3 hours.

2021-09-01_23-30

obilodeau commented 3 years ago

The conversion of an 11-hours long replay to MP4 completed succesfully this morning. I can watch the video, seek throught it. The conversion took 13.75 hours (21 CPU hours!) and I never saw RAM usage go beyond 685MB of RAM. The capture is 973MB and the resulting video is 3.4GB.

2021-09-02_10-18

2021-09-02_10-21

The player seem to have benefited from these improvements as well. However, seeking past 6h can get slow until you get to recoverable UI freezes. Once caught up, scaling + 10x speed replaying was smooth. RAM usage below 400MB.

obilodeau commented 3 years ago

Fixed with the merge of #353.