Changed the '_lines' data structure from Appender to built-in array, as the former leaks memory

atilaneves / dpp

Directly include C headers in D source code

Boost Software License 1.0

230 stars 31 forks source link

Changed the '_lines' data structure from Appender to built-in array, as the former leaks memory #227

Open cbecerescu opened 4 years ago

cbecerescu commented 4 years ago

For example, when translating kernel header files, memory usage for d++ reached 6-7GB on an 8GB machine (and we no longer could spawn a process for CPP to preprocess the .tmp file).

After seeing this issue, I changed from Appender to built-in array type and got a max RAM usage of 2GB (and CPP ran successfully).

Regarding performance, Edi ran some tests and saw no major difference between the Appender and the built-in array (test done on 1 million 20 char-long strings being appended, Appender being faster by 20ms).

edi33416 commented 4 years ago

The pseudo-benchmark code that I wrote

import std.array;                                                                                                                                                                                                                                                                         
import std.random;

void main()
{
    //Appender!(string[]) arr;
    string[] arr;

    int n = 1_000_000;

    for (int i = 0; i < n; ++i)
    {   
        string s;
        for (int j = 0; j < 20; ++j)
        {
            char c = cast(char) rndGen.front;
            rndGen.popFront();
            s ~= c;
        }

        arr ~= s;
    }   
}

I felt that using random and rndGen to generate the random strings is nice as that also requires some processing so we also "simultate" some work.

atilaneves commented 4 years ago

The performance considerations need to be measured with real-world usage in dpp, not in an external micro benchmark. The reason it's an appender in the first place is because of how long it was taking to process Python.h. I suggest measuring how it affects performance by running dpp on a single-line dpp file with #include "Python.h".

Laeeth commented 4 years ago

We perhaps should fix Appender !