Senipah / VBA-Better-Array

An array class for VBA providing features found in more modern languages
MIT License
112 stars 18 forks source link

Out of string memory error [170+ MB CSV] #6

Closed ws-garcia closed 3 years ago

ws-garcia commented 3 years ago

Is your feature request related to a problem? YES

Please describe. Dear @Senipah, it’s me, AGAIN! I was testing the CSV Interface class trying to find the memory limits of it and, in the same path of work, check if the change from String array to Variant jagged array has the potential to make the class run out of memory for a CSV file of couple of megabytes of size. I used both your solution and sdkn's as benchmarks. The test’s objective was invoking the import procedure, bypassing the result in the case of @sdkn104, and avoiding copies of arrays in the case of your project and mine, so that the imported data was only present in the class members. As a result of the above, I get a surprising error when invoke yours FromCSVString method. It’s about the error 14: “Out of string space”.

Describe the solution you'd like The question here is only one: it’s possible to deal with the memory consumption, when working with String data type, for your solution?

Additional context Attached to this I let you the file used as “guinea pig” (Demo_1.6MM_records.zip), hoping you can find the error source, and, additional to that, some screen shoots with the memory usage for both, @sdkn104 and mine, solutions.

NOTE: due to file size, the Demo_1.6MM_records.zip contains a .7z file inside.

Out of string space

sdkn Memory usage

CSV Interface Memory usage

Senipah commented 3 years ago

I am able to parse that file using my FromCSVFile method. See the below screenshot: image

If you upload the file you were testing in for me to take a look at I will do so but at the moment its a "works for me" situation, especially if I can't repro and it sounds like you have edited my class in some way(?).

If I cant reproduce I'll have to close the issue but as I say happy to take a look at any file you have.

Senipah commented 3 years ago

I assume your issue is that the peak memory usage exceeds 4GB. I assume you are running a 32Bit application and therefore run out of available RAM?

Untitled

ws-garcia commented 3 years ago

it sounds like you have edited my class in some way(?)

Your class module is untouched.

Here is the code I'm using

Sub SENIPAHtest(ByVal filePath As String)
    Dim Senipah As BetterArray
    Dim csv As CSVinterface
    Dim CSVdata As String
    Dim avg As Double
    Dim e As Double, f As Double

    'Test SENIPAH method
    Set csv = New CSVinterface
    CSVdata = csv.GetDataFromCSV(filePath)
    Set csv = Nothing
    Set Senipah = New BetterArray
    avg = 0
    e = Timer
    Call Senipah.FromCSVString(CSVdata)
    f = Timer
    avg = avg + CDbl(f - e)
    Set Senipah = Nothing
    Debug.Print "[Senipah FromCSVString:"; Round(avg, 4); "seconds]"
End Sub

And here using the FromCSVFile method

Sub SENIPAHtest(ByVal filePath As String)
    Dim Senipah As BetterArray
    Dim avg As Double
    Dim e As Double, f As Double

    'Test SENIPAH method
    Set Senipah = New BetterArray
    avg = 0
    e = Timer
    Call Senipah.FromCSVFile(filePath)
    f = Timer
    avg = avg + CDbl(f - e)
    Set Senipah = Nothing
    Debug.Print "[Senipah FromCSVString:"; Round(avg, 4); "seconds]"
End Sub

Now I'm getting a "Out of memory error" on a machine running Win 10 x64, with 8GB RAM, Excel 2019 32bits. I'm still surprised!

Out of memory

Senipah Memory usage

ws-garcia commented 3 years ago

I assume your issue is that the peak memory usage exceeds 4GB. I assume you are running a 32Bit application and therefore run out of available RAM?

Untitled

This can be the problem here.

ws-garcia commented 3 years ago

The question here is only one: it’s possible to deal with the memory consumption, when working with String data type, for your solution?

I think there is nothing to do with my issue!

Senipah commented 3 years ago

Alright, so it sounds like the issue is that because mine uses more auxiliary memory while parsing that 1.6m row file than is available to 32Bit Excel.

Out of interest I converted mine to return a string array rather than store a variant array in the class and that did bring it's peak usage below 4GB. Downloadable here: StringCSVTest.zip Untitled

While my solution uses more auxiliary memory than yours and the other solution you benchmark against I am a bit unwilling to refactor this method any more at this time as I think that the chance of anyone in the real world seeking to parse a file that large using my class is both unlikely and ill-advised. As you know, CSV parsing isn't my library's main focus: it exists as a utility, and it does that job acceptably at the moment I think.

That said, If you have any suggestions or want to submit a PR to try and improve the performance of my parser (while maintaining the functionality of the class as a whole) I am definitely willing to review.

ws-garcia commented 3 years ago

and it does that job acceptably at the moment I think.

Your parser do the parsing task in a outstanding way! The variations in the CSV syntax don't affect its performance and, for me, this is stunning!