AviSynth / AviSynthPlus

AviSynth with improvements
http://avs-plus.net
930 stars 74 forks source link

String concatenation memory leak #389

Open flossy83 opened 4 months ago

flossy83 commented 4 months ago

As referenced here

ColorBars().ConvertToYV12().KillAudio()

ScriptClip(last, "Leak(last, current_frame)", after_frame=true, local=false)

function Leak(clip c, int current_frame){

##################################### doesn't leak ####################################
/* string = 
\ "stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring
\  stringstringstringstringstringstringstringstringstringstringstringstringstringstring" 
*/

#################################### leaks 1MB/sec ####################################
string = 
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" +
\ "string" + "string" + "string" + "string" + "string" + "string" + "string" + "string" 

c
}
pinterf commented 4 months ago

Until a deeper investigation, as a memo: concatenation is done here: https://github.com/AviSynth/AviSynthPlus/blob/master/avs_core/core/parser/expression.cpp#L345

I suppose the "memory leak" (string heap increase) is more expressed when you have so many additions. The function VSprintf always makes a copy of the new values atop the string heap as seen here:

https://github.com/AviSynth/AviSynthPlus/blob/master/avs_core/core/avisynth.cpp#L1504

First "string"+"string", then "stringstring" + "string", then "stringstringstring" + "string", etc. are stored. The above example contains 104 elements, 6, 12, 18, 24, ... 624 bytes (over 32000 bytes plus overhead) are stored on the heap after the additions, and it seems that this is done in each function call during the function body evaluation.

flossy83 commented 4 months ago

Random idea: add a new Internal String Function like eg. CatStr(string) which uses different code to do the concatenation without without risking breaking any + operator functionality.

flossy83 commented 4 months ago

Oh Format can do concatenation:

  s1="string1"  s3="string3"
# cat = s1 + "string2" + s3 + "string4"
  cat = Format("{s1}{}{s3}{}", "string2", "string4") 

  s5="string5"  s7="string7"
# cat = cat + "\n" + s5 + "string6" + s7 + "string8
  cat = Format("{cat}{}{s5}{}{s7}{}", "\n", "string6", "string8") 

result:
string1string2string3string4
string5string6string7string8

Memory usage seems to depend on how you use it...

#################################### leaks 1MB/sec ####################################
cat = "string"
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string")
cat = Format("{cat}{}", "string")       cat = Format("{cat}{}", "string") 

#################################### leaks 20kB/sec ###################################
/* 
cat = Format("{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}{}",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string",
\ "string", "string", "string", "string", "string", "string", "string", "string")  */
pinterf commented 4 months ago

Yep, it depends on when strings are pushed on the (always incrementing) heap. In the second - small leak - case the string constants already exist they are probably evaluated once, during function instantiating. There is one single result (cat=) that must be pushed on the heap (and the 8*13 "string" of course). Inside "Format" there is no extra memory penalty since the temporary results are c++ std::string and not real Avisynth "AVSValue".

The first - big leak - one is pushing each intermediate result on the string heap, since each evaluation is a new AVSValue which must be stored.

I had a deeper look into the parser, but I could not find a way how to recognise when multiple string constants are added together and replace them with a single multi-addition.

flossy83 commented 4 months ago

How come math operations like a = b + c don't leak? Are the numbers not pushed onto the heap as well?

I thought at the end of the ScriptClip all local variables would be cleared anyway since they are no longer accessible at the next frame (which is why I have to make all my runtime vars globals so they don't disappear at the next frame).

In any case I think using Format to join strings is a good enough solution and maybe it's not worth pulling apart the Avisynth core to try and get the root cause, unless you feel motivated enough to do it that is.

pinterf commented 4 months ago

Strings are pointers to buffers. Avisynth does not know if a pointer points to a buffer that exists forever (e.g. static constant in program code) or just exists in a temporary buffer. To be sure, Avisynth always has to make a safe copy from all strings that would be later used in script. Avisynth cannot determine by just a pointer if the specific memory area is volatile or not. The heap (better said: string store) is ever increasing. If this happens in runtime, with significant amount of new strings per frame than we'll see a more serious memory consumption.

Asd-g commented 2 months ago

The recent days I experienced this behavior. I had to store Max(RplaneMax, GPlaneMax, BPlaneMax) (1) for every frame and then sort the values ascending. The easiest way to sort values I found is RT_TxtSort (Function to sort a Chr(10) separated multiline string.) I stored the value (1) for every frame in such Chr(10) separated multiline string. The memory usage was acceptable for a clip with few hundred frames but the memory usage was ~24GB after ~70000 frames. I decided to store the values (1) in temporary file because then easily they can be loaded as a Chr(10) separated multiline string.

pinterf commented 2 months ago

I experimented with a string cache, which helps in the topic starter script.

Before

Frames processed:                   43460 (0 - 43459)
FPS (min | max | average):          373.7 | 1837 | 1665
Process memory usage (max):         1419 MiB

After

Frames processed:                   42140 (0 - 42139)
FPS (min | max | average):          369.9 | 1741 | 1626
Process memory usage (max):         60 MiB

The speed is a bit slower in this specific test. The extra check whether the strings are in the cache, require a little more time. Despite of this example is an artificial one, this solution probably hugely helps in such cases when most of the strings in ScriptClip are repeatable.

pinterf commented 2 months ago

Test build, x64, no commit yet, I'd just like to see how it is working on your side in your usual workflow. Crossposted to #379 See readme txt. https://drive.google.com/uc?export=download&id=1IznUhi6-7o8bRJoGHQsF6zAaBWJkKeNg

flossy83 commented 2 months ago

Test build, x64, no commit yet, I'd just like to see how it is working on your side in your usual workflow. Crossposted to #379 See readme txt. https://drive.google.com/uc?export=download&id=1IznUhi6-7o8bRJoGHQsF6zAaBWJkKeNg

I can report the synthetic test in the OP is resolved on my end.

As far as my real world usage goes the issue persists. Possibly as I'm concatenating in a different way inside my ScriptClips, eg. string = String(some_int) + "some text" + String(some_float) + "some more text". I'll try and provide you with a demo script if you're motivated to continue with this. I am satisfied with the Format workaround and have already converted all my scripts to use Format instead of +, but can help with this issue further if needed.

pinterf commented 2 months ago

Yes, in cases, where the string is built from formatted numbers, which are probably different for each frame, then my method would not help (unless integers 16-240 are involved, because they form a finite number of the possible strings). This change does not do any harm otherwise, and sometimes it helps a lot, so I'm gonna keep this mod. Thanks, I don't need more demo scripts, I see the problem of the other use cases.

pinterf commented 2 months ago

Anyway, I wonder if Asd-g's 24 GB@70000 frames has been decreased a bit or not.

Asd-g commented 2 months ago

No change.

Just for info:

FFVideoSource("Sony 4K HDR Demo - Camping in Nature.mp4")
Loop(20)
HDR10_Content_Metadata()
pinterf commented 2 months ago

Hi, downloaded the sample, despite the zip file name "Sony Camping In Nature HDR 4K Demo.zip", inside there is a "Sony Camp 4K Demo.mp4" file instead of "Sony 4K HDR Demo - Camping in Nature.mp4" you are using in the script. And the script exits with "HDR10_Content_Metadata: the clip must have _Transfer frame property 16.". I don't know if they are really different and you just renamed the original one? I'm using a latest https://github.com/FFMS/ffms2/releases/tag/5.0-RC3

Asd-g commented 2 months ago

Sorry, from here you can download Sony 4K HDR Demo - Camping in Nature.mp4.

This is the ffms2 version I use.

pinterf commented 2 months ago

Thanks, till then, as a workaround: https://avisynthplus.readthedocs.io/en/latest/avisynthdoc/script_ref/script_ref_arrays.html#arraysort

Asd-g commented 2 months ago

Thanks, I will test it.