OpenVisualCloud / SVT-HEVC

SVT HEVC encoder. Scalable Video Technology (SVT) is a software-based video coding technology that is highly optimized for Intel® Xeon® processors. Using the open source SVT-HEVC encoder, it is possible to spread video encoding processing across multiple Intel® Xeon® processors to achieve a real advantage of processing efficiency.
Other
519 stars 169 forks source link

Resource de/allocation refactor #467

Closed tianjunwork closed 4 years ago

tianjunwork commented 4 years ago

Signed-off-by: Jun Tian jun.tian@intel.com Signed-off-by: Xu Guangxin guangxin.xu@intel.com

tianjunwork commented 4 years ago

Debug build, command line used: -i C:\Users\tianjun\Desktop\media_tools\test_video\720x480_420_10bit_10.yuv -w 720 -h 480 -n 2000 -b out.bin -bit-depth 10

Total pick committed size dropped from ~3.77G to ~2.25G. Within which, image size increased from 18.7M to 215.5M, private data dropped from ~3.74G to ~2G.

Committed size with this PR: new

Committed size without this PR: old

tianjunwork commented 4 years ago

Below is the valgrind result, which an be compared with https://github.com/OpenVisualCloud/SVT-HEVC/issues/461.

The last 40 bytes is the global singleton gMallocMutex used for mem entry hash table, not sure how to free it. Since it is small size, I think it is ok.

valgrind --leak-check=full --show-leak-kinds=all --tool=memcheck ./SvtHevcEncApp -i ../../../../yuv/bbb_1920x1080_420p.y4m -b out.bin -n 5
==37108== Memcheck, a memory error detector
==37108== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==37108== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==37108== Command: ./SvtHevcEncApp -i ../../../../yuv/bbb_1920x1080_420p.y4m -b out.bin -n 5
==37108==
-------------------------------------------
SVT-HEVC Encoder
SVT [version]:  SVT-HEVC Encoder Lib v1.4.3
SVT [build]  :  GCC 8.1.0        64 bit
LIB Build date: Feb 12 2020 16:30:07
-------------------------------------------
==37108== Conditional jump or move depends on uninitialised value(s)
==37108==    at 0x4FACF26: EbVideoUsabilityInfoCtor (EbSei.c:39)
==37108==    by 0x4EEEA93: EbSequenceControlSetCtor (EbSequenceControlSet.c:104)
==37108==    by 0x4EEFF4E: EbSequenceControlSetInstanceCtor (EbSequenceControlSet.c:169)
==37108==    by 0x4E802EE: EbEncHandleCtor (EbEncHandle.c:557)
==37108==    by 0x4E95DC0: InitH265EncoderHandle (EbEncHandle.c:3790)
==37108==    by 0x4E88AB4: EbInitHandle (EbEncHandle.c:1554)
==37108==    by 0x11191D: InitEncoder (EbAppContext.c:622)
==37108==    by 0x112426: main (EbAppMain.c:162)
==37108==
==37108== Conditional jump or move depends on uninitialised value(s)
==37108==    at 0x4FACF39: EbVideoUsabilityInfoCtor (EbSei.c:38)
==37108==    by 0x4EEEA93: EbSequenceControlSetCtor (EbSequenceControlSet.c:104)
==37108==    by 0x4EEFF4E: EbSequenceControlSetInstanceCtor (EbSequenceControlSet.c:169)
==37108==    by 0x4E802EE: EbEncHandleCtor (EbEncHandle.c:557)
==37108==    by 0x4E95DC0: InitH265EncoderHandle (EbEncHandle.c:3790)
==37108==    by 0x4E88AB4: EbInitHandle (EbEncHandle.c:1554)
==37108==    by 0x11191D: InitEncoder (EbAppContext.c:622)
==37108==    by 0x112426: main (EbAppMain.c:162)
==37108==
Number of logical cores available: 72
Number of PPCS 62
-------------------------------------------
SVT [config]: Main Profile      Tier (auto)     Level (auto)
SVT [config]: EncoderMode / Tune                                                        : 7 / 1
SVT [config]: EncoderBitDepth / CompressedTenBitFormat / EncoderColorFormat             : 8 / 0 / 1
SVT [config]: SourceWidth / SourceHeight / InterlacedVideo                              : 1920 / 1080 / 0
SVT [config]: Fps_Numerator / Fps_Denominator / Gop Size / IntraRefreshType             : 30 / 1 / 32 / -1
SVT [config]: HierarchicalLevels / BaseLayerSwitchMode / PredStructure                  : 3 / 0 / 2
SVT [config]: BRC Mode / QP / LookaheadDistance / SceneChange                           : CQP / 32 / 17 / 1
SVT [config]: BitRateReduction / ImproveSharpness                                       : 0 / 0
SVT [config]: tileColumnCount / tileRowCount / tileSliceMode / Constraint MV            : 1 / 1 / 0 / 0
SVT [config]: De-blocking Filter / SAO Filter                                           : 1 / 1
SVT [config]: HME / UseDefaultHME                                                       : 1 / 1
SVT [config]: MV Search Area Width / Height                                             : 16 / 7
SVT [config]: HRD / VBV MaxRate / BufSize / BufInit                                     : 0 / 0 / 0 / 90
SVT [config]: More configurations for debugging:
SVT [config]: Channel ID / ActiveChannelCount                                           : 0 / 1
SVT [config]: Number of Logical Processors / Target Socket                              : 0 / -1
SVT [config]: Threads To RT / Thread Count / ASM Type                                   : 1 / 0 / 1
SVT [config]: Speed Control / Injector Frame Rate                                       : 0 / 60
SVT [config]: MaxCLL / MaxFALL / Output Reconstructed YUV                               : 0 / 0 / 0
SVT [config]: MasterDisplayColorVolume / DolbyVisionProfile                             : 0 / 0
SVT [config]: DisplayPrimaryX[0], DisplayPrimaryX[1], DisplayPrimaryX[2]                : 0 / 0 / 0
SVT [config]: DisplayPrimaryY[0], DisplayPrimaryY[1], DisplayPrimaryY[2]                : 0 / 0 / 0
SVT [config]: WhitePointX (0, 0) / DisplayMasteringLuminance Range [0 ~ 0]
SVT [config]: Constrained Intra / HDR / Code VPS SPS PPS / Code EOS                     : 0 / 0 / 1 / 0
SVT [config]: Sending VUI / Temporal ID / VPS Timing Info                               : 0 / 1 / 1
SVT [config]: SEI Message:
SVT [config]: AccessUnitDelimiter / BufferingPeriod / PictureTiming                     : 0 / 0 / 0
SVT [config]: RegisteredUserData / UnregisteredUserData / RecoveryPoint                 : 0 / 0 / 0
-------------------------------------------
==37108== Conditional jump or move depends on uninitialised value(s)
==37108==    at 0x4FACF26: EbVideoUsabilityInfoCtor (EbSei.c:39)
==37108==    by 0x4EEEA93: EbSequenceControlSetCtor (EbSequenceControlSet.c:104)
==37108==    by 0x4EEEBD2: EbSequenceControlSetCreator (EbSequenceControlSet.c:117)
==37108==    by 0x4EF62B2: EBObjectWrapperCtor (EbSystemResourceManager.c:430)
==37108==    by 0x4EF66C3: EbSystemResourceCtor (EbSystemResourceManager.c:497)
==37108==    by 0x4E80AE1: EbInitEncoder (EbEncHandle.c:703)
==37108==    by 0x111997: InitEncoder (EbAppContext.c:649)
==37108==    by 0x112426: main (EbAppMain.c:162)
==37108==
==37108== Conditional jump or move depends on uninitialised value(s)
==37108==    at 0x4FACF39: EbVideoUsabilityInfoCtor (EbSei.c:38)
==37108==    by 0x4EEEA93: EbSequenceControlSetCtor (EbSequenceControlSet.c:104)
==37108==    by 0x4EEEBD2: EbSequenceControlSetCreator (EbSequenceControlSet.c:117)
==37108==    by 0x4EF62B2: EBObjectWrapperCtor (EbSystemResourceManager.c:430)
==37108==    by 0x4EF66C3: EbSystemResourceCtor (EbSystemResourceManager.c:497)
==37108==    by 0x4E80AE1: EbInitEncoder (EbEncHandle.c:703)
==37108==    by 0x111997: InitEncoder (EbAppContext.c:649)
==37108==    by 0x112426: main (EbAppMain.c:162)
==37108==
SVT Memory Usage:
    total allocated memory:       2.09 GB
        malloced memory:          953.89 MB
        callocated memory:        99.50 MB
        allocated aligned memory: 1.06 GB
    mutex count: 964
    semaphore count: 608
    thread count: 144
    hash table fulless: 0.180806, hash bucket is healthy
top 10 malloced memory locations:
(236.93 MB): /home/vcse/jun/jun_fork/SVT-HEVC/Source/Lib/Codec/EbBitstreamUnit.c:30
(138.96 MB): /home/vcse/jun/jun_fork/SVT-HEVC/Source/Lib/Codec/EbPictureControlSet.c:809
(102.77 MB): /home/vcse/jun/jun_fork/SVT-HEVC/Source/Lib/Codec/EbPictureControlSet.c:820
(63.50 MB): /home/vcse/jun/jun_fork/SVT-HEVC/Source/Lib/Codec/EbCodingUnit.c:71
(45.59 MB): /home/vcse/jun/jun_fork/SVT-HEVC/Source/Lib/Codec/EbPictureControlSet.c:800
(41.71 MB): /home/vcse/jun/jun_fork/SVT-HEVC/Source/Lib/Codec/EbMotionEstimationContext.c:117
(41.71 MB): /home/vcse/jun/jun_fork/SVT-HEVC/Source/Lib/Codec/EbMotionEstimationContext.c:119
(41.71 MB): /home/vcse/jun/jun_fork/SVT-HEVC/Source/Lib/Codec/EbMotionEstimationContext.c:118
(41.71 MB): /home/vcse/jun/jun_fork/SVT-HEVC/Source/Lib/Codec/EbMotionEstimationContext.c:116
(25.81 MB): /home/vcse/jun/jun_fork/SVT-HEVC/Source/Lib/Codec/EbPictureControlSet.c:866
Encoding          ==37108== Conditional jump or move depends on uninitialised value(s)
==37108==    at 0x4E93FED: CopyInputBuffer (EbEncHandle.c:3537)
==37108==    by 0x4E94845: EbH265EncSendPicture (EbEncHandle.c:3567)
==37108==    by 0x116AA6: ProcessInputBuffer (EbAppProcessCmd.c:1289)
==37108==    by 0x112620: main (EbAppMain.c:201)
==37108==
==37108== Conditional jump or move depends on uninitialised value(s)
==37108==    at 0x4E93FED: CopyInputBuffer (EbEncHandle.c:3537)
==37108==    by 0x4E94845: EbH265EncSendPicture (EbEncHandle.c:3567)
==37108==    by 0x116B32: ProcessInputBuffer (EbAppProcessCmd.c:1301)
==37108==    by 0x112620: main (EbAppMain.c:201)
==37108==                                                                                                                                                                                                     5
SUMMARY --------------------------------- Channel 1  --------------------------------
Total Frames            Frame Rate              Byte Count              Bitrate
           5            30.00 fps                    11078              531.74 kbps

Channel 1
Average Speed:          0.03 fps
Total Encoding Time:    164599 ms
Total Execution Time:   180609 ms
Average Latency:        148207 ms
Max Latency:            163320 ms
SVT: you have no memory leak
Encoder finished
==37108==
==37108== HEAP SUMMARY:
==37108==     in use at exit: 40 bytes in 1 blocks
==37108==   total heap usage: 758,599 allocs, 758,598 frees, 2,511,351,167 bytes allocated
==37108==
==37108== 40 bytes in 1 blocks are still reachable in loss record 1 of 1
==37108==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==37108==    by 0x4EF6F62: EbCreateMutex (EbThreads.c:239)
==37108==    by 0x4F5BC35: CreateMallocMutex (EbMalloc.c:43)
==37108==    by 0xF93EA98: __pthread_once_slow (pthread_once.c:116)
==37108==    by 0x4F5BC56: GetMallocMutex (EbMalloc.c:50)
==37108==    by 0x4F5BD70: ForEachMemEntry (EbMalloc.c:143)
==37108==    by 0x4F5C616: EbAddMemEntry (EbMalloc.c:175)
==37108==    by 0x4E95D92: InitH265EncoderHandle (EbEncHandle.c:3790)
==37108==    by 0x4E88AB4: EbInitHandle (EbEncHandle.c:1554)
==37108==    by 0x11191D: InitEncoder (EbAppContext.c:622)
==37108==    by 0x112426: main (EbAppMain.c:162)
==37108==
==37108== LEAK SUMMARY:
==37108==    definitely lost: 0 bytes in 0 blocks
==37108==    indirectly lost: 0 bytes in 0 blocks
==37108==      possibly lost: 0 bytes in 0 blocks
==37108==    still reachable: 40 bytes in 1 blocks
==37108==         suppressed: 0 bytes in 0 blocks
tianjunwork commented 4 years ago

Hi @xuguangxin , hope you are doing well. I wonder if you are interested to review these results :)

tianjunwork commented 4 years ago

CI fast test result with 2 failed buffered_test test cases. Log and bitstream seems fine. Not sure why they are marked as fail.

   1 ---------------------------------------------------------
   2 Test Begin...
   3 Total Number of Tests: 2685
   4 Total Passed: 2683
   5 Percentage Passed: 99.9255121043%
   6 Time Elapsed: 7 hour(s), 13 minute(s), 13 second(s)
   7 ---------------------------------------------------------
xuguangxin commented 4 years ago

Hi @xuguangxin , hope you are doing well. I wonder if you are interested to review these results :)

sure I can help. But it may be a little slower since it's a large commit. Hope I can finish it before mid of next week. thanks

tianjunwork commented 4 years ago

Hi @xuguangxin , yeah, this PR is to much to review. I was asking if you could review the test result above, the committed memory and valgrind. Did you see similar trend with your PR to AV1?

tianjunwork commented 4 years ago

Briefly tested below parameters with different combinations, bitstream has no R2R issue compared with master. CBR, -nb, -hierarchical-levels, -pred-struct, -sao VBR R2R issue on master is submitted.

tianjunwork commented 4 years ago

Speed testing with default parameters: First two columns are with this PR, last two are from master tip. Regression for both, but mainly of over all executing time. Will look into this.

input time fps time fps
1920x1080 11.831s 184.28 11.184s 189.41
7680x3840 1m33.284s 27.46 1m25.442s 27.22
7680x4320 10bit 2m47.448s 18.22 2m9.507s 17.08
tianjunwork commented 4 years ago

Fixed jump depends on uninitialized value. New result of valgrind.

valgrind --leak-check=full --show-leak-kinds=all --tool=memcheck ./SvtHevcEncApp -i ../../../../yuv/bbb_1920x1080_420p.y4m -b out.bin -n 5
==27518== Memcheck, a memory error detector
==27518== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==27518== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==27518== Command: ./SvtHevcEncApp -i ../../../../yuv/bbb_1920x1080_420p.y4m -b out.bin -n 5
==27518==
-------------------------------------------
SVT-HEVC Encoder
SVT [version]:  SVT-HEVC Encoder Lib v1.4.3
SVT [build]  :  GCC 8.1.0        64 bit
LIB Build date: Feb 12 2020 16:30:07
-------------------------------------------
Number of logical cores available: 72
Number of PPCS 62
-------------------------------------------
SVT [config]: Main Profile      Tier (auto)     Level (auto)
SVT [config]: EncoderMode / Tune                                                        : 7 / 1
SVT [config]: EncoderBitDepth / CompressedTenBitFormat / EncoderColorFormat             : 8 / 0 / 1
SVT [config]: SourceWidth / SourceHeight / InterlacedVideo                              : 1920 / 1080 / 0
SVT [config]: Fps_Numerator / Fps_Denominator / Gop Size / IntraRefreshType             : 30 / 1 / 32 / -1
SVT [config]: HierarchicalLevels / BaseLayerSwitchMode / PredStructure                  : 3 / 0 / 2
SVT [config]: BRC Mode / QP / LookaheadDistance / SceneChange                           : CQP / 32 / 17 / 1
SVT [config]: BitRateReduction / ImproveSharpness                                       : 0 / 0
SVT [config]: tileColumnCount / tileRowCount / tileSliceMode / Constraint MV            : 1 / 1 / 0 / 0
SVT [config]: De-blocking Filter / SAO Filter                                           : 1 / 1
SVT [config]: HME / UseDefaultHME                                                       : 1 / 1
SVT [config]: MV Search Area Width / Height                                             : 16 / 7
SVT [config]: HRD / VBV MaxRate / BufSize / BufInit                                     : 0 / 0 / 0 / 90
SVT [config]: More configurations for debugging:
SVT [config]: Channel ID / ActiveChannelCount                                           : 0 / 1
SVT [config]: Number of Logical Processors / Target Socket                              : 0 / -1
SVT [config]: Threads To RT / Thread Count / ASM Type                                   : 1 / 0 / 1
SVT [config]: Speed Control / Injector Frame Rate                                       : 0 / 60
SVT [config]: MaxCLL / MaxFALL / Output Reconstructed YUV                               : 0 / 0 / 0
SVT [config]: MasterDisplayColorVolume / DolbyVisionProfile                             : 0 / 0
SVT [config]: DisplayPrimaryX[0], DisplayPrimaryX[1], DisplayPrimaryX[2]                : 0 / 0 / 0
SVT [config]: DisplayPrimaryY[0], DisplayPrimaryY[1], DisplayPrimaryY[2]                : 0 / 0 / 0
SVT [config]: WhitePointX (0, 0) / DisplayMasteringLuminance Range [0 ~ 0]
SVT [config]: Constrained Intra / HDR / Code VPS SPS PPS / Code EOS                     : 0 / 0 / 1 / 0
SVT [config]: Sending VUI / Temporal ID / VPS Timing Info                               : 0 / 1 / 1
SVT [config]: SEI Message:
SVT [config]: AccessUnitDelimiter / BufferingPeriod / PictureTiming                     : 0 / 0 / 0
SVT [config]: RegisteredUserData / UnregisteredUserData / RecoveryPoint                 : 0 / 0 / 0
-------------------------------------------
SVT Memory Usage:
    total allocated memory:       2.09 GB
        malloced memory:          953.89 MB
        callocated memory:        99.50 MB
        allocated aligned memory: 1.06 GB
    mutex count: 964
    semaphore count: 608
    thread count: 144
    hash table fulless: 0.180806, hash bucket is healthy
top 10 malloced memory locations:
(236.93 MB): /home/vcse/jun/jun_fork/SVT-HEVC/Source/Lib/Codec/EbBitstreamUnit.c:30
(138.96 MB): /home/vcse/jun/jun_fork/SVT-HEVC/Source/Lib/Codec/EbPictureControlSet.c:809
(102.77 MB): /home/vcse/jun/jun_fork/SVT-HEVC/Source/Lib/Codec/EbPictureControlSet.c:820
(63.50 MB): /home/vcse/jun/jun_fork/SVT-HEVC/Source/Lib/Codec/EbCodingUnit.c:71
(45.59 MB): /home/vcse/jun/jun_fork/SVT-HEVC/Source/Lib/Codec/EbPictureControlSet.c:800
(41.71 MB): /home/vcse/jun/jun_fork/SVT-HEVC/Source/Lib/Codec/EbMotionEstimationContext.c:117
(41.71 MB): /home/vcse/jun/jun_fork/SVT-HEVC/Source/Lib/Codec/EbMotionEstimationContext.c:119
(41.71 MB): /home/vcse/jun/jun_fork/SVT-HEVC/Source/Lib/Codec/EbMotionEstimationContext.c:118
(41.71 MB): /home/vcse/jun/jun_fork/SVT-HEVC/Source/Lib/Codec/EbMotionEstimationContext.c:116
(25.81 MB): /home/vcse/jun/jun_fork/SVT-HEVC/Source/Lib/Codec/EbPictureControlSet.c:866
Encoding         5
SUMMARY --------------------------------- Channel 1  --------------------------------
Total Frames            Frame Rate              Byte Count              Bitrate
           5            30.00 fps                    11078              531.74 kbps

Channel 1
Average Speed:          0.03 fps
Total Encoding Time:    154184 ms
Total Execution Time:   167515 ms
Average Latency:        138396 ms
Max Latency:            152508 ms
SVT: you have no memory leak
Encoder finished
==27518==
==27518== HEAP SUMMARY:
==27518==     in use at exit: 40 bytes in 1 blocks
==27518==   total heap usage: 758,599 allocs, 758,598 frees, 2,511,351,167 bytes allocated
==27518==
==27518== 40 bytes in 1 blocks are still reachable in loss record 1 of 1
==27518==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27518==    by 0x4EF6F62: EbCreateMutex (EbThreads.c:239)
==27518==    by 0x4F5BC35: CreateMallocMutex (EbMalloc.c:43)
==27518==    by 0xF93EA98: __pthread_once_slow (pthread_once.c:116)
==27518==    by 0x4F5BC56: GetMallocMutex (EbMalloc.c:50)
==27518==    by 0x4F5BD70: ForEachMemEntry (EbMalloc.c:143)
==27518==    by 0x4F5C616: EbAddMemEntry (EbMalloc.c:175)
==27518==    by 0x4E95D92: InitH265EncoderHandle (EbEncHandle.c:3790)
==27518==    by 0x4E88AB4: EbInitHandle (EbEncHandle.c:1554)
==27518==    by 0x111932: InitEncoder (EbAppContext.c:625)
==27518==    by 0x11243B: main (EbAppMain.c:162)
==27518==
==27518== LEAK SUMMARY:
==27518==    definitely lost: 0 bytes in 0 blocks
==27518==    indirectly lost: 0 bytes in 0 blocks
==27518==      possibly lost: 0 bytes in 0 blocks
==27518==    still reachable: 40 bytes in 1 blocks
==27518==         suppressed: 0 bytes in 0 blocks
==27518==
==27518== For counts of detected and suppressed errors, rerun with: -v
==27518== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
xuguangxin commented 4 years ago

@tianjunwork , is it possible to split the style change from the functional change? like this file Source/Lib/Codec/EbCabacContextModel.c, only small functional change in it. Mix them will introduce difficult for review and cherry-pick the patch. thanks

xuguangxin commented 4 years ago

@tianjunwork , the valgrind result is ok, it's a static object. for image size, why it's increased? the EbMalloc things will turn off for release build, it should no side effect for memory foot print.

tianjunwork commented 4 years ago

Thank you very much @xuguangxin reviewing the PR. I will address the comments. I was using debug build with vmmap. Will try with release build again. BTW, did you measure the performance of total executing time and fps with/-out your PR to AV1?

xuguangxin commented 4 years ago

Thank you very much @xuguangxin reviewing the PR. I will address the comments. I was using debug build with vmmap. Will try with release build again. BTW, did you measure the performance of total executing time and fps with/-out your PR to AV1?

Hi @tianjunwork , I only do one test https://github.com/OpenVisualCloud/SVT-AV1/pull/314#issuecomment-508621639 for release mode. It will greatly reduce the start time. It will not introduce overhead to fps, since we only allocate all things at the start and free it at the end.

tianjunwork commented 4 years ago

Redo the vmmap test with release build, command line used: -i C:\Users\tianjun\Desktop\media_tools\test_video\720x480_420_10bit_10.yuv -w 720 -h 480 -n 2000 -b out.bin -bit-depth 10

Total pick committed size dropped from ~3.73G to ~2G. Image size stays 11.5M, no regression.

Committed size with this PR: new

Committed size without this PR: old

tianjunwork commented 4 years ago

Thank you very much @xuguangxin reviewing the PR. I will address the comments. I was using debug build with vmmap. Will try with release build again. BTW, did you measure the performance of total executing time and fps with/-out your PR to AV1?

Hi @tianjunwork , I only do one test OpenVisualCloud/SVT-AV1#314 (comment) for release mode. It will greatly reduce the start time. It will not introduce overhead to fps, since we only allocate all things at the start and free it at the end.

I see. Thank you. I will do some profiling tomorrow.

tianjunwork commented 4 years ago

From below vtune data, EbInitEncoder(EbPictureBufferDescCtor->memset) causes the perf regression. CPU time of other functions keep about the same. With this PR: vtune_new_bottomup

HEAD: vtune_old_bottomup

tianjunwork commented 4 years ago

With the fix to the regression, init time is back to and better than original. Thx @Austin-Hu and @lijing0010.

vtune_fix_topdown

HEAD: vtune_old_topdown

xuguangxin commented 4 years ago

With the fix to the regression, init time is back to and better than original. Thx @Austin-Hu and @lijing0010.

vtune_fix_topdown

HEAD: vtune_old_topdown

@tianjunwork we use EB_CALLOC_ARRAY in SVT AV1 because it will memset the memory after malloc https://github.com/OpenVisualCloud/SVT-AV1/pull/314/files#diff-40daf4c2999b357c7617eeb0d545b007L99 Could we apply your patch to SVT AV1 too?

thanks

xuguangxin commented 4 years ago

Yes, correct.

From: Austin Hu notifications@github.com Sent: Wednesday, February 26, 2020 11:17 AM To: OpenVisualCloud/SVT-HEVC SVT-HEVC@noreply.github.com Cc: Xu, Guangxin guangxin.xu@intel.com; Mention mention@noreply.github.com Subject: Re: [OpenVisualCloud/SVT-HEVC] Resource de/allocation refactor (#467)

@Austin-Hu commented on this pull request.


In Source/Lib/Codec/EbMalloc.hhttps://github.com/OpenVisualCloud/SVT-HEVC/pull/467#discussion_r384254113:

  • do {\

+#define EB_CALLOC_ARRAY(pa, count) \

+#define EB_FREE_ARRAY(pa) \

+#define EB_ALLOC_PTR_ARRAY(pa, count) \

EB_ALLOC_PTR_ARRAY and EB_CALLOC_ARRAY are exactly the same, except for their naming. Maybe EB_ALLOC_PTR_ARRAY is duplicately defined here to be paired with EB_FREE_PTR_ARRAY, correct?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/OpenVisualCloud/SVT-HEVC/pull/467?email_source=notifications&email_token=ABS32GXYKXMJQOS4NES2KBTREXNMDA5CNFSM4KRSMDMKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCW5VL2I#discussion_r384254113, or unsubscribehttps://github.com/notifications/unsubscribe-auth/ABS32GTTUXULSWE36S4QSPLREXNMDANCNFSM4KRSMDMA.

tianjunwork commented 4 years ago

@xuguangxin I see, thx for the explanation. SVT-HEVC has no memset after EB_ALLIGN_MALLOC on HEAD. I will make a patch to AV1. But need someone who added memset to review in case it is to fix some issue that I don't know?

tianjunwork commented 4 years ago
Speed testing with default parameters. -n 1000 x 5 times. First two columns are with this PR, last two are from master tip. No perf regression. Total execution time is 4.4% less then master. input time fps time fps
7680x4320 10bit 1m10.605s 16.2 1m13.865s 16.26
tianjunwork commented 4 years ago

Hi @xuguangxin , on your comments, mem leak is not handled in the calling function itself. But some source code error is fixed, so that any throw will cause EbInitEncoder return EB_ErrorInsufficientResources to the application. Application should terminate encoder in this case. This makes it easier to handle many possible mem leaks because of throw. Tested with manually adding return EB_ErrorInsufficientResources to simulate throw. EbInitEncoder can return EB_ErrorInsufficientResources now with my fix.

tianjunwork commented 4 years ago

Hi, @Austin-Hu , I didn't change malloc with EB_MALLOC, using malloc gives the opportunity to free previously allocated buffer in this function. EB_MALLOC will return inside the macro, making it hard to handle mem leak.

tianjunwork commented 4 years ago

Hi All, all the comments are address, please help to review the last round. Thank you again for the help:)

xuguangxin commented 4 years ago

thans Jun, for addressed all my comments.

tianjunwork commented 4 years ago

thans Jun, for addressed all my comments.

Thank you again for helping on this PR.