Open c3m3gyanesh opened 1 year ago
Sorry for the late reply. The bitstream parser's features are output all at once, but in principle that output could be reduced by adapting the Python parser code to only output selected keys from the stats dictionary here:
This should reduce the data stored during runtime in the collected stats list.
However, the whole C/C++ part of the parser is not that efficient and may need a lot of memory for larger files. I haven't checked this in detail. It could be rebuilt with the unneeded features commented out, but that will be more challenging, as the features are set in various places and sometimes depend on each other.
So maybe try the Python part first.
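As a rough illustration of that Python-side change, here is a minimal sketch of filtering a per-frame stats dictionary before it is appended to the collected list. The key names and the function names are hypothetical placeholders, not the parser's actual ones; the real keys would have to be taken from the stats dictionary in the parser code and from what the model actually reads.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical key names, for illustration only. Replace them with the
# keys that the model really needs from the parser's stats dictionary.
SELECTED_KEYS = ("frame_type", "frame_size", "qp_avg")


def select_stats(frame_stats: Dict[str, Any],
                 keys: Iterable[str] = SELECTED_KEYS) -> Dict[str, Any]:
    """Return a copy of frame_stats that contains only the selected keys.

    Keys that are missing from frame_stats are skipped silently.
    """
    return {k: frame_stats[k] for k in keys if k in frame_stats}


def collect(all_frames: Iterable[Dict[str, Any]],
            keys: Iterable[str] = SELECTED_KEYS) -> List[Dict[str, Any]]:
    """Build the collected stats list from filtered per-frame dictionaries."""
    keys = tuple(keys)  # allow any iterable, reuse it for every frame
    return [select_stats(frame, keys) for frame in all_frames]
```

This only shrinks what is kept in the Python list; it does not change how much memory the C/C++ part uses while parsing.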
Please keep in mind that this parser and the model are meant for short sequences of less than 10 seconds. If you have longer input, you can split it into segments and analyze those separately.
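A minimal sketch of the splitting step, assuming the total duration is already known and that ffmpeg is used for cutting (neither is prescribed by the parser). The 8-second default, the helper names and the output file naming are assumptions made for the example. The commands are only built, not executed.

```python
from typing import List, Tuple


def segment_bounds(total_s: float, max_len_s: float = 8.0) -> List[Tuple[float, float]]:
    """Return (start, duration) pairs that cover [0, total_s) in order."""
    if max_len_s <= 0:
        raise ValueError("max_len_s must be positive")
    bounds = []
    start = 0.0
    while start < total_s:
        # The last segment may be shorter than max_len_s.
        bounds.append((start, min(max_len_s, total_s - start)))
        start += max_len_s
    return bounds


def ffmpeg_commands(src: str, total_s: float, max_len_s: float = 8.0) -> List[List[str]]:
    """Build one ffmpeg stream-copy command per segment (not executed here)."""
    cmds = []
    for i, (start, dur) in enumerate(segment_bounds(total_s, max_len_s)):
        out = f"segment_{i:03d}.mp4"  # hypothetical output naming
        cmds.append(["ffmpeg", "-ss", str(start), "-t", str(dur),
                     "-i", src, "-c", "copy", out])
    return cmds
```

Stream copy (`-c copy`) leaves the encoded bitstream unchanged, but cuts can then only fall on keyframes, so the actual segment boundaries may differ from the requested ones. Re-encoding would give exact cuts but would alter the bitstream being analyzed.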
In general there is some work being done at the moment to provide a more efficient, more generalizable parser next year.
Running the bitstream videoparser requires a lot of memory. If someone only wants to use the videoparser with [bitstream_mode3_p1204_3], a subset of the features should be sufficient. If we limit the bitstream videoparser to extracting only the features required by the bitstream p1204 project, does that reduce the memory requirements? If yes, can it be taken up as an enhancement? If it is not planned for this project, can you provide pseudocode or other pointers on how it could be done?