rainerborene closed this pull request 4 months ago
Thank you for the pull request! I have added some comments and will benchmark the change as a whole later on.
Could you please sign the CLA - see https://hexapdf.gettalong.org/contributing.html - so that I can incorporate the changes into HexaPDF?
I've signed the CLA (see your inbox), and pushed the suggested changes. 🚀
Edit: I renamed the instance variable @io_partial to @io_chunk, which fits better in this context.
Running my usual real-world benchmarks I didn't find any specific memory savings or performance improvements due to this change. However, there is certainly a difference for larger PDF files when running the simple memory benchmark:
ruby -Ilib:. -r prof_memory -r hexapdf -e "HexaPDF::Document.open('path_to_file') {|doc| doc.each(only_current: false) {|o| } }"
(prof_memory is just a simple wrapper for using the memory_profiler gem).
Before:
Total allocated: 1.74 GB (14943439 objects)
Total retained: 456.19 kB (4842 objects)
After:
Total allocated: 1.24 GB (14884174 objects)
Total retained: 456.27 kB (4844 objects)
There is a difference of ~60,000 allocated objects. We would expect these to be the objects saved at https://github.com/gettalong/hexapdf/pull/319/files#diff-e750dfc750e9c877f39d4174d2b388352eb139a5b1e9f6911edba0ff25afa659R443. And since each of those objects holds an 8,192-byte chunk, that roughly accounts for the ~500 MB difference in allocated memory.
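For what it's worth, the arithmetic checks out; a quick sanity check using the allocation counts from the two profiler runs above:

```ruby
# Difference in allocated objects between the two memory_profiler runs,
# times the 8,192-byte chunk size each saved String held.
saved_objects = 14_943_439 - 14_884_174
bytes_saved   = saved_objects * 8_192
puts saved_objects                         # 59265
puts (bytes_saved / 1_000_000.0).round(1)  # 485.5 (MB), in line with the ~500 MB drop
```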
You might want to benchmark with the same code snippet from the PR description; it might change the results. I tested on my computer with a 15 MB PDF and saw a minor improvement.
@rainerborene I have merged your changes and pushed to the devel branch - see https://github.com/gettalong/hexapdf/commit/3aeec254b1b4329d033a8318e50d6db5709c7b33
I was able to reduce the memory usage by 5% in the Tokenizer#prepare_string_scanner method and reduce some String object allocations as well. Here is the script I used to benchmark this change with the memory_profiler gem:

Before:
After:
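The savings discussed in this thread come from reusing a single read buffer instead of letting IO#read allocate a fresh String for every chunk. A minimal sketch of that idea, not HexaPDF's actual implementation (feed_scanner and CHUNK_SIZE are hypothetical names):

```ruby
require 'strscan'

CHUNK_SIZE = 8_192  # matches the 8,192-byte chunks discussed above

# Append the next chunk of +io+ to +scanner+, reusing +buffer+ so that
# IO#read does not allocate a new String on every call.
# Returns false once EOF is reached.
def feed_scanner(io, scanner, buffer)
  data = io.read(CHUNK_SIZE, buffer)  # fills +buffer+ in place
  return false unless data
  scanner << data                     # StringScanner#<< copies the bytes over
  true
end
```

Because StringScanner#<< copies the bytes into the scanner's own string, the buffer can be safely overwritten by the next read.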