Closed lucshi closed 2 years ago
Ordinarily, you would not benefit from the memory cache. That means that depending on your L3 cache size 10MB or 100MB would be the buffer size you need for the benchmark.
Thanks for the pull request. I'll try to comment on it in the next few days. I like the idea of adding an AVX512 codec if @WojciechMula's code is under a compatible license. But there are a few things that I don't like in this pull request, such as copypasting code from Chromium and not updating the user documentation. I'll try to respond more substantially in the coming days.
Hi htot, thank you for your prompt comments. I did not get your point exactly. Could you please kindly give more hints? Thanks!
Hi htot, thank you for your prompt comments. I did not get your point exactly. Could you please kindly give more hints? Thanks!
The benchmark repeatedly encodes the same array of data. If the array fits in your (L3) cache you measure very high speeds that you would normally not see when encoding a single image once. I think most of our benchmarks were done with 10MB, however I now have a system with 16MB cache.
Hi htot, thank you for your prompt comments. I did not get your point exactly. Could you please kindly give more hints? Thanks!
The benchmark repeatedly encodes the same array of data. If the array fits in your (L3) cache you measure very high speeds that you would normally not see when encoding a single image once. I think most of our benchmarks were done with 10MB, however I now have a system with 16MB cache.
L3 cache of each of CPU socket is 48MB (2 sockets are 96MB in total). Are you suggesting I should enlarge the benchmark data size bigger than 48MB/96MB, or just measure the first round performance to avoid cache benefit? I did not get your point why it would be a problem when benchmark data size is smaller than cache size. I thought L3 cache cannot make two algorithem with the same performance if the algorithms have performance gaps in nature.
That's a lot of cache :-) I was lazy and just added a benchmark with 100MB. I'm note sure how the split L3 cache works. But in principle I would say that caching code but not the data is fair. Caching data can be useful to compare the speed of the algorithms excluding time to access memory.
Hi @aklomp , I updated my patch and accepted your review comments of "adding readme and removing chromium code". Could you please have a review? Thanks! After it is landed it will be merged back into Node.js repo.
Hi @aklomp , I'm wondering if you have bandwidth to review this PR. Thanks!
@lucshi Sorry for the delay, I'm still here! Your pull request is near the top of my mental to-do list of things to process (along with #105), but I've been really busy the past weeks and haven't found a moment to deeply engage with it. I do plan on giving it a closer look soon. If you feel that I'm taking too long, don't hesitate to reach out.
@lucshi Sorry for the delay, I'm still here! Your pull request is near the top of my mental to-do list of things to process (along with #105), but I've been really busy the past weeks and haven't found a moment to deeply engage with it. I do plan on giving it a closer look soon. If you feel that I'm taking too long, don't hesitate to reach out.
Thank you @aklomp for the response. I would like to make it landed if possible very soon to catch up with the Node.js upstream optimization. Do you require me to write some notes about the PR to make the review faster?
@lucshi The main question I had was whether WojciechMula's code was under a compatible licence, but apparently it is (3-clause BSD).
Before merging, I also want to do a thorough code review to ensure that the code conforms to the library's general structure. It's this code review that I'm blocked on, but I'll try to put it on the fast path. I do feel bad for making you wait this long for a proper PR review. I really do appreciate the effort that you put in this, and it's definitely a very nice enhancement.
@lucshi The main question I had was whether WojciechMula's code was under a compatible licence, but apparently it is (3-clause BSD).
Before merging, I also want to do a thorough code review to ensure that the code conforms to the library's general structure. It's this code review that I'm blocked on, but I'll try to put it on the fast path. I do feel bad for making you wait this long for a proper PR review. I really do appreciate the effort that you put in this, and it's definitely a very nice enhancement.
I'm totally OK for a careful code review. Per my understanding, BSD-3 is a most popular license nowadays and the code can be used freely in BSD-2 project as long as including the copyright text and the authors and projects names not be used in market promotion.
I'll also start the IP scan for the code base along with my PR by Protex IP Scanning tool to see if any other IP issues. Will update the results soon.
There may be other IP related revise, but the IP scanning tool Protex did not found IP conflictions.
Hey, I've informally reviewed the updated PR and I think it looks good on the whole. It's something I think can be merged, but it needs some work to clean it up. I'm volunteering to do that; more on that below. The main sticking points regarding the code I have are:
#include "../avx2/dec_loop.c"
in lib/arch/avx512/codec.c
.LICENSE
file.I'm giving you this short list instead of a full review because I don't want to come across as mean by pointing out boring details like trailing whitespace in overlong detail. I'm saving us the effort. But if you want though, I can certainly do a full review.
For the way forward, what I had in mind was that I would take your PR and rework it to fix the issues above, retaining you as the author, and then come back to you for permission to merge that new branch. I think it's faster and less frustrating (and less work for you!) to do it that way. Of course, if you prefer, we can also do it classically and I'll be the reviewer/oracle while you push corrections. What do you say?
Then there are two ephemeral issues that should also be mentioned:
(Sorry for accidentally closing this issue, I clicked the wrong button!)
I pushed a branch called avx512
which squashes your work on top of master
and makes some cleanups/fixes. I didn't break this up into smaller atomic commits yet, but it gives you an idea of the direction I'm moving in.
I also set up Intel SDE locally to test the AVX512
code.. very good for building confidence in the code! :sweat_smile:
Hi @aklomp , Thank you for the quick review comments. I'm OK to go with the fastest way that you do the fixes and made me one of the authors.
When everything is done, I will submit a PR on node.js and merge updated code.
For the license issue. I have consulted the license expert in my company for the process of adding a BSD3 source file avx512.c into a BSD2 project.
The answer I got as below:
My understanding is that the bottom line is that you make sure the bsd-3 license text is in the avx512.c file.
Again sorry for the delay. I intend to refactor the avx512
branch soon and merge it, so that we can both move on :)
Thanks for checking about the license. I agree with the proposed solution of putting the BSD-3 license text in the file with the imported code; that should indeed satisfy the license requirements.
I also created an issue, #106, to discuss moving this project from BSD-2 to BSD-3. It's a very small change that I think is for the better, and it will allow this project to import code from more locations. The one remaining question is if it would impact downstream users of this library, but I don't really think so because this library is generally included as a source bundle complete with license file.
Regarding the license issue, I noticed that @WojciechMula has also published his AVX512 encoder in his base64simd project, specificially here. That library is under the BSD-2 license. That simplifies things, all we need to do is update the copyright year in Muła's mention in the LICENSE
.
Regarding the license issue, I noticed that @WojciechMula has also published his AVX512 encoder in his base64simd project, specificially here. That library is under the BSD-2 license. That simplifies things, all we need to do is update the copyright year in Muła's mention in the
LICENSE
.
Great!
Alright, I pushed an avx512
branch that contains four commits, crediting you as the author, that implement the AVX512 encoder. The branch contains a number of relatively minor fixes I made to your PR; you can review them with a git diff
.
Since your name is on these commits, I have to ask you: are you OK if I merge this?
I'm OK to merge. Thank you!
Add AVX512 support for base64 encoding based on WojciechMula's code.
aklomp code base has been borrowed by Node.js upstream and I would like to enable avx512 support which can improve base64 encoding performance by additional +12% for node.js base64 on. The measurement of aklomp benchmark result also showed performance improvement for <100KB data, compared with AVX2. Improvements are around 36%-86%.
Above results were collected on Icelake cpu which is widely used in Amazon AWS EC2 cloud instances.