Alex313031 / Thorium-Win

Chromium fork for Windows named after radioactive element No. 90; Windows builds of https://github.com/Alex313031/Thorium
https://thorium.rocks/
BSD 3-Clause "New" or "Revised" License
1.39k stars, 36 forks

Any chance to get an AVX2 build too? #2

Closed. Neucher closed this issue 2 years ago.

Neucher commented 2 years ago

I did some tests with some number-crunching web applications, and they actually did use AVX2. The speed was honestly out of this world: 48+ hours (SSE2) vs. 5 minutes (AVX2).

Alex313031 commented 2 years ago

@Lehner82 Are you saying you tried this on one of @RobRich999's AVX2 builds, or on Thorium? Also, Rob and I may need to convene and get some patched files for certain components that won't compile with anything higher than AVX. And did you see my response about getting a non-deb Linux build? There are portable zips now, with .desktop files in them and a README inside with instructions for using Thorium portably or installing it on a Linux distro that doesn't use deb packaging.

Alex313031 commented 2 years ago

@Lehner82 Please answer the above so I can understand what you're talking about. Anyway, I just made my first AVX2 Windows release: https://github.com/Alex313031/Thorium-Win/releases/tag/V100.0.4865.0

Alex313031 commented 2 years ago

@Lehner82 How's the AVX2 release working for you?

Neucher commented 2 years ago

Sorry for the late response, the AVX2 build works pretty much perfectly. I saw RobRich999 made his repo read-only. I guess everyone who used it has to switch to your builds now.

Neucher commented 2 years ago

Also, any chance of using -falign-functions=32 for the next Windows AVX2 build? GentooLTO recommends it globally for Intel CPUs, with no regressions on AMD.

Alex313031 commented 2 years ago

@Lehner82 I know, it's so sad. I'll miss him and his builds. He was having health issues, and it was a lot of work for him to make regular, AVX, and AVX2 builds for Linux, plus 32-bit, regular, AVX, and AVX2 builds for Windows. I'll probably end up getting a lot more users, which I guess is the only positive thing to come out of this.

I'll add that to Thorium, no problem mate. I will run tests on Intel and AMD to verify it before I make it public.

Neucher commented 2 years ago

That's great. Are there still any optimizations from RobRich999's builds you have yet to add to Thorium? I could help you test any builds you have, especially Windows builds, since that's what I have to use for work (sadly).

Also, any chance the Android builds will make a return? Nobody actually makes any using the latest dev build (excluding the latest arm32 Chromium snapshots).

Alex313031 commented 2 years ago

@Lehner82 My builds have all of Rob's optimizations except Polly, plus some that his didn't have. And yeah, I'll let you know if I could use a Windows tester. As for Android, ehhh: almost none of the patches apply to Android, I couldn't figure out how to get the logo and branding right, and my builds were buggy. It also adds more work on top of the fact that it seems like I'm constantly building Thorium or Chromium OS, either for myself or for the public. I might revisit it. Also, most of the optimizations are for x64 only, so for now all you really get out of Thorium on Android is a small performance improvement and a dev-channel build.

dabugen commented 2 years ago

@Alex313031 Thanks so much for continuing where Rob left off. I've always used his AVX2 builds, which he kindly provided to the community for all these years, and I was very sad to see him go (although I fully understand). It's great to see that someone is continuing these builds, so I've tried your AVX2 build 4865 and wow, it's even faster than RobRich's 4863 AVX2 build! I run a JavaScript-based trading-system development app (data mining) that maxes out the CPU 24/7, since I'm running it all the time. Running Rob's AVX2 build side by side with yours on the same workload, I can already see that yours is about 7% faster, which is A LOT. So thank you for that; I'm eager to know what else you've been able to optimize to get such a massive speed boost. Did you possibly run more JavaScript-based components while doing the PGO profiling? In any case, I hope you can continue to provide these kinds of insanely fast builds. Thanks for all the effort, it's much appreciated.

After 15 minutes of runtime. The margin will most likely get even greater the longer it runs. Nice work!

Alex313031 commented 2 years ago

@dabugen That is one of the pivotal points of Thorium. Since Rob stopped producing builds, my builds are the fastest Chromium browser you can get for Linux and Windows, bar none. I'm proud to say that, and it took a lot of work. Ever since Thorium got listed on chromium.woolyss.com, my userbase has skyrocketed, and I'm so glad that many people find my work useful. It makes me even more motivated to improve it and keep it up to date. I am making the next Windows release now, and the one after that will be an AVX2 build. And I don't do my own profiling; I use the PGO profile for Chromium downloaded from Google Storage. 7% is in line with my tests: on Windows and Linux, comparing the AVX builds to vanilla Chromium and Google Chrome Dev, I get a 2-10% improvement depending on the test. The AVX2 builds will gain more, but I don't usually test those.

Alex313031 commented 2 years ago

@dabugen @Lehner82 I'm probably going to make a separate repo for AVX2 Windows builds and post a notice about it in the README and in the release notes of the next couple of Windows and Linux builds.

Alex313031 commented 2 years ago

@dabugen @Lehner82 @raingart @ritmation Since @RobRich999 sadly stopped making public Chromium builds, and I know AVX2 builds are in demand, especially for Windows, I have made a new repo just for AVX2 builds. It will be mostly Windows builds, but I will occasionally make and post an AVX2 Linux build too. Read the README for more details: https://github.com/Alex313031/Thorium-AVX2

dabugen commented 2 years ago

ThatĀ“s great Alex, many thanks. Are there any new optimizations in that new AVX2 build? Always curious to see what can be squeezed out... ;-) Thanks again, really appreciated.

Alex313031 commented 2 years ago

@dabugen Nothing new, but for the next release I'm going to experiment with setting the Rust compiler flags to AVX. Right now they just have -O3 for optimization, but I want to compile the Rust subcomponents with AVX.

dabugen commented 2 years ago

Thanks for the continued efforts, that sounds like a good plan. Have you ever thought about doing your own PGO profiling? I remember that RobRich did that and got some further gains.

Alex313031 commented 2 years ago

@dabugen Good idea, I'll look into it.

Alex313031 commented 2 years ago

@dabugen @Lehner82 Doing a PGO profile yourself takes a lot of time. I would have to do a debug build, i.e. with the same code and args.gn flags EXCEPT with is_official_build set to false and is_debug set to true, then run it in debug mode with the profiler running for about an hour. Then I'd use that .profdata file when making another build with those flags set back to true and false (like normal Thorium). So I end up having to make two builds and spend an hour profiling. I'm going to do it, but if I only get a marginal increase in performance, I probably won't do it every time. Maybe every now and again I'll make a special release with something like "natively PGO profiled" in the tag. Basically, I'm going to do it because I'm curious about the performance benefits, but it probably won't be something I do every release, because I'm already bogged down with building for Linux, macOS, Windows, and now AVX2 for Windows.
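The two-build workflow described above can be sketched as a small Python helper that renders the args.gn settings for each phase. This is a minimal sketch under stated assumptions: only the is_official_build and is_debug toggles come from the comment above. The helper names, the profile path, and the use of the pgo_data_path argument to feed the .profdata file into the second build are assumptions (check build/config/compiler/pgo/pgo.gni in your own checkout), and Chromium's documented profiling flow may use different settings.

```python
# Hypothetical helper: renders args.gn text for the two PGO phases described
# in the thread. Only is_debug / is_official_build come from the thread;
# pgo_data_path and the profile path are assumptions.

def gn_value(value):
    """Format a Python value as a GN literal (bool is checked before int)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        return f'"{value}"'
    return str(value)


def render_args(args):
    """Render a dict of GN args as args.gn lines, sorted for stable output."""
    return "\n".join(f"{key} = {gn_value(val)}" for key, val in sorted(args.items()))


def pgo_phase_args(base_args, phase, profdata_path=None):
    """Return the args for phase 1 (profiling build) or phase 2 (final build)."""
    args = dict(base_args)  # same code and flags in both builds
    if phase == 1:
        # Profiling build: non-official debug build, run under the profiler.
        args["is_official_build"] = False
        args["is_debug"] = True
    elif phase == 2:
        if profdata_path is None:
            raise ValueError("phase 2 needs the .profdata produced in phase 1")
        # Normal Thorium settings again, plus the collected profile.
        args["is_official_build"] = True
        args["is_debug"] = False
        args["pgo_data_path"] = profdata_path  # assumed arg name
    else:
        raise ValueError("phase must be 1 or 2")
    return args


if __name__ == "__main__":
    base = {"is_component_build": False, "symbol_level": 0}
    print(render_args(pgo_phase_args(base, 1)))
    print()
    print(render_args(pgo_phase_args(base, 2, "/path/to/thorium.profdata")))
```

Keeping the shared flags in one base dict mirrors the point made above: both builds must use the same code and args.gn flags apart from the phase-specific toggles.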

dabugen commented 2 years ago

I am really eager to see what you can come up with, and whether your own PGO really improves performance. I think it will also depend heavily on what exactly is being profiled. If I were to profile my trading strategy generator app at https://eas.forexsb.com/, for example, it would most likely speed that app up further. But of course, that would be a version that's only useful for me.

Alex313031 commented 2 years ago

@dabugen Yeah. I would not do any specific profiling; my builds are for a large audience. For example, the AVX2 builds have -march=haswell set, to build for all instructions up to and including AVX2, but I have -mtune=x64-generic. When -march is set by itself, it implies -mtune with the same value, so if I only set -march, the compiler would tune the code for Haswell CPUs alone. Since the AVX2 releases will run on a variety of AVX2-capable CPUs, I set -mtune to basically say "tune for nothing in particular", so that performance isn't unfairly skewed toward Haswell, while still getting higher performance overall than an AVX or vanilla SSE3 build. Following this idea of not tailoring Thorium to any one use case or CPU, the profiling I'm going to do will just follow the official profiling instructions in the Chromium source code, run on my Debian 11 build machine with no other applications running in the background.
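Because -march=haswell lets the compiler emit any instruction up to and including AVX2, an AVX2 build needs a CPU that supports that level. A minimal sketch of checking this on Linux by reading the flags line of /proc/cpuinfo follows. The function names are hypothetical, and the flag set is an illustrative subset of the extensions -march=haswell enables, not an exhaustive list; the check does not apply on Windows, where /proc/cpuinfo does not exist.

```python
# Hypothetical check: does this Linux machine report the x86 extensions an
# -march=haswell (AVX2) binary may use? The flag set is a subset, not exhaustive.
from pathlib import Path

HASWELL_FLAGS = {"avx", "avx2", "fma", "bmi1", "bmi2", "f16c", "movbe"}


def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no x86 'flags' line found (e.g. an ARM machine)


def missing_haswell_flags(cpuinfo_text):
    """Return the required flags the CPU does not report (empty list = OK)."""
    return sorted(HASWELL_FLAGS - cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    info = Path("/proc/cpuinfo")
    if info.exists():
        missing = missing_haswell_flags(info.read_text())
        print("AVX2 build should run" if not missing else f"missing: {missing}")
    else:
        print("/proc/cpuinfo not found (not Linux?)")
```

A CPU that lacks any of these extensions may crash with an illegal-instruction error on an AVX2 build, which is why the AVX and SSE3 builds remain the safer choice for older hardware.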

Alex313031 commented 2 years ago

@dabugen @Lehner82 I am going to close this issue, but I encourage any of you to make a post in the Discussions about this.

GrilledPear commented 2 years ago

@Alex313031 I saw in a previous comment that you test the AVX builds against Google Chrome Dev (and vanilla Chromium), and in another comment, Artoriuz did tests on Chrome Canary. So my question is: what's the status of AVX in Google's builds? Do they all (Stable, Beta, Dev, Canary) have AVX and/or AVX2 enabled?

Alex313031 commented 2 years ago

@GrilledPear No. Chrome builds only use SSE3, for compatibility, and the same goes for Chromium. Thorium tweaks the main BUILD.gn, as well as the ones for Windows and ARM, in various ways: it enables AVX and AES, plus AVX2 in the AVX2 versions. It also sets cflags, ldflags, and rustflags to optimization level 3 (-O3) across the board, strips debugging symbols, and adds some loop optimizations for the compiler that @RobRich999 gave me a while ago.

cflags are for the compiler, ldflags are for the linker, and rustflags are for the Rust compiler. Optimization levels range from 0 through 3. By default, Chromium uses 1 for some targets and 2 for others; I set it to 3 for everything. This is not as good for size (Thorium is about 1.8 times the size of a vanilla Chromium build), but it is still under 500 MB. I'm not worried about size, I'm worried about performance; that is one of the main points of Thorium.
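The per-variant flag setup described above can be sketched as a hypothetical Python helper that assembles Clang cflags and rustc flags for an AVX or AVX2 variant, with optimization level 3 everywhere. This is an illustration, not Thorium's actual BUILD.gn: the function name and exact flag lists are assumptions, and the loop optimizations, symbol stripping, and linker flags mentioned above are not reproduced. The sketch spells the tuning target as -mtune=generic, which is the value GCC and Clang accept for CPU-neutral tuning, and includes the optional -falign-functions=32 suggested earlier in the thread.

```python
# Hypothetical sketch of per-variant compiler flags, modeled on the thread's
# description (AVX + AES, plus AVX2 for the AVX2 variant, level 3 everywhere).
# Not Thorium's actual BUILD.gn; loop-optimization and linker flags omitted.

def build_flags(variant, align_functions=None):
    """Return (cflags, rustflags) lists for the 'avx' or 'avx2' variant."""
    cflags = ["-O3"]                # Clang optimization level 3
    rustflags = ["-Copt-level=3"]   # rustc's equivalent codegen option
    if variant == "avx":
        cflags += ["-mavx", "-maes"]
        rustflags += ["-Ctarget-feature=+avx,+aes"]
    elif variant == "avx2":
        # -march=haswell enables everything up to AVX2; -mtune=generic keeps
        # instruction scheduling from being tuned for Haswell alone.
        cflags += ["-march=haswell", "-mtune=generic", "-maes"]
        rustflags += ["-Ctarget-cpu=haswell", "-Ctarget-feature=+aes"]
    else:
        raise ValueError(f"unknown variant: {variant!r}")
    if align_functions is not None:
        # e.g. 32, as suggested earlier in the thread (-falign-functions=32)
        cflags.append(f"-falign-functions={align_functions}")
    return cflags, rustflags


if __name__ == "__main__":
    for name in ("avx", "avx2"):
        c, r = build_flags(name, align_functions=32 if name == "avx2" else None)
        print(name, "cflags:", " ".join(c))
        print(name, "rustflags:", " ".join(r))
```

Note that on the Rust side -Ctarget-cpu=haswell affects both the instruction set and tuning, so the "generic tuning" idea only applies to the C/C++ flags in this sketch.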

GrilledPear commented 2 years ago

@Alex313031 Thank you for the very detailed explanation. This resolves my confusion about why the test cases include both Chromium and Google's builds.