Alcaro / Flips

Floating IPS is a patcher for IPS and BPS files.
Other
307 stars 45 forks source link

Requirements? #7

Closed rmenessec closed 6 years ago

rmenessec commented 6 years ago

Could you list the requirements for building Flips? I tested compilation on an Ubuntu Linux 16.04 LTS system with all the basic development requirements for GTK+ 2.0 and 3.0 installed, including the umbrella package libgtk-3-dev. It failed.

$ ./make.sh
rm: cannot remove 'flips.exe': No such file or directory
rm: cannot remove 'floating.zip': No such file or directory
rm: cannot remove 'flips': No such file or directory
rm: cannot remove 'obj/*': No such file or directory
GTK+ (1/3)
rm: cannot remove 'flips': No such file or directory
g++ flips-cli.cpp flips-w32.cpp libbps-suf.cpp libips.cpp crc32.cpp flips.cpp libbps.cpp flips-gtk.cpp libups.cpp -std=c++98 -Wall -Werror -O3 -fomit-frame-pointer -fmerge-all-constants -fvisibility=hidden -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -ffunction-sections -fdata-sections -Wl,--gc-sections -fprofile-dir=obj/ -fprofile-generate -lgcov -fno-rtti -fno-exceptions -DNDEBUG -fopenmp -DFLIPS_GTK    -Ilibdivsufsort-2.0.1/include -DHAVE_CONFIG_H -D__STDC_FORMAT_MACROS libdivsufsort-2.0.1/lib/divsufsort.c libdivsufsort-2.0.1/lib/sssort.c libdivsufsort-2.0.1/lib/trsort.c   -oflips
flips-gtk.cpp:20:21: fatal error: gtk/gtk.h: No such file or directory
compilation terminated.
Makefile:93: recipe for target 'flips' failed
make: *** [flips] Error 1

If the GTK+ 3.0 development packages are removed, the command line-only build succeeds, and the resulting binary seems to work correctly.

Alcaro commented 6 years ago

g++, libgtk-3-dev, build-essential and pkg-config should be all that's needed.

That specific output is quite bizarre; I can create this

$ ./make.sh
rm: cannot remove 'flips.exe': No such file or directory
rm: cannot remove 'floating.zip': No such file or directory
GTK+ (1/3)
rm: cannot remove 'flips': No such file or directory
Makefile:50: pkg-config can't find gtk+-3.0, or pkg-config itself can't be found
Makefile:51: if you have the needed files installed, specify their locations and names with `make GTKFLAGS='-I/usr/include' GTKLIBS='-L/usr/lib -lgtk''
Makefile:52: if not, the package name under Debian and derivates is `libgtk-3-dev'; for other distros, consult a search engine
Makefile:53: switching to CLI build
g++ flips-cli.cpp flips-w32.cpp libbps-suf.cpp libips.cpp crc32.cpp flips.cpp libbps.cpp flips-gtk.cpp libups.cpp -std=c++98 -Wall -Werror -O3 -fomit-frame-pointer -fmerge-all-constants -fvisibility=hidden -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -ffunction-sections -fdata-sections -Wl,--gc-sections -fprofile-dir=obj/ -fprofile-generate -lgcov -fno-rtti -fno-exceptions -DNDEBUG -fopenmp -DFLIPS_GTK   -Ilibdivsufsort-2.0.1/include -DHAVE_CONFIG_H -D__STDC_FORMAT_MACROS libdivsufsort-2.0.1/lib/divsufsort.c libdivsufsort-2.0.1/lib/sssort.c libdivsufsort-2.0.1/lib/trsort.c   -oflips
flips-gtk.cpp:20:10: fatal error: gtk/gtk.h: No such file or directory
 #include <gtk/gtk.h>
          ^~~~~~~~~~~
compilation terminated.
Makefile:93: recipe for target 'flips' failed
make: *** [flips] Error 1

(by typoing the pkg-config invocation in the Makefile), but missing GTK flags without the Makefile warnings doesn't make any sense.

Are you using non-GNU make? Does it act differently if you use 'make' rather than './make.sh'?

rmenessec commented 6 years ago

tl;dr: it turns out that libpng-dev, which is essential to the build process, was not installed.

$ pkg-config --cflags --libs gtk+-3.0
Package libpng16 was not found in the pkg-config search path.
Perhaps you should add the directory containing `libpng16.pc'
to the PKG_CONFIG_PATH environment variable
Package 'libpng16', required by 'gdk-pixbuf-2.0', not found
Package 'libpng16', required by 'gdk-pixbuf-2.0', not found

In Xenial, it's a virtual package provided by libpng12-dev. I'm using a PPA for the VLC media player, which, if you look at the package details for Xenial, breaks the existing package name scheme by making libpng-dev a real package, version 1.6.34-1, which conflicts with both libpng12-dev and libpng16-dev as provided by Xenial.

After I sorted out the package dependency problems and got libpng-dev installed, pkg-config provided useful output:

$ pkg-config --cflags --libs gtk+-3.0
-I/usr/include/gtk-3.0 -I/usr/include/pango-1.0 -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -I/usr/include/cairo -I/usr/include/pixman-1 -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/harfbuzz -I/usr/include/gdk-pixbuf-2.0 -I/usr/include/gio-unix-2.0/ -I/usr/include/mirclient -I/usr/include/mircore -I/usr/include/mircookie -I/usr/include/atk-1.0 -I/usr/include/at-spi2-atk/2.0 -I/usr/include/at-spi-2.0 -I/usr/include/dbus-1.0 -I/usr/lib/x86_64-linux-gnu/dbus-1.0/include  -pthread -lgtk-3 -lgdk-3 -lpangocairo-1.0 -lpango-1.0 -latk-1.0 -lcairo-gobject -lcairo -lgdk_pixbuf-2.0 -lgio-2.0 -lgobject-2.0 -lglib-2.0

You may want to add some error handling in the makefile for issues like empty or invalid pkg-config output.

Would you mind adding a standard clean target to the makefile, and possibly stripping the binary by default? It would also be great if you could either turn off profiling if the named ROMs don't exist locally or—ideally—provide some legally redistributable files and patches to perform the profiling with. ;)

Incorporating existing CFLAGS / CXXFLAGS / CPPFLAGS / LDFLAGs environment variables would be really helpful. I like to be able to pass -march=native to gcc/g++, plus some flags to ld(1).

rmenessec commented 6 years ago

Note: GCC link-time optimization appears to work against the code (git master) with no modifications, at time of writing.

Alcaro commented 6 years ago

tl;dr: it turns out that libpng-dev, which is essential to the build process, was not installed. [...] [PPA stuff]

Yeah, sounds more caused by that VLC PPA than anything on my end. Better report to them.

You may want to add some error handling in the makefile for issues like empty or invalid pkg-config output.

There is, and it works for me. Not sure why it didn't trigger for you, maybe pkg-config emitted a space to stdout or something? (And why did I make it send stderr to /dev/null? Error messages are useful.)

Would you mind adding a standard clean target to the makefile

There isn't much to clean, but sure, makes sense anyways. Even if a successful make clean does nothing, errors from there are super unhelpful.

and possibly stripping the binary by default

...wtf, how did I miss that?

It would also be great if you could either turn off profiling if the named ROMs don't exist locally or—ideally—provide some legally redistributable files and patches to perform the profiling with. ;)

Yes, but which files are similar enough to not confuse the patcher (IPS should be trained on a file where data doesn't move around), large and interesting enough to train it properly, and not ROM hacks? Nearly all data I can find multiple versions of loves to move around stuff internally.

But yes, disabling profiling if the ROMs are absent is also an improvement.

Incorporating existing CFLAGS / CXXFLAGS / CPPFLAGS / LDFLAGs environment variables would be really helpful. I like to be able to pass -march=native to gcc/g++, plus some flags to ld(1).

Makefile uses CFLAGS and LDFLAGS already, and adding CXXFLAGS would indeed make sense. Not sure if CPPFLAGS is useful for anything, but sure, no reason not to. make.sh sets its own CFLAGS/LFLAGS, but you can call the Makefile yourself if you prefer that.

GCC link-time optimization appears to work

Good idea. With my lack of .o files, that one has no drawbacks whatsoever.

(Except it triggers this one here... but that one was fixed long ago, and it's my own fault for sticking with Ubuntu 14.04.)

rmenessec commented 6 years ago

Yeah, sounds more caused by that VLC PPA than anything on my end. Better report to them.

Oh, it's 100% an issue caused by using the PPA. I don't see a reason to report it, though. Now that Flips git master source is compiling, I have what I needed. Besides, the problem is likely to be resolved when 17.04 LTS is released this month and I upgrade to it. 😉

Yes, but which files are similar enough to not confuse the patcher (IPS should be trained on a file where data doesn't move around), large and interesting enough to train it properly, and not ROM hacks? Nearly all data I can find multiple versions of loves to move around stuff internally.

Sorry, I'm not sure what to suggest. However, technically, you could use any type of source data, correct? I ended up using Flips because I was applying large DPS patches to the Windows build of AM2R; some sort of game data archive—whatever GameMaker Studio uses—plus the lone EXE file.

Unless I'm badly misunderstanding IPS / DPS, I'm fairly certain that the input data to be patched could be anything from plaintext to an OpenDocument spreadsheet to a video clip to a DVD image of Ubuntu. I don't know whether any of those are good examples for profiling use, though.

If I understand your requirements correctly, something that might work fairly well is a BitTorrent partial file—any kind of file, transferred via BitTorrent—that's paused shortly before completion (90-95%) to get a snapshot, then resumed to get the final file. Then you produce an IPS or DPS diff of the partial and complete files, which should be mostly similar. To go back to my last example, the netboot disc image for Ubuntu 16.04 isn't too huge. I don't know whether 62MB is small enough to host directly on GitHub, but it might be.

Makefile uses CFLAGS and LDFLAGS already, and adding CXXFLAGS would indeed make sense. Not sure if CPPFLAGS is useful for anything, but sure, no reason not to. make.sh sets its own CFLAGS/LFLAGS, but you can call the Makefile yourself if you prefer that.

It uses them, yes, but I mean that I want to set environment variables, outside the makefile. As in:

CXXFLAGS="-march=native -pipe -flto -flto=jobserver" LDFLAGS="Wl,-O3,-z,relro,--as-needed,--hash-style=gnu,--relax,--gc-sections" make -j4

If I have to edit the makefile to set environment variables, then I have to discard my changes before I can run 'git pull'. 😉 If you could add a few declarations near the top, along the lines of 'CXXFLAGS += $(CXXFLAGS)' and so forth, that would be lovely.

rmenessec commented 6 years ago

Oh, one last request: can you enable -fPIE / -fPIC as appropriate? With the CXXFLAGS and LDFLAGS changes I tested, the resulting executable nearly passes the Ubuntu / Red Hat hardening tests:

$ ./make.sh ; echo ; file flips ; echo ; hardening-check flips
GTK+
g++ flips-cli.cpp flips-w32.cpp libbps-suf.cpp libips.cpp crc32.cpp flips.cpp libbps.cpp flips-gtk.cpp libups.cpp -std=c++98 -Wall -Werror -O3 -fomit-frame-pointer -fmerge-all-constants -fvisibility=hidden -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -ffunction-sections -fdata-sections -Wl,--gc-sections -O3 -fomit-frame-pointer -march=native -pipe -flto -flto=jobserver -fuse-linker-plugin -fpie -fweb -fwhole-program -findirect-inlining -ftree-switch-conversion -mpclmul -maes -mssse3 -mcx16 -mfpmath=sse -fstack-protector-strong --param=ssp-buffer-size=4 -w -D_FORTIFY_SOURCE=2  -fno-rtti -fno-exceptions -DNDEBUG -fopenmp -DFLIPS_GTK -I/usr/include/gtk-3.0 -I/usr/include/pango-1.0 -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -I/usr/include/cairo -I/usr/include/pixman-1 -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/harfbuzz -I/usr/include/gdk-pixbuf-2.0 -I/usr/include/gio-unix-2.0/ -I/usr/include/mirclient -I/usr/include/mircore -I/usr/include/mircookie -I/usr/include/atk-1.0 -I/usr/include/at-spi2-atk/2.0 -I/usr/include/at-spi-2.0 -I/usr/include/dbus-1.0 -I/usr/lib/x86_64-linux-gnu/dbus-1.0/include  -pthread -lgtk-3 -lgdk-3 -lpangocairo-1.0 -lpango-1.0 -latk-1.0 -lcairo-gobject -lcairo -lgdk_pixbuf-2.0 -lgio-2.0 -lgobject-2.0 -lglib-2.0    -Ilibdivsufsort-2.0.1/include -DHAVE_CONFIG_H -D__STDC_FORMAT_MACROS libdivsufsort-2.0.1/lib/divsufsort.c libdivsufsort-2.0.1/lib/sssort.c libdivsufsort-2.0.1/lib/trsort.c   -oflips -Wl,-O3,-z,relro,-z,now,--as-needed,--hash-style=gnu,--relax,--gc-sections

flips: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=0801c85cc578b921097bc0098424544bd102b19d, stripped

flips:
 Position Independent Executable: no, normal executable!
 Stack protected: yes
 Fortify Source functions: yes (some protected functions found)
 Read-only relocations: yes
 Immediate binding: yes
Alcaro commented 6 years ago

17.04 LTS, DPS

I'll assume that's typos of 18.04 and BPS, respectively.

However, technically, you could use any type of source data, correct?

Yes, every file works, but (especially with IPS patches, or compressed formats like ZIP or PNG) some source/target combinations end up as big as the target itself. I'd rather not train the PGO on that.

Partial BitTorrent is a good start, but afaik, its buffering applies only in large (several KB) chunks, which wouldn't be representative of ROM hacks either.

But I have an idea that should work... I'll go experiment a bit.

It uses them, yes, but I mean that I want to set environment variables, outside the makefile.

$ CXXFLAGS=--version ./make.sh
rm: cannot remove ‘flips.exe’: No such file or directory
rm: cannot remove ‘floating.zip’: No such file or directory
rm: cannot remove ‘flips’: No such file or directory
rm: cannot remove ‘obj/*’: No such file or directory
GTK+ (1/3)
rm: cannot remove ‘flips’: No such file or directory
g++ crc32.cpp flips-cli.cpp flips.cpp flips-gtk.cpp flips-w32.cpp libbps.cpp libbps-suf.cpp libips.cpp libups.cpp -std=c++98  -Wall -Werror -O3 -s -flto -fomit-frame-pointer -fmerge-all-constants -fvisibility=hidden -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -ffunction-sections -fdata-sections -Wl,--gc-sections -fprofile-dir=obj/ -fprofile-generate --version -lgcov -fno-rtti -fno-exceptions -DNDEBUG -fopenmp -DFLIPS_GTK -pthread -I/usr/include/gtk-3.0 -I/usr/include/atk-1.0 -I/usr/include/at-spi2-atk/2.0 -I/usr/include/pango-1.0 -I/usr/include/gio-unix-2.0/ -I/usr/include/cairo -I/usr/include/gdk-pixbuf-2.0 -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -I/usr/include/harfbuzz -I/usr/include/freetype2 -I/usr/include/pixman-1 -I/usr/include/libpng12  -lgtk-3 -lgdk-3 -latk-1.0 -lgio-2.0 -lpangocairo-1.0 -lgdk_pixbuf-2.0 -lcairo-gobject -lpango-1.0 -lcairo -lgobject-2.0 -lglib-2.0    -Ilibdivsufsort-2.0.1/include -DHAVE_CONFIG_H -D__STDC_FORMAT_MACROS libdivsufsort-2.0.1/lib/divsufsort.c libdivsufsort-2.0.1/lib/sssort.c libdivsufsort-2.0.1/lib/trsort.c   -oflips
g++ (Ubuntu 4.8.5-2ubuntu1~14.04.1) 4.8.5
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Works for me. But yeah, make.sh shouldn't override CFLAGS and LFLAGS.

Wl,-O3,-z,relro,--as-needed,--hash-style=gnu,--relax,--gc-sections

-z,relro and --relax look like sensible optimizations, I'll add them to make.sh. From what I can gather, the others are either present already or applicable only to shared libraries.

CXXFLAGS += $(CXXFLAGS)

CXXFLAGS is already in CXXFLAGS. Are you sure that one would do anything useful?

Oh, one last request: can you enable -fPIE / -fPIC as appropriate?

From what I've heard, that yields a (small) slowdown. Users should be able to set it in CFLAGS/etc, but unless I'm wrong, I'd rather not enable it by default.

Alcaro commented 6 years ago

...okay, experiments done. The experiment was using some BMPs of the frames of some random GIF for profiling, then timing creation of a BPS of a 48MB SM64 romhack.

Profiled on ROMs: real 0m4.115s user 0m5.601s sys 0m0.036s real 0m4.118s user 0m5.585s sys 0m0.040s real 0m4.132s user 0m5.587s sys 0m0.044s real 0m4.194s user 0m5.666s sys 0m0.040s real 0m4.126s user 0m5.597s sys 0m0.028s

on BMPs: real 0m4.245s user 0m5.705s sys 0m0.056s real 0m4.139s user 0m5.651s sys 0m0.028s real 0m4.213s user 0m5.693s sys 0m0.056s real 0m4.159s user 0m5.660s sys 0m0.052s real 0m4.170s user 0m5.656s sys 0m0.032s

no profiling: real 0m4.422s user 0m5.892s sys 0m0.032s real 0m4.400s user 0m5.880s sys 0m0.032s real 0m4.417s user 0m5.865s sys 0m0.044s real 0m4.483s user 0m5.947s sys 0m0.028s real 0m4.434s user 0m5.887s sys 0m0.048s

Better than nothing, but not good enough.

rmenessec commented 6 years ago

17.04 LTS, DPS I'll assume that's typos of 18.04 and BPS, respectively.

Thinkos, more like. I was very under-slept yesterday, and I was trying to plan a Windows 10 migration, among other things, so "17.04" is a typo of "1703." 😉

CXXFLAGS += $(CXXFLAGS) CXXFLAGS is already in CXXFLAGS. Are you sure that one would do anything useful?

Works for me. But yeah, make.sh shouldn't override CFLAGS and LFLAGS.

I see the confusion. I used a bad example. The common approach that I'm used to is for the developer to import the shell variable and append or prepend. So, using your example of passing -version

CXXFLAGS += $(CXXFLAGS) -version

Now CXXFLAGS is set to whatever I passed—if anything—plus whatever flags you, the developer, intend g++ to always get when compiling Flips.

Wl,-O3,-z,relro,--as-needed,--hash-style=gnu,--relax,--gc-sections

This was simply an example I scraped from one of my handy bash aliases. Yes, '--as-needed' only applies to libraries; it's ignored when compiling a final executable. The full set for hardening and optimization for the flips executable should be something like:

-Wl,-O3,-z,relro,-z,now,--relax,--gc-sections,--pic-executable

I hadn't heard that position-independent executables are slower, but, if it's a concern, the simplest approach I've seen—assuming make is invoked directly—would be to have at least two alternate makefiles, with a symlink called Makefile pointing to whichever default you prefer, and the symlink name added to .gitignore.

Alternately, you could do something with environment variables and perhaps an (optional) include file. I'm not fully conversant with make, so I don't recall what approach or approaches might be used.

Better than nothing, but not good enough.

Enough better than nothing that I'd like to persuade you to consider providing the alternate approach for those of us who don't have your preferred profiling targets. I don't even know what source UPS patch is referenced on the first line of profile.sh, so I don't think I could find it even if I went looking...

By the way, I'd be interested in seeing the difference between no LTO, LTO on 5.3.1 (16.04), and LTO on 7.3.0 (18.04). LTO wasn't fully mature as of 5.x, I think. For my un-profiled build, my full CFLAGS / CXXFLAGS were set to:

-O3 -march=native -pipe -flto -flto=jobserver -fuse-linker-plugin -fweb -fstack-protector-strong --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2
Alcaro commented 6 years ago

Now CXXFLAGS is set to whatever I passed—if anything—plus whatever flags you, the developer, intend g++ to always get when compiling Flips.

make ignores assignments to variables given with make CXXFLAGS=--foo, unless Makefile contains override CXXFLAGS += --bar.

I'd rather not stick override everywhere, so I put mandatory stuff (like -DFLIPS_GTK) in other variables, and pass both that and CXXFLAGS to g++.

and the symlink name added to .gitignore

If the file is in the repo already, .gitignore doesn't matter, changes to the file are committed anyways. If the file is not in the repo, it'd have to be created before the project is compilable, and I'd rather not add extra steps to the compilation process.

CXXFLAGS=-fpie ./make.sh (or CXXFLAGS=-fpie make or make CXXFLAGS=-fpie, if you don't like my optimization flags) should be enough for all plausible purposes.

I hadn't heard that position-independent executables are slower

https://ghc.haskell.org/trac/ghc/wiki/DynamicByDefault#Performanceofthedynamicway

Not much, but for this specific program, speed is uncompromisable.

Though I would prefer if I could find numbers for C or C++, rather than Haskell. Everything I can find about C/C++ is about x86_32, lacks numbers, is super old, or is otherwise not applicable.

I don't even know what source UPS patch is referenced on the first line of profile.sh, so I don't think I could find it even if I went looking...

Agreed, I added download links to profile/profile.sh earlier today. Could swear I did long ago, but apparently not.

Actually, I'll just go download the patches and zip them up. I'll also merge profile.sh and profile1.sh, that split is kinda pointless. I guess I didn't know how to make a function in sh back then.

The ideal solution is, of course, finding a not-ROMs-but-just-as-good profiling corpus, but if BMPs of a GIF don't look like ROMs, what data does?

Alcaro commented 6 years ago

should be enough for all plausible purposes

...okay, 'plausible' is subjective, and I may be underestimating what people consider plausible.

If your usecase isn't satisfied with CFLAGS=--foo ./make.sh, I'll make it include Makefile.local if it exists. If env vars are enough, I believe that's easier for everyone.

rmenessec commented 6 years ago

I'd rather not stick override everywhere, so I put mandatory stuff (like -DFLIPS_GTK) in other variables, and pass both that and CXXFLAGS to g++.

Oh. I may have misstated my case; most times I've been building something (else) from source, either I ended up completely rewriting a CFLAGS decl, or I was setting CFLAGS / CXXFLAGS / LDFLAGS before invoking a CMake or GNU Autoconf build, which seems to produce well-merged compiler and linker flags magically.

Obviously, that's not a good way to learn the eccentricities of make, or at least not GNU Make.

It might be worth considering a switch to to something like CMake. (Probably not Autoconf; I've heard, many times, although third-hand, that Autoconf is... not good.)

Obviously, I'm not the one who has to maintain the build process; just a thought. I mention it because my experience with shell code is that it only becomes more unwieldy the longer I have to manage it, and I usually end up asking myself why I'm not using something like Python in the first place. 😒

If this seems like a reasonable approach, I would definitely lean toward CMake in particular because it doesn't seem to have any dependencies that aren't already required by Flips. It accommodates custom build rules, like the profiling. It's cross-platform, too.

The ideal solution is, of course, finding a not-ROMs-but-just-as-good profiling corpus, but if BMPs of a GIF don't look like ROMs, what data does?

What I was going to suggest originally was selecting a few public domain ROMs, generating some junk patches for them, and including the patches and possibly even the ROMs (if not too large) in the repo. I don't think the resulting patched ROMs would need to be valid in any way. It seemed like more work than you might prefer, but it's the closest approximation of your present profiling targets that I can think of.

Otherwise, a clear identification of the profiling targets—both ROMs and patches—would be appreciated. Perhaps direct links to sites that both host the patches and identify the ROMs would be best?

The optimal solution would be something that doesn't have any proprietary dependencies—like the existing profiling targets—that could perhaps see Flips included in major "reference" distributions of Linux in the future. If Debian or Ubuntu include so much as a simple IPS patcher, even in the "multiverse" repositories, I can't name it. 😒

Alcaro commented 6 years ago

autoconf / CMake

CMake may not require anything other than itself, but it does require itself, which counts as dependency and complicating the build process.

So from that viewpoint, autoconf wins. But I'm not convinced either of them has anything to offer that my current scripts don't.

Yes, I'm super conservative sometimes.

it only becomes more unwieldy the longer I have to manage it, and I usually end up asking myself why I'm not using something like Python in the first place

Yes, big shell scripts suck, I don't envy whoever maintains autoconf's giant ./configure scripts. But thus far, mine are small enough.

What I was going to suggest originally was selecting a few public domain ROMs, generating some junk patches for them, and including the patches and possibly even the ROMs (if not too large) in the repo.

Public domain ROMs are ROMs, but I don't think they're big enough to get the profiler up to speed.

But I can relax one of my constraints: The IPS creator runs in O(n) with n <= 16MB, it's fast enough even without profiling.

Therefore, data that moves around works. For example, these blobs seem sufficiently multi-versioned, sufficiently similar and sufficiently dissimilar, and sufficiently large.

https://ftp.mozilla.org/pub/firefox/releases/45.0esr/linux-x86_64/en-US/firefox-45.0esr.tar.bz2 https://ftp.mozilla.org/pub/firefox/releases/52.0esr/linux-x86_64/en-US/firefox-52.0esr.tar.bz2

Results:

Profiled on ROMs: real 0m5.091s user 0m6.655s sys 0m0.149s real 0m5.017s user 0m6.566s sys 0m0.177s real 0m5.042s user 0m6.541s sys 0m0.157s real 0m5.058s user 0m6.598s sys 0m0.181s real 0m5.117s user 0m6.676s sys 0m0.160s real 0m4.983s user 0m6.561s sys 0m0.161s sum   30.308s

Profiled on Firefox: real 0m4.708s user 0m6.252s sys 0m0.150s real 0m4.800s user 0m6.365s sys 0m0.192s real 0m4.816s user 0m6.370s sys 0m0.200s real 0m4.858s user 0m6.420s sys 0m0.168s real 0m4.859s user 0m6.406s sys 0m0.196s real 0m4.867s user 0m6.436s sys 0m0.160s sum   28.908s

Conclusion: This is indeed a suitable replacement for the ROMs. Pushed, including a script to download them. I assume you have no objections to their license.

Otherwise, a clear identification of the profiling targets—both ROMs and patches—would be appreciated.

I did that last friday. But yeah, not needed anymore.

The optimal solution would be something that doesn't have any proprietary dependencies

It's not like profiling is required, it just speeds things up a bit.

that could perhaps see Flips included in major "reference" distributions of Linux in the future

Someone submitted Flips to Arch's AUR, does that count? (Actually, there's two of them. Not sure if that's a good or bad thing.)

If Debian or Ubuntu include so much as a simple IPS patcher, even in the "multiverse" repositories, I can't name it.

I can't name any Linux-compatible IPS patcher at all, other than Flips (and a few emulators' softpatching features). Lunar IPS and MultiPatch exist, but they're for Windows and OSX, respectively.

Before I made Flips, most Linuxers used LIPS in Wine.

rmenessec commented 6 years ago

Yes, I'm super conservative sometimes.

"I'm the developer" is usually good enough reason to reject a suggestion. You are and you have, and I'm fine with that. 😉

Conclusion: This is indeed a suitable replacement for the ROMs. Pushed, including a script to download them. I assume you have no objections to their license.

Wow. I am very glad you kept looking for profiling targets after I failed to get creative. No, of course no objections. Anything freely redistributable without limitation or fees would work. Clearly, these files (and most likely anything else on the Mozilla mirrors) fit that description.

Now I'm really excited to see if those numbers get better with very recent gcc packages, not to mention the other flags I intend to play with. Am I correct in thinking that you're using Ubuntu 16.04 for the benchmarking?

Someone submitted Flips to Arch's AUR, does that count?

Yes, of course it counts. I wasn't thinking of it, which is due to my lack of experience with Arch, which in turn is largely due to lack of time, and partly due to a sad lack of ambition on my part. I would like to see Flips in Debian, though, considering its reach and the sheer number of other, popular distros based on it. I've even seen a few embedded devices using Debian-stable; at least some of which allow both shell access and installation of official Debian packages...

(Actually, there's two of them. Not sure if that's a good or bad thing.)

I see a "flips" package and a "flips-git" package. The non-Git package is using version 1.31. Is that the current release version of Flips? That reminds me: do you plan to start doing release tags here on GitHub? It would be nice to see Flips fully at home here.

Alcaro commented 6 years ago

Am I correct in thinking that you're using Ubuntu 16.04 for the benchmarking?

Mint 17, aka Ubuntu 14.04, with GCC 4.8.5.

I should upgrade, but I'm lazy.

I would like to see Flips in Debian

Same here, but I'm not sure if pushing for that is worth the effort.

Maybe it is.

Not sure if that's a good or bad thing.

translation: If duplicates exist, the standards for inclusion must be low, so said inclusion may not necessarily be a valid measure of usefulness, user count, or anything else.

Is that the current release version of Flips?

Yes, 1.31 is the latest release.

That reminds me: do you plan to start doing release tags here on GitHub? It would be nice to see Flips fully at home here.

I'll try it at least once, if I ever deem any future version stable, and can remember (and find) it at that point.

No promises I'll tag future releases if the process annoys me.

rmenessec commented 6 years ago

One more request: would you add full hardening to make.sh as a non-default option?

My experimentation with adding -fpic / -fPIC (or -fpie / -fPIE) to the gcc flags and -pie to the ld flags suggests that I'm misremembering how to get a position-independent library and a final PIE correctly. To pass the rest of the checks, I'm merely adding -z,now to the ld flags and -fstack-protector-strong plus -D_FORTIFY_SOURCE=2 to the gcc flags. And, based on the gcc manpages, you may want to use -fuse-linker-plugin with LTO builds.

$ hardening-check /usr/local/bin/flips
/usr/local/bin/flips:
 Position Independent Executable: no, normal executable!
 Stack protected: yes
 Fortify Source functions: yes (some protected functions found)
 Read-only relocations: yes
 Immediate binding: yes

Oh, and—with the newer make.sh, the profile build works fine on one machine (headless, no GTK) and fails on another (GTK).

GTK+ (2/3)
Couldn't write patch. Are you on a read-only medium?
Couldn't write patch. Are you on a read-only medium?

I don't have time today to investigate further, although I should note that the machine it's failing on has only a 2GB /tmp because it's backed by tmpfs.

rmenessec commented 6 years ago

Thanks for all of your work and your help, by the way. 😉

Alcaro commented 6 years ago

One more request: would you add full hardening to make.sh as a non-default option?

With my usual optimization flags (including LTO and PGO), but no hardening:

real 0m4.089s user 0m5.501s sys 0m0.052s real 0m3.958s user 0m5.424s sys 0m0.068s real 0m3.984s user 0m5.471s sys 0m0.048s real 0m3.950s user 0m5.464s sys 0m0.024s real 0m3.933s user 0m5.435s sys 0m0.040s real 0m3.950s user 0m5.423s sys 0m0.040s sum   23.864              32.718

With optimization and -fstack-protector-all -Wstack-protector --param ssp-buffer-size=4 -pie -fPIE -ftrapv -D_FORTIFY_SOURCE=2 (this list minus the ones I have already, the warnings that spam 99999 lines of false positives, and -mmitigate-rop which requires GCC 6):

real 0m7.459s user 0m9.113s sys 0m0.060s real 0m7.467s user 0m9.137s sys 0m0.036s real 0m7.417s user 0m9.091s sys 0m0.036s real 0m7.483s user 0m9.144s sys 0m0.044s real 0m7.509s user 0m9.135s sys 0m0.032s real 0m7.517s user 0m9.139s sys 0m0.060s sum   44.852              54.759

With the above, minus -ftrapv:

real 0m3.992s user 0m5.442s sys 0m0.048s real 0m3.967s user 0m5.473s sys 0m0.024s real 0m3.925s user 0m5.445s sys 0m0.024s real 0m3.958s user 0m5.449s sys 0m0.044s real 0m4.000s user 0m5.442s sys 0m0.048s real 0m3.975s user 0m5.462s sys 0m0.048s sum   23.817              32.713

three guesses whether I'm keeping ftrapv, first two don't count.

Strange how the other flags don't seem to affect performance at all. It probably varies per platform, so I'd rather keep them off by default (and hardening only helps on shitty code, anyways), but it seems safe to make it an option.

And, based on the gcc manpages, you may want to use -fuse-linker-plugin with LTO builds.

md5sum on the binaries is identical. But yes, good idea anyways.

and fails on another (GTK)

It's failing to write the patch. But it doesn't need to write the patch; it only needs to get profiled, which it does.

But yes, showing that to the user is suboptimal.

Not gonna fix it properly, though. sed 's/.read-only./The patch was created successfully!/'

(Technical details: GTK's file writing functions create /.goutputstream-ghsdfjklghs, rather than the target filename; it's renamed to the real name once writing is done. The intention is that if the write aborts (due to power outage or something), it won't leave a half-written file around. But it doesn't really work if it can't create /dev/.goutputstream-kjsdghlfasdlhl.)

a 2GB /tmp

Doesn't matter, make.sh doesn't use /tmp at all (unless you cloned Flips to /tmp and ran out of space, but that'd give other errors). The output patch is sent to /dev/null (or errorland under the GTK front); the Firefox tarballs are downloaded to profile/firefox-42.0esr.tar, so they can be kept around for the next build.

Perhaps it would be a good idea to make it offer to download the tarballs to /tmp instead, but I can't imagine any usecase where you'd want that. Either you want to build Flips many times, in which case you want to keep the tarballs; or you only want to build Flips once, in which case you'll most likely delete the cloned repo once you're done, making the tarballs disappear as well.

rmenessec commented 6 years ago

Strange how the other flags don't seem to affect performance at all. It probably varies per platform, so I'd rather keep them off by default (and hardening only helps on shitty code, anyways), but it seems safe to make it an option.

Yes, it probably varies by platform. It may well vary based on CPU flags available (esp. if compiling with -march=native), and I would be startled if it didn't vary based on both compiler and compiler version. (Plus [g]libc, and/or libstdc++, I'm thinking.)

I'm only depending on the hardening flags to make it harder (pun intended) to make exploits useful on my systems—and, when I say "depending," I mean "hoping." Obviously, they won't do spit against some kinds of exploits. ... Although I once heard a trainer for an RHCE course claim, with a straight face, that everyone would have been protected against Heartbleed if only they'd had SELinux enabled. I was tempted to ask for details, but didn't.

Like I said, a non-default option is all I'm looking for. Thanks!

It's failing to write the patch. But it doesn't need to write the patch; it only needs to get profiled, which it does.

I thought you might say that. Good to know.

unless you cloned Flips to /tmp and ran out of space, but that'd give other errors

No, I'm unimaginatively using the old UNIX standby, /usr/src.

Perhaps it would be a good idea to make it offer to download the tarballs to /tmp instead, but I can't imagine any usecase where you'd want that. Either you want to build Flips many times, in which case you want to keep the tarballs; or you only want to build Flips once, in which case you'll most likely delete the cloned repo once you're done, making the tarballs disappear as well.

Speaking only for myself, I don't need the files written to /tmp, and I don't know of a reason to do it.

One last(?) buglet, though: I removed the Firefox ESR downloads for some reason (I think a connectivity problem that truncated one of the files), which caused the profiling step to fail silently. You may want to account for the case where the user agreed to the downloads, but they no longer exist for whatever reason. ...Or if they're not valid tarballs. (Maybe just use md5sum / sha1sum?)

Alcaro commented 6 years ago

Yes, it probably varies by platform.

I've heard PIC is way worse on x86_32 than x86_64. Haven't tried it, don't have a 32bit compiler installed.

Nor do I know about other platforms.

I'm only depending on the hardening flags to make it harder (pun intended)

that's exactly where the name comes from. Is it a pun if it's by design?

everyone would have been protected against Heartbleed if only they'd had SELinux enabled

Last time I checked, Heartbleed some kind of buffer overflow. SELinux, as a kernel module, has no way to detect userspace buffer overflows. libc would've had a chance, except OpenSSL did something silly with recycling allocations.

I too would like to hear said details.

One last(?) buglet, though: I removed the Firefox ESR downloads for some reason (I think a connectivity problem that truncated one of the files), which caused the profiling step to fail silently. You may want to account for the case where the user agreed to the downloads, but they no longer exist for whatever reason. ...Or if they're not valid tarballs. (Maybe just use md5sum / sha1sum?)

Oh right, good point.

Like this? md5 seems overkill as well, filesize is enough.

rmenessec commented 6 years ago

I've heard PIC is way worse on x86_32 than x86_64. Haven't tried it, don't have a 32bit compiler installed.

Do you mean i386, or are you referring to x32?

x32 seems to be abandoned, although I can't tell for certain. It's definitely very niche-y.

i386 is becoming rather niche-y itself. The only use I have for it, these days, is keeping i386 libs around so I can continue to play... well, almost all the PC games made, ever. There are still a surprising number of new releases—mostly indie—that are still only compiled against DirectX 9, not to mention i386. (I blame this in large part on software like Game Maker...)

Besides backwards compatibility, though, quite a lot of new games—virtually all "AAA" titles—are being released as 64-bit-only. 64-bit code has considerably to hugely better performance over 32-bit—regardless of platform; whether x86_64, ARMv8 / Aarch64, MIPS64, or other—for a lot of very popular applications that everyone uses, whether they know it or not.

Windows, macOS, and most popular Linux DEs now use one or several database engines under the hood for anything from contacts management to core OS functions. Windows has used an ugly fork of the Extensible Storage Engine—formerly "Exchange Storage Engine''—to provide some core CryptoAPI (or I may mean DPAPI) functionality since at least Windows XP. There's "desktop search" indexing, of course; pick an OS. KDE's Akonadi runs a small MySQL instance as a backend on every Linux distribution where I've looked for it, and apparently it requires a full-fat RDBMS in general.

Besides obscure applications like those, "multimedia" has become so ubiquitous that no one uses the word any more, and virtually everything in that category is best done on 64-bit, possibly with the help of various post-32-bit SIMD CPU extensions, whenever it can't be offloaded to a GPU. There are very few (if any) cryptography algorithms that don't go faster on 64-bit platforms, and virtually all the electronics I have that are more complex than a battery charger or a TV remote are constantly executing crypto algorithms. Speaking of security, ASLR is completely ineffective on 32-bit systems, and also on 32-bit apps running on 64-bit OSes. 😓

In short, backward compatibility is about the only reason to do anything with i386 in particular. 😉

I've done some work in embedded systems, and ARM SoCs are a better option than Intel SoCs if you need < 64-bit because you're trying to save power. ... Unless you have obscure or specialty needs, in which case you might look at Altera, Atmel, Synopsys / ARC and so forth.

Alcaro commented 6 years ago

Do you mean i386, or are you referring to x32?

Oh right, i386, yeah, that's the one. I somehow managed to forget its name.

PIC is expensive on i386 because global variables aren't at a known location, so you need shenanigans to access them. On x86_64, there's a PC-relative addressing mode that gets rid of the ugly parts (though it still turns 'mov eax, DWORD PTR var[rip]' into 'mov rax, QWORD PTR var@GOTPCREL[rip]; mov eax, DWORD PTR [rax]' - can't think of any valid reason except linker limitation, but that's probably just me being unimaginative).

x32 is just x64 with 32bit pointers; in particular, it still supports PC-relative, so PIC should still be cheap (but, due to that extra mov, nonzero).

i386 Windows does it differently because of course it does. It rewrites the actual code being executed to contain the right absolute addresses, using some shenanigans in the page fault routine, so truly position-independent code is just a waste of time.

Judging by the comments on that article, x64 Windows doesn't relocate the code, but it does relocate the data section if it contains stuff like 'int foo; int* bar = &foo;'. x64 Linux probably does the same thing, didn't check.

possibly with the help of various post-32-bit SIMD CPU extensions

hey, no hating the 32bit-compatible ones, I managed to speed up one of my programs by a factor 2 (2800->5800 fps) by tossing in some SSE2.

Trying to extend it to AVX2 didn't accomplish anything whatsoever.

(Though admittedly that's a pretty simple function. Nothing cryptographic, just a bitmap blitter.)

In short, backward compatibility is about the only reason to do anything with i386 in particular.

Reducing RAM use by reducing pointer size is also a valid usecase, I've heard of a few programs sticking with 32bit for that reason. Reducing pointer size also means more objects per cache line, which helps in a few rare cases. (I'm not sure which, nor whether they still prefer 32bit.)

But the reduced register count hurts performance a lot, so unless you're on Windows or something, x32 is a much better idea.

rmenessec commented 6 years ago

hey, no hating the 32bit-compatible ones, I managed to speed up one of my programs by a factor 2 (2800🡒5800 fps) by tossing in some SSE2.

Sorry, that was unclear. I'm not hating on SSE2, trust me. I can see the raid6 kmod selecting a mix of SSE2 and SSSE3 algorithms for xor() and gen() as my older laptop boots, because apparently the init-time benchmarks show those are the fastest possible for the box. The CPU's a Westmere part that doesn't have AVX or AVX2. 😉

I only intended to point out that x86 has continued adding useful SIMD instructions, and a lot of them imply the availability of x86-64 / "Intel 64" instructions on the same part. Conversely, a lot of useful extension sets (AVX / AVX2) don't exist at all on 32-bit-only parts that I know of. Perhaps some very old Atom parts that I wouldn't be caught dead using? 😉

I recall reading something recently about MMX having a surprising new(?) use for accelerating... something. I wish I could recall what it was.

Reducing RAM use by reducing pointer size is also a valid usecase

Okay, I can see that.

Alcaro commented 6 years ago

I only intended to point out that x86 has continued adding useful SIMD instructions

Yes, the newer ones exist, they should exist, they're useful for some programs, and I never intended to state otherwise. SSE2 is good enough for me (especially since, unlike SSE3, there are no x86_64 machines without SSE2, no need to worry about runtime dispatchers or binary portability), but for big projects like ffmpeg or Linux, more is better.

and a lot of them imply the availability of x86-64 / "Intel 64" instructions on the same part

Yes, you need a 64bit-compatible processor to use the newer ones. There are a few 32bit Atoms with SSSE3, but everything with SSE4 is 64bit.

But for some strange reason, you can use the new instructions even in 32bit mode.

I recall reading something recently about MMX having a surprising new(?) use for accelerating... something. I wish I could recall what it was.

I googled it, but couldn't find anything sensible. And from what I can see, MMX is just some simple integer operations, all of which exist as 128bit variants in SSE2.

I suspect it's something silly, like exploiting Meltdown. Or maybe you accidentally read a 20 year old article.

Alcaro commented 6 years ago

The issues were resolved long ago, and the random chatter ended a while ago too. No need for this to remain open.