espruino / Espruino

The Espruino JavaScript interpreter - Official Repo
http://www.espruino.com/
Other
2.76k stars 741 forks source link

tensorflow - fulll source file paths included in release binary #2267

Closed fanoush closed 1 year ago

fanoush commented 2 years ago

When you check binaries for banglejs or banglejs2 for strings and grep for "home" it can be seen full path of some tensorflow sources is encoded many times there. Removing just the file paths can save 1KB and also makes the binary build more reproducible from same source.

$ strings espruino_2v14.92_banglejs2_app.bin | grep home
/home/runner/work/Espruino/Espruino/libs/tensorflow/tensorflow/lite/core/api/flatbuffer_conversions.cc
/home/runner/work/Espruino/Espruino/libs/tensorflow/tensorflow/lite/micro/micro_allocator.cc
/home/runner/work/Espruino/Espruino/libs/tensorflow/tensorflow/lite/kernels/kernel_util.cc
/home/runner/work/Espruino/Espruino/libs/tensorflow/tensorflow/lite/micro/kernels/dequantize.cc
/home/runner/work/Espruino/Espruino/libs/tensorflow/tensorflow/lite/micro/kernels/softmax.cc
/home/runner/work/Espruino/Espruino/libs/tensorflow/tensorflow/lite/micro/kernels/depthwise_conv.cc
/home/runner/work/Espruino/Espruino/libs/tensorflow/tensorflow/lite/micro/kernels/fully_connected.cc
/home/runner/work/Espruino/Espruino/libs/tensorflow/tensorflow/lite/micro/kernels/fully_connected.cc Hybrid models are not supported on TFLite Micro.
/home/runner/work/Espruino/Espruino/libs/tensorflow/tensorflow/lite/micro/kernels/conv.cc
/home/runner/work/Espruino/Espruino/libs/tensorflow/tensorflow/lite/micro/kernels/reshape.cc
/home/runner/work/Espruino/Espruino/libs/tensorflow/tensorflow/lite/micro/kernels/quantize.cc
$ strings espruino_2v14.92_banglejs2_app.bin | grep home | wc -c
1104

It is caused by using FILE in calls to TF_LITE_KERNEL_LOG here https://github.com/espruino/Espruino/blob/master/libs/tensorflow/tensorflow/lite/kernels/op_macros.h#L51 and few times here https://github.com/espruino/Espruino/blob/master/libs/tensorflow/tensorflow/lite/c/common.h#L191

This could be ifdef-ed based on NDEBUG or RELEASE value to have the %s:%d .. __FILE__, __LINE__, part removed from TF_LITE_KERNEL_LOG calls while still having some kind of error message for RELEASE build.

Or the logging can be removed completely by adding -DTF_LITE_STRIP_ERROR_STRINGS in make/misc/tensorflow.make as per https://github.com/espruino/Espruino/blob/master/libs/tensorflow/tensorflow/lite/c/common.h#L158

this saves 10KB


BANGLEJS
CODE: 126976 -> 468132 (340948 bytes)
CODE: 126976 -> 457444 (330260 bytes) with -DTF_LITE_STRIP_ERROR_STRINGS

BANGLEJS2
CODE: 155648 -> 629664 (472444 bytes)
CODE: 155648 -> 619024 (461804 bytes) with -DTF_LITE_STRIP_ERROR_STRINGS

I hit this while trying to make elf file that matches existing release build. Unfortunately even when putting same git commit of Espruino sources to same /home/runner/work/Espruino/Espruino path and provisioning with same compiler as seeing in travis build log (e.g. here https://github.com/espruino/Espruino/actions/runs/2739143892/jobs/4292803960) I got same size binary but some bytes are different here and there. Not sure why, when I do make clean and do same make again I get binary which is exactly the same.

gfwilliams commented 2 years ago

Thanks! That's interesting...

I think we need to make sure that actual error messages are still there as Espruino does report them, but if TF_LITE_STRIP_ERROR_STRINGS strips just the log messages then that'd be great.

At one point, I did actually try to make sure that the error messages didn't include the full path (at least for the rest of Espruino)... IIRC the FILE value comes straight from the command-line, which in turn comes from $(ROOT) and ROOT = $(CWD) in the Makefile

If ROOT could be set to a relative path then I think all the log strings would be relative too?

fanoush commented 2 years ago

If ROOT could be set to a relative path then I think all the log strings would be relative too?

Yes, that's what I found too, __FILE__ comes from the way gcc is called so making it relative could help. Someone even suggested to pipe it via standard input to get rid of it completely https://stackoverflow.com/questions/15688434/remove-embedded-source-filenames-from-shared-libraries/15688992#15688992 which is a bit too radical :-)

For C++ (which is tensorflow case) someone also found compile time static expression code to remove path and keep just filename - check https://stackoverflow.com/a/47020053 or see the last comment on the bottom. Or one could possibly call gcc and define extra define like gcc $file -DFILENAME=$(basename $file) and use that instead of __FILE__ or even try to redefine __FILE__ itself in the source to that value if it exists.

I think we need to make sure that actual error messages are still there as Espruino does report them, but if TF_LITE_STRIP_ERROR_STRINGS strips just the log messages then that'd be great.

Yes, as I understand it it still returns some error code value but removes the message.

However the message can be sometimes helpful? I am not tensorflow expert but maybe when you make some random model on PC that uses some unsupported features it could be helpful to see the error message string what exactly is wrong in the model?

However the filename is not useful in release build IMO so I'd simply delete the %s:%d .. __FILE__, __LINE__, part from the messages. That would generate smaller logging code when compared to making __FILE__ shorter.

EDIT: Also for GCC there is the __BASE_FILE__ https://stackoverflow.com/a/1706368 which does not contain the path(?)

fanoush commented 2 years ago

Tested __BASE_FILE__ and it still has whole path, also tested this compile time stuff and the whole string is still kept inside binary even if not printed so both are not a solution.

Building code below with g++ works and prints just main.c:10 even when building like g++ -Os ``pwd``/main.c however it still keeps whole string and __FILENAME__ just adds some constant offset to original string so it is printed from the middle.

#include <stdio.h>

constexpr const char* const BASENAME(const char* const str, const char* const lastslash = nullptr) {
    return *str ? BASENAME(str + 1, ((*str == '/' || *str == '\\') ? str + 1 : (nullptr==lastslash?str:lastslash))) : (nullptr==lastslash?str:lastslash);
}
constexpr const char*__FILENAME__=BASENAME(__FILE__);

int main(int argc,char *argv[]){
printf("%s:%d %s",__FILENAME__,__LINE__,"Hello world!\n");
}

The compile time stuff probably still works as designed because even when build without any -O the BASENAME method is not there and there is still just this precomputed constant offset.

fanoush commented 2 years ago

there is actually another gcc solution targeted for reproducible builds described here https://reproducible-builds.org/docs/build-path/

so something like

gcc `pwd`/main.c  -ffile-prefix-map=`pwd`=whatever

will change automatically one path prefix to another so __FILE__ becomes whatever/main.c

So in this tensorflow case -ffile-prefix-map=$(ROOT)= could strip the root prefix

gfwilliams commented 2 years ago

-ffile-prefix-map actually sounds perfect - so -ffile-prefix-map=$(ROOT)=Espruino added to build flags would probably sort things out really nicely?

fanoush commented 2 years ago

Well, yes, depends how much you want to save (e.g. for Bangle 1).

BTW empty string works fine too so "=Espruino" will cut /home/runner/work/Espruino from all those and something like Espruino/libs/tensorflow/tensorflow/lite/core/api/flatbuffer_conversions.cc will still stay. By -ffile-prefix-map=$(ROOT)= you will strip the second Espruino too (and still keep the libs/tensorflow/tensorflow/lite/core/api there).

So yes the reproducibility part is solved by this minimal solution. You still end with multiple longs strings inside Bangle 1 binary and some extra logging code bloat passing source file names and line numbers (which you maybe don't do for your own Espruino code for release builds?). Dont know how much room the Bangle1 build has nowadays.

gfwilliams commented 2 years ago

Yes, good point... My only concern about changes to the tensorflow logging are any changes to core Tensorflow are going to make updating the firmware tricker... But we do have a folder of patches so I guess it can just go in there.

So would you be willing to do a PR for this? -ffile-prefix-map and as you say removing file and line from https://github.com/espruino/Espruino/blob/master/libs/tensorflow/tensorflow/lite/c/common.h ?

I realise I didn't reply above, but yes - tensorflow error messages are extremely useful. It's pretty often you might have some tensorflow model that doesn't load for some reason and you need to know why (eg what operation it was using that isn't there)

fanoush commented 1 year ago

WIll do a PR, so the changes for removing file name and line should be only as patch file or both applied in source and as a patch?

gfwilliams commented 1 year ago

source and patch I think - thanks!

fanoush commented 1 year ago

resolved by https://github.com/espruino/Espruino/pull/2270