Closed: feranick closed this issue 6 months ago
@namburger I tested it with a local fork of TF 2.17.0, with neon_fully_connected_arm32.cc
reverted to be in line with the other architectures, and compilation still fails with the same error:
external/org_tensorflow/tensorflow/lite/kernels/internal/optimized/4bit/neon_fully_connected_arm32.cc: In function 'void tflite::optimized_4bit::NeonRunKernelNoSDot(const uint8_t*, const int8_t*, int32_t*, int, int, int, int, int, int) [with int RowsLeft = 4; int RowsRight = 1; int Cols = 32]':
external/org_tensorflow/tensorflow/lite/kernels/internal/optimized/4bit/neon_fully_connected_arm32.cc:192:7: error: 'asm' operand has impossible constraints
192 | asm volatile(KERNEL_4x1
| ^~~
INFO: Elapsed time: 34.950s, Critical Path: 17.60s
(Note that this file, and the whole 4bit
folder, was introduced with TF 2.14.)
So maybe the commit was intended to address this problem, but didn't...
@Namburger Disclaimer: I know close to nothing about asm. Still, I tried to address this error, which seems to stem from a limitation of the architecture. Following some comments I found, I replaced "r" with "g", so that in external/org_tensorflow/tensorflow/lite/kernels/internal/optimized/4bit/neon_fully_connected_arm32.cc
around line 192, what is currently:
:
: [lhs_val] "r"(lhs_val), [rhs_val] "r"(rhs_val),
[element_ptr] "r"(element_ptr), [bit_shift] "r"(bit_shift),
[run_depth] "r"(run_depth)
is changed to:
:
: [lhs_val] "g"(lhs_val), [rhs_val] "g"(rhs_val),
[element_ptr] "g"(element_ptr), [bit_shift] "g"(bit_shift),
[run_depth] "g"(run_depth)
Compilation proceeds beyond this point, but it then stops with a similar error in another external library:
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
In file included from external/ruy/ruy/pack_arm.cc:16:
external/ruy/ruy/pack_arm.h:492:9: warning: multi-line comment [-Wcomment]
492 | #endif // (RUY_PLATFORM_NEON_64 || RUY_PLATFORM_NEON_32) && \
| ^
external/ruy/ruy/pack_arm.cc: In function 'void ruy::Pack8bitColMajorForNeon4Cols(const ruy::PackParams8bit&)':
external/ruy/ruy/pack_arm.cc:264:3: error: 'asm' operand has impossible constraints
264 | asm volatile(
| ^~~
INFO: Elapsed time: 32.683s, Critical Path: 20.21s
Not encouraging.
hi @feranick with all of the changes, would you be able to send a set of commands to reproduce the very first issue? I believe it looks something like this? (but it may require you to push some commits to your fork of tensorflow and then sync your fork of libedgetpu to it?)
git clone git@github.com:google-coral/libcoral.git && cd libcoral
git fetch origin pull/36/head:pull-36
git checkout pull-36
git submodule init && git submodule update
make DOCKER_IMAGE=debian:buster DOCKER_CPUS="armv7a" DOCKER_TARGETS=tests docker-build
I'm talking to some folks, trying to get an easy process to repro the original issues and continue on. It appears that element_ptr
needs to be a "+r" (read-write) operand,
and there could be more issues as you pointed out, so we'll need to work this out one step at a time, unfortunately :(
A procedure for making code changes to TensorFlow and testing them against this build would also be nice to have; I'm not sure that can be done easily with bazel, though.
One idea would be to fork TensorFlow, make changes in your own fork (and sync from time to time) and then change https://github.com/google-coral/libedgetpu/blob/b5820ad9437548febcf58f593a665541e7c5e561/workspace.bzl#L68-L76 to point to the fork and the last commit there.
Then, once it all works, you can make a PR from the fork back to TF, and once that lands you can change the workspace.bzl
file to point back to upstream TF.
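For concreteness, the idea looks roughly like this (a hypothetical sketch: the actual variable names and structure of workspace.bzl may differ, and the values below are placeholders, not real commits or checksums):

```python
# workspace.bzl (sketch): pin the TensorFlow dependency to your fork.
# Replace the placeholders with a real commit on <your-user>/tensorflow
# and the sha256 of its source archive; restore the upstream values
# once the fix lands in tensorflow/tensorflow.
TENSORFLOW_COMMIT = "<commit-sha-on-your-fork>"  # placeholder
TENSORFLOW_SHA256 = "<sha256-of-that-archive>"   # placeholder
# ...and point the download URL at
# https://github.com/<your-user>/tensorflow/archive/<commit>.tar.gz
```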
@mihaimaruseac Thanks. This is effectively what I did for testing. An advantage of this approach is that libedgetpu
could still be built against a stable version of TF (2.16.0, forked and with the visibility patch). I'd note that libedgetpu
can be used independently of libcoral/pycoral (in fact, that is how I use the edgeTPU). This would allow testing and building of libcoral/pycoral to continue while libedgetpu stays "stable".
hi @feranick with all of the changes, would you be able to send a set of commands to reproduce the very first issue? I believe it looks something like this? (but it may require you to push some commits to your fork of tensorflow and then sync your fork of libedgetpu to it?)
Hi @Namburger, here's a slightly revised version of your commands (I can't clone via git@github.com:google-coral/libcoral.git, only via https). Also, you should build using debian:bookworm, as it has support for python3.11:
git clone https://github.com/google-coral/libcoral.git && cd libcoral
git fetch origin pull/36/head:pull-36
git checkout pull-36
git submodule init && git submodule update
make DOCKER_IMAGE=debian:bookworm DOCKER_CPUS="armv7a" DOCKER_TARGETS=tests docker-build
These commands will reliably trigger the issue (just tested). As for libedgetpu,
it currently builds against TF 2.16.0-rc0, but the issue here happens regardless of the TF version libedgetpu is built against. I can change it to TF 2.17.0+visibility if needed (yet, see comment).
By way of an update: the author of tensorflow/tensorflow@8419c70 promised to take a look sometime this week!
Hi @Namburger, just wondering whether there is an update on this... Thanks!
Joining the fray here, wondering how the future of the Coral TPU is going to look. Many of us out here bought this hoping for affordable facial and object recognition in software like Frigate using the Coral devices...
To be honest with you, development is pretty much dead. Google is happy to sell you the boards, and ASUS will do so for bulk sales, yet the platform is dead. Shameful, if you ask me. This is why I embarked on the effort of keeping it alive, with some success in merging changes into the main repo. I also maintain a fork for the latest. I am pretty sure that, apart from some googler with time to spare, Google has basically dissolved the dev team. So I would not invest too much in the platform unless you are willing to put the work in.
Luckily I only purchased a single PCIe Coral. Unfortunately for me, I'm on a fixed income, so the loss hurts just a smidge more than it would have if I were still at a 6-figure income.
This PR: v2.15.0), TF/crosstools, and libusb (1.0.26). This is in relation to issues: https://github.com/tensorflow/tensorflow/issues/62371 and https://github.com/google-coral/libedgetpu/issues/53