borglab / SwiftFusion

Apache License 2.0

Add tensorflow/swift-apis as a SwiftPM dependency. #260

Status: Open. dan-zheng opened this pull request 3 years ago.

dan-zheng commented 3 years ago

Motivation

This enables building SwiftFusion using stock toolchains from swift.org/download.

swift build will clone and build tensorflow/swift-apis as a regular SwiftPM dependency. Eventually, we would like to stop releasing custom toolchains bundled with pre-installed tensorflow/swift-apis.

Build instructions

It is possible to build tensorflow/swift-apis, and packages that depend on it such as SwiftFusion, using stock toolchains by installing pre-built X10 libraries (currently available only for macOS and Windows).

After installing them (e.g., to $HOME/Library on macOS), build with SwiftPM as follows:

```console
$ swift build \
    -Xcc -I$HOME/Library/tensorflow-2.4.0/usr/include \
    -Xlinker -L$HOME/Library/tensorflow-2.4.0/usr/lib \
    -Xswiftc -DTENSORFLOW_USE_STANDARD_TOOLCHAIN
```
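Because the same three flags have to be repeated for swift build, swift run, and swift test, it can help to keep them in one place. The following is a minimal bash sketch, not part of this PR: the function name set_x10_flags and the array X10_FLAGS are hypothetical, and it assumes the X10 libraries were installed under $HOME/Library/tensorflow-2.4.0/usr as in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collect the SwiftPM flags for a pre-built X10 install
# into the global array X10_FLAGS (bash functions cannot return arrays).
set_x10_flags() {
  local prefix="$1"   # assumed install prefix, e.g. $HOME/Library/tensorflow-2.4.0/usr
  # -Xcc -I...      : C headers for X10/TensorFlow
  # -Xlinker -L...  : directory where the linker looks for libx10
  # -Xswiftc -D...  : the compilation condition used in the build instructions
  X10_FLAGS=(
    -Xcc "-I${prefix}/include"
    -Xlinker "-L${prefix}/lib"
    -Xswiftc -DTENSORFLOW_USE_STANDARD_TOOLCHAIN
  )
}

# Usage, from a SwiftFusion checkout:
#   set_x10_flags "$HOME/Library/tensorflow-2.4.0/usr"
#   swift build "${X10_FLAGS[@]}"
#   swift run "${X10_FLAGS[@]}" Pose3SLAMG2O
#   swift test "${X10_FLAGS[@]}"
```

The usage places the flags before the executable name for swift run, since arguments that follow the executable name are passed to the program itself rather than to the build.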

swift test is known not to work on macOS for tensorflow/swift-apis and packages that depend on it, due to SR-14008: Library not loaded: /usr/lib/swift/libswift_Differentiation.dylib.

Testing

Before merging, let's verify that swift build, swift run, and swift test work with swift.org/download toolchains across platforms, and update the GitHub Actions CI so that it passes:

```console
$ swift run Pose3SLAMG2O -Xcc -I$HOME/Library/tensorflow-2.4.0/usr/include -Xlinker -L$HOME/Library/tensorflow-2.4.0/usr/lib -Xswiftc -DTENSORFLOW_USE_STANDARD_TOOLCHAIN
...
Everything is already up-to-date
dyld: Library not loaded: /usr/lib/swift/libswift_Differentiation.dylib
  Referenced from: /Users/danielzheng/SwiftFusion/.build/x86_64-apple-macosx/debug/Pose3SLAMG2O
  Reason: image not found
[1] 79788 abort swift run Pose3SLAMG2O -Xcc -I$HOME/Library/tensorflow-2.4.0/usr/include
```

```console
$ swift test -Xcc -I$HOME/Library/tensorflow-2.4.0/usr/include -Xlinker -L$HOME/Library/tensorflow-2.4.0/usr/lib -Xswiftc -DTENSORFLOW_USE_STANDARD_TOOLCHAIN
...
Everything is already up-to-date
2021-01-08 07:14:48.425 xctest[79757:2116295] The bundle “SwiftFusionPackageTests.xctest” couldn’t be loaded because it is damaged or missing necessary resources. Try reinstalling the bundle.
2021-01-08 07:14:48.425 xctest[79757:2116295] (dlopen_preflight(/Users/danielzheng/SwiftFusion/.build/x86_64-apple-macosx/debug/SwiftFusionPackageTests.xctest/Contents/MacOS/SwiftFusionPackageTests): Library not loaded: /usr/lib/swift/libswift_Differentiation.dylib
  Referenced from: /Users/danielzheng/SwiftFusion/.build/x86_64-apple-macosx/debug/SwiftFusionPackageTests.xctest/Contents/MacOS/SwiftFusionPackageTests
  Reason: image not found)
```
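Both failures come down to the same missing runtime library. When triaging logs like these, a small filter that lists every library dyld reported as missing can save some scrolling. This is a hypothetical bash helper (missing_dylibs is not part of SwiftPM or this repository) that only matches the "Library not loaded: <path>" text seen in the logs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a build/run/test log on stdin and print each
# distinct library path that dyld reported with "Library not loaded: <path>".
missing_dylibs() {
  # -n suppresses default output; the trailing p prints only lines where the
  # substitution matched, leaving just the captured path. The path stops at a
  # space or a closing parenthesis (as in the xctest dlopen_preflight message).
  sed -n 's/.*Library not loaded: \([^ )]*\).*/\1/p' | sort -u
}

# Usage (substitute the flags you actually build with):
#   swift test <flags> 2>&1 | missing_dylibs
```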

ProfFan commented 3 years ago

Thank you very much, Dan! I just tried compiling this with the latest Swift nightly, and this (https://gist.github.com/ProfFan/638f61aff223bfcbea94b2ddb026497a) is what I got. There is one compiler crash, and a lot of errors saying that ElementaryFunctions does not exist.

ProfFan commented 3 years ago

I got past the ElementaryFunctions issue with swift build -Xswiftc -DTENSORFLOW_USE_STANDARD_TOOLCHAIN -Xcc -I/usr/include/tensorflow. Now the problem is that libx10 cannot be found:

```console
$ swift build -Xswiftc -DTENSORFLOW_USE_STANDARD_TOOLCHAIN -Xcc -I/usr/include/tensorflow
/usr/bin/ld.gold: error: cannot find -lx10
...
clang-10: error: linker command failed with exit code 1 (use -v to see invocation)
<unknown>:0: error: link command failed with exit code 1 (use -v to see invocation)
[0/16] Linking libTensorFlow.so
```
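The linker is searching for libx10 (that is what -lx10 means), and this command passes an include directory but no -Xlinker -L directory, so the library would have to be on the linker's default search path. A quick preflight check can make this failure clearer before starting a long build. The sketch below is a hypothetical bash helper; it assumes the library is a shared object named libx10.so (Linux) or libx10.dylib (macOS) and does not cover Windows or static libraries.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check: succeed if the given directory contains a
# shared X10 library that -lx10 could resolve to; otherwise report and fail.
has_x10_library() {
  local libdir="$1"
  local name
  # Assumed file names: libx10.so on Linux, libx10.dylib on macOS.
  for name in libx10.so libx10.dylib; do
    if [ -e "${libdir}/${name}" ]; then
      return 0
    fi
  done
  echo "no libx10.so or libx10.dylib found in ${libdir}" >&2
  return 1
}

# Usage (X10_USR is a placeholder for wherever X10 was unpacked):
#   X10_USR=/path/to/x10/usr
#   has_x10_library "$X10_USR/lib" &&
#     swift build -Xswiftc -DTENSORFLOW_USE_STANDARD_TOOLCHAIN \
#       -Xcc -I"$X10_USR/include" -Xlinker -L"$X10_USR/lib"
```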
ProfFan commented 3 years ago

Actually, there's more to it than this: it appears that _NumericsShims is somehow built but not linked, because SwiftPM ended the compilation prematurely. No idea what is happening.

ProfFan commented 3 years ago

OK, I see the problem. X10 needs to be built separately, but can X10 be built against an existing TensorFlow install, or does it require the TF source? Can the two coexist? @BradLarson Could you help me debug this? Thanks a lot!

dan-zheng commented 3 years ago

Hi Fan,

Did you follow the "Build instructions" above and install the pre-built X10 libraries? I believe they're currently available only for macOS and Windows, not Linux, unfortunately.

The instructions for "building libraries depending on tensorflow/swift-apis" come from this documentation. An alternative to using pre-built X10 libraries is to build them yourself, which should work just fine on Linux with swift.org/download toolchains.

Let me know if you need any help! I'm happy to video call if you'd like.

ProfFan commented 3 years ago

Hi Dan,

Thanks for the instructions! I have checked the build instructions, and I wonder whether X10 can be built against a system-packaged TensorFlow that includes headers. I think this is a very important question: if X10 can be built separately, there is a much higher chance that it will survive TF updates.

dan-zheng commented 3 years ago


Sure thing! I believe @compnerd can give a more accurate answer to your question about system-packaged TensorFlow and X10. I recall discussing this before: using a system package manager seems more heavyweight and platform-specific, but maybe it's more robust against breakages, as you suggest.

BradLarson commented 3 years ago

@ProfFan - When building a Swift for TensorFlow toolchain from scratch, X10 and TensorFlow are built from a specified TensorFlow version, and you have to manually move that version up to build against a new version of TensorFlow. In the worst case, you can still build these libraries as part of building a stock toolchain + swift-apis from scratch.

I don't know if these steps are documented anywhere, so I'll write down the sequence of commands needed to create a new toolchain based on the stock Swift compiler from scratch:

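```shell
# Enable CUDA support in the TensorFlow/X10 build
export TF_NEED_CUDA=1

# Check out the Swift sources (swift itself is cloned first, update-checkout
# clones the remaining repositories) and build a stock toolchain
mkdir swift-source
cd swift-source
git clone https://github.com/apple/swift.git
./swift/utils/update-checkout --clone --skip-repo swift
./swift/utils/build-toolchain buildbot_linux

# Build swift-apis (including X10 and TensorFlow) with the new toolchain
# and install it into that toolchain's usr/ prefix
git clone https://github.com/tensorflow/swift-apis.git
cmake -B BinaryCache \
  -D BUILD_SHARED_LIBS=YES \
  -D CMAKE_BUILD_TYPE=Release \
  -D CMAKE_INSTALL_PREFIX=/media/nvidia/Data/Development/swift-source/swift-nightly-install/usr \
  -D CMAKE_Swift_COMPILER=/media/nvidia/Data/Development/swift-source/swift-nightly-install/usr/bin/swiftc \
  -D TENSORFLOW_USE_STANDARD_TOOLCHAIN=YES \
  -G Ninja -S ./swift-apis
cmake --build BinaryCache --target install

# Package the combined toolchain as a tarball
tar -czf swift-tensorflow-stock-Jetson.tar.gz -C swift-nightly-install/ usr
```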

(You may need to alter a few of the hardcoded paths above; this was a quick copy-paste.)
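One way to avoid editing the hardcoded /media/nvidia/... paths is to derive them from the swift-source checkout. This bash sketch assumes, as the commands above imply, that build-toolchain leaves the new toolchain in swift-source/swift-nightly-install/usr; the function name swift_apis_cmake_args and the SWIFT_APIS_CMAKE_ARGS array are hypothetical, and the -D options are copied unchanged from the cmake command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the swift-apis CMake arguments from the
# swift-source directory instead of hardcoding absolute paths.
swift_apis_cmake_args() {
  local source_root="$1"
  # Assumed toolchain location, matching the paths in the original commands.
  local prefix="${source_root}/swift-nightly-install/usr"
  SWIFT_APIS_CMAKE_ARGS=(
    -B BinaryCache
    -D BUILD_SHARED_LIBS=YES
    -D CMAKE_BUILD_TYPE=Release
    -D "CMAKE_INSTALL_PREFIX=${prefix}"
    -D "CMAKE_Swift_COMPILER=${prefix}/bin/swiftc"
    -D TENSORFLOW_USE_STANDARD_TOOLCHAIN=YES
    -G Ninja
    -S ./swift-apis
  )
}

# Usage, from inside swift-source once build-toolchain has finished:
#   swift_apis_cmake_args "$PWD"
#   cmake "${SWIFT_APIS_CMAKE_ARGS[@]}"
#   cmake --build BinaryCache --target install
```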

For a Jetson build, you also need to add the following at the beginning to specify CUDA architectures:

```shell
export TF_CUDA_COMPUTE_CAPABILITIES=compute_53,compute_62,compute_72
```
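If you build several variants, the CUDA-related environment variables can be set per target in one place. The helper below is a hypothetical bash sketch: the cuda and jetson branches use exactly the exports given above, while the cpu branch assumes that leaving TF_NEED_CUDA unset yields a CPU-only build.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the CUDA-related environment variables for one of
# three build flavors (cpu, cuda, jetson) before running the build.
configure_tf_cuda_env() {
  case "$1" in
    cpu)
      # Assumption: with TF_NEED_CUDA unset, the build is CPU-only.
      unset TF_NEED_CUDA TF_CUDA_COMPUTE_CAPABILITIES
      ;;
    cuda)
      export TF_NEED_CUDA=1
      unset TF_CUDA_COMPUTE_CAPABILITIES
      ;;
    jetson)
      # Jetson builds also pin the CUDA architectures.
      export TF_NEED_CUDA=1
      export TF_CUDA_COMPUTE_CAPABILITIES=compute_53,compute_62,compute_72
      ;;
    *)
      echo "unknown build flavor: $1 (expected cpu, cuda, or jetson)" >&2
      return 1
      ;;
  esac
}

# Usage: run before the toolchain and swift-apis build steps, e.g.
#   configure_tf_cuda_env jetson
```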

In the process of building this, all headers and binaries are generated for X10 and TensorFlow. I can extract and package these for Ubuntu, based on our 0.13 toolchains. That should contain everything you'd need to build swift-apis as a package, and would serve as long as you didn't need to advance beyond TensorFlow 2.4.0. Would that be useful to have? If so, which Ubuntu configurations would be most useful to focus on?

BradLarson commented 3 years ago

OK, I tried it out and I think my idea of extracting the binary libraries from the completed toolchains will work. This is a version of the X10 standalone libraries (with TensorFlow headers) that builds on Ubuntu 18.04, CPU-only, with Dan's setup here. You might need to find the right Swift toolchain to use, however, because the zeroTangentVector changes upstream look like they might cause problems here.

If you want me to, I can create X10 snapshots from all of our Ubuntu variants and add them to the Windows and macOS snapshots linked on our development page.

ProfFan commented 3 years ago

@BradLarson Thanks a lot, Brad! One last question: is it possible to build X10 with only the TF headers from a vendor install of TF? For example, Arch Linux ships TF with full headers as a prebuilt package. In my experiments, the X10 CMake build always seems to clone the full source tree from GitHub.

ProfFan commented 3 years ago

But you are right: since Swift lives in its own prefix, we can definitely ship the TF libraries with the toolchain as well (separate from the system TF).

BradLarson commented 3 years ago

@ProfFan - I don't believe that libx10 can be built without access to the TensorFlow source, due to its need to compile in elements of XLA. Not entirely sure if the same is true for our eager-mode access, but I believe we build that in, too. Our toolchains exist independently of the system-installed TensorFlow, as does a binary library package like the one I linked above, and don't make use of it if it is available. Our TensorFlow support is pretty much standalone.

ProfFan commented 3 years ago

@BradLarson Thanks for the explanation! That is totally good :)

BradLarson commented 3 years ago

I've created both CUDA 11 and CPU-only Ubuntu 18.04 X10 packages and linked them here: https://github.com/tensorflow/swift-apis/pull/1182 . I figured those would be the two most popular platforms for people carrying this on in the near term, but can add others if needed.