LaurentMazare / tch-rs

Rust bindings for the C++ api of PyTorch.
Apache License 2.0

Windows Support #9

Closed: paddyhoran closed this issue 5 years ago

paddyhoran commented 5 years ago

Any objections to a PR adding windows support?

LaurentMazare commented 5 years ago

That would certainly be much appreciated (I wouldn't be able to test it though).

crackcomm commented 5 years ago

@LaurentMazare I think it actually does work.

I removed the -Wl,-rpath flags from the build script and the basic example works just fine.

When I tried to compile reinforcement-learning or cifar example, I got this error:

libtch-f6fb773124c0e3c3.rlib(tch-f6fb773124c0e3c3.2d5yw0723k7kqxuq.rcgu.o) : error LNK2019: unresolved external symbol ato_rms_prop referenced in function _ZN3tch8wrappers9optimizer10COptimizer8rms_prop17h85fbf712274e8a21E
C:\Users\Pah\tch-rs\target\debug\examples\cifar-7657a28e1908fa9c.exe : fatal error LNK1120: 1 unresolved externals

Just commenting out references to rms_prop, RmsProp, ato_rms_prop made it compile:

     Running `target\debug\examples\reinforcement-learning.exe pg`
action space: 2
observation space: [4]
epoch: 0   episodes: 199   avg reward per episode: 25.20
[...]
epoch: 41  episodes: 27    avg reward per episode: 189.48

On the other hand a2c:

     Running `target\debug\examples\reinforcement-learning.exe a2c`
usage: main pg|a2c|a2c-sample|ppo|ppo-sample
usage: main pg|a2c|a2c-sample|ppo|ppo-sample
usage: main pg|a2c|a2c-sample|ppo|ppo-sample
[...]
usage: main pg|a2c|a2c-sample|ppo|ppo-sample
usage: main pg|a2c|a2c-sample|ppo|ppo-sample
usage: main pg|a2c|a2c-sample|ppo|ppo-sample

This might be a problem with conda: in a PowerShell terminal (without the virtualenv activated) the error is different:

     Running `target\debug\examples\reinforcement-learning.exe a2c`
Error: PyErr { ptype: <class 'ImportError'>, pvalue: Some(ImportError('DLL load failed: The specified module could not be found.')), ptraceback: Some(<traceback object at 0x000001BDDB4B3F08>) }

Setting Py_ENABLE_SHARED didn't help. The pg example works in both terminals.

Thanks for your awesome work, I appreciate it a lot.

LaurentMazare commented 5 years ago

The rms-prop bit is quite strange; I've never seen anything like this, so I would just suggest checking whether the _ZN3tch8wrappers9optimizer10COptimizer8rms_prop17h85fbf712274e8a21E symbol is properly defined in one of the libtorch DLLs that you installed (I imagine that you're using v1.1.0 as recommended, e.g. for the CPU version https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-v1.1.0.zip ).

For a2c, you have probably already installed the Atari gym envs in your conda environment. Maybe you could check that they work well with Python, e.g. for a2c you can try loading "SpaceInvadersNoFrameskip-v4".

Sorry for not being more helpful, I don't have a Windows box; I'll try to get access to one. If you find ways to fix this, I'm happy to get a PR or some change suggestions.

crackcomm commented 5 years ago

I had just downloaded the latest libtorch beforehand from https://download.pytorch.org/libtorch/cu100/libtorch-win-shared-with-deps-latest.zip and I hope it really is the latest release, 142c973f4179e768164cd578951489e89021b29c.

I have gym installed, but it turns out: No module named 'atari_py'.

I installed atari_py and can render SpaceInvadersNoFrameskip-v4 from the REPL, but the output from the reinforcement-learning example didn't change.

LaurentMazare commented 5 years ago

Just to mention that I gave this a try using windows subsystem for linux today and everything worked out of the box. If that's an option for you this is probably easier to set up.

Below are the commands that I ran for an ubuntu version.

# install rust
curl https://sh.rustup.rs -sSf | sh
# install a bunch of dependencies
sudo apt-get install gcc libssl-dev pkg-config python python3 python3-pip libpython3.6-dev
# install the python gym environment
pip install gym pillow gym[atari] --user

git clone https://github.com/LaurentMazare/tch-rs.git
cd tch-rs
cargo run --example reinforcement-learning --features=python a2c

crackcomm commented 5 years ago

On the Windows machine, it was Python 3.7.

I'm happy to help if there is anything you would like me to try (on Windows); I have no doubt it works on Linux ;). Given my past experiences with Linux and multiple monitors, switching would just be too much of a struggle for me, and that is probably the single thing that keeps me running Windows.

vegapit commented 5 years ago

Not sure where we stand on this issue but I can confirm that the project can be compiled on Windows 10 by just:

The following warnings appeared during the torch-sys compilation:

warning: cl : Command line warning D9002 : ignoring unknown option '-std=c++11'
warning: cl : Command line warning D9002 : ignoring unknown option '-fPIC'

Then, in order to run the tests successfully, the last step is to:

All tests successfully ran afterwards.

LaurentMazare commented 5 years ago

It's nice that it's not too hard to get it to work. I haven't done anything on this as I don't have a Windows box, but if anyone wants to craft a PR to add support for this, that would be a welcome addition.

vegapit commented 5 years ago

I am not familiar with the default Windows compiler at all. A flag adjustment in torch-sys/build.rs based on the running OS would work, but I assume new flags would be needed to optimise the code adequately on Windows.
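
For illustration, the kind of OS-based flag adjustment discussed here could look roughly like the sketch below. It is not the actual contents of torch-sys/build.rs: the helper names cxx_flags and rpath_flag and the /opt/libtorch/lib path are hypothetical, and the flag lists are assumptions based only on the flags mentioned in this thread (-std=c++11, -fPIC, -Wl,-rpath). It also assumes the MSVC toolchain on Windows.

```rust
// Hypothetical helpers for a build script such as torch-sys/build.rs.
// The flag lists are illustrative assumptions, not the project's real flags.

/// Returns the C++ compiler flags to use for the given target OS.
fn cxx_flags(target_os: &str) -> Vec<&'static str> {
    if target_os == "windows" {
        // MSVC's cl ignores GCC-style options such as -std=c++11 and -fPIC
        // (command line warning D9002), so none are passed here.
        Vec::new()
    } else {
        vec!["-std=c++11", "-fPIC"]
    }
}

/// Returns the rpath linker argument, or None on Windows, which has no rpath.
fn rpath_flag(target_os: &str, lib_dir: &str) -> Option<String> {
    if target_os == "windows" {
        None
    } else {
        Some(format!("-Wl,-rpath={}", lib_dir))
    }
}

fn main() {
    // Cargo sets CARGO_CFG_TARGET_OS for build scripts; fall back to the host
    // OS so that this sketch also runs as a plain binary.
    let target_os = std::env::var("CARGO_CFG_TARGET_OS")
        .unwrap_or_else(|_| std::env::consts::OS.to_string());

    println!("compiler flags: {:?}", cxx_flags(&target_os));
    // "/opt/libtorch/lib" is a placeholder path used only for this example.
    println!("rpath flag: {:?}", rpath_flag(&target_os, "/opt/libtorch/lib"));
}
```

Keeping the decision in small functions that take the target OS as a string makes the behaviour easy to check on any machine, including one without a Windows box. Optimisation flags for MSVC are deliberately left out, since the thread does not settle on any.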

LaurentMazare commented 5 years ago

I feel that having flags that work reasonably well but are not very optimized could already be interesting. If I get my hands on a windows box at some point I may give it a try but that's not very likely to happen in the near future.

jerry73204 commented 5 years ago

@vegapit @crackcomm @paddyhoran My Windows patch was merged a few hours ago. Could you check whether it works on your machines? The instructions are in the README.

Note that libtorch has to be downloaded manually and the environment variables have to be set explicitly to make it work. This is because Windows does not support rpath or runpath. I think the procedure is not much harder than on Linux.
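
As a rough sketch of what setting the environment explicitly involves: without rpath, the loader finds the libtorch DLLs through PATH. The example below assumes a LIBTORCH variable pointing at the unzipped libtorch directory and a lib subdirectory containing the DLLs; both the variable name and the layout are assumptions here, and the README remains the reference. The helper names libtorch_lib_dir and prepend_to_path are hypothetical.

```rust
use std::env;
use std::ffi::{OsStr, OsString};
use std::path::{Path, PathBuf};

/// Directory assumed to hold the libtorch shared libraries (<root>/lib).
fn libtorch_lib_dir(libtorch_root: &Path) -> PathBuf {
    libtorch_root.join("lib")
}

/// Builds a PATH-style value with `dir` placed before the existing entries.
fn prepend_to_path(current: &OsStr, dir: &Path) -> Result<OsString, env::JoinPathsError> {
    let mut entries = vec![dir.to_path_buf()];
    // Skip empty entries so an unset or empty PATH does not add a blank item.
    entries.extend(env::split_paths(current).filter(|p| !p.as_os_str().is_empty()));
    env::join_paths(entries)
}

fn main() {
    // LIBTORCH is assumed to point at the unzipped libtorch directory.
    match env::var_os("LIBTORCH") {
        Some(root) => {
            let lib_dir = libtorch_lib_dir(Path::new(&root));
            let current = env::var_os("PATH").unwrap_or_default();
            match prepend_to_path(&current, &lib_dir) {
                Ok(path) => println!("suggested PATH: {}", path.to_string_lossy()),
                Err(e) => eprintln!("could not build PATH: {}", e),
            }
        }
        None => eprintln!("LIBTORCH is not set; point it at the unzipped libtorch directory"),
    }
}
```

The program only prints a suggested value rather than modifying the environment, since a change made by a child process would not persist in the calling shell anyway.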

crackcomm commented 5 years ago

@jerry73204 I cloned the repo and cargo test passed. I also tried the basics example, and it is fully functional.

Thanks for your effort, you did a great job.

vegapit commented 5 years ago

All tests on the repo ran successfully on Windows 10 with Rust 1.37. Neat work.

paddyhoran commented 5 years ago

@jerry73204 great work! Thank you!

LaurentMazare commented 5 years ago

Also kudos to @jean-airoldie for adding some windows CI support!