The goal of this project is to provide Fortran HPC codes with a simple way to use the PyTorch deep learning framework. We want Fortran developers to be able to take advantage of the rich and optimized Torch ecosystem from within their existing codes. The code is very much a work in progress, and any feedback or bug reports are welcome.
To assist with the build, we provide a Docker and HPCCM recipe for a container with all the necessary dependencies installed; see container.
You'll need to mount the folder with the cloned repository into the container, `cd` into this folder from the running container, and execute `./make_nvhpc.sh`, `./make_gcc.sh`, or `./make_intel.sh`, depending on the compiler you want to use.
To enable GPU support, you'll need the NVIDIA HPC SDK build. The GNU compiler is ramping up its OpenACC implementation and may soon be supported as well.
You can change the compiler by modifying the `CMAKE_Fortran_COMPILER` CMake flag. Note that we are still testing different compilers, so issues are possible.
The `examples` folder contains three samples: `resnet_forward`, `polynomial`, and `python_training`. First, run the `setup-model.py` script in the corresponding example folder; it defines the model and stores it on disk.
With the models saved and ready, run the following:

```shell
cd /path/to/repository/
install/bin/resnet_forward ../examples/resnet_forward/traced_model.pt
install/bin/polynomial ../examples/polynomial/traced_model.pt ../examples/polynomial/your_new_trained_model.pt
install/bin/python_training ../examples/python_training/model.py
```
Keep in mind that the order of array dimensions differs between Fortran and C/PyTorch. Fortran's contiguous dimension is the first one, while in PyTorch the contiguous dimension is the last one. Therefore, for the Fortran input to match what PyTorch expects, the order of the Fortran input array dimensions must be the reverse of the order of the PyTorch input tensor dimensions. The library takes care of matching the dimensions correctly in this case without any data movement.
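As an illustration (using NumPy rather than the Fortran bindings), reversing the dimension order of a Fortran-ordered array yields a C-ordered view of the same memory, which is exactly why no data movement is needed:

```python
import numpy as np

# A Fortran-ordered array: the FIRST dimension is contiguous in memory,
# as in a Fortran array.
a = np.asfortranarray(np.arange(12).reshape(3, 4))
assert a.flags["F_CONTIGUOUS"]

# Viewing it with the dimension order reversed gives a C-contiguous
# array (the layout PyTorch expects) without copying any data.
b = a.T  # shape (4, 3)
assert b.flags["C_CONTIGUOUS"]
assert b.base is a  # a view of the same memory, not a copy
```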
We are working on documenting the full API. Please refer to the examples for more details. The bindings are provided through the following Fortran classes:
`torch_tensor`

This class is a light-weight PyTorch representation of a Fortran array. It does not own the data and only keeps the respective pointer. Arrays of rank up to 7 with the datatypes `real32`, `real64`, `int32`, and `int64` are supported.
Members:
- `from_array(Fortran array or pointer :: array)`: create the tensor representation of a Fortran array.
- `to_array(pointer :: array)`: create a Fortran pointer from the tensor. Use this API to convert the data returned by a PyTorch model into a Fortran array.

`torch_tensor_wrap`
This class wraps a few tensors or scalars that can be passed as input into PyTorch models. Arrays and scalars must be of the types `real32`, `real64`, `int32`, or `int64`.
Members:
- `add_scalar(scalar)`: add a scalar value to the wrapper.
- `add_tensor(torch_tensor :: tensor)`: add a tensor to the wrapper.
- `add_array(Fortran array or pointer :: array)`: create the tensor representation of a Fortran array and add it to the wrapper.

`torch_module`
This class represents a traced PyTorch model, typically the result of a `torch.jit.trace` or `torch.jit.script` call in your Python script. This class is not thread-safe. For multi-threaded inference, either create a threaded PyTorch model or use one `torch_module` instance per thread (the latter could be less efficient).
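For reference, a model file consumable by this class can be produced in Python roughly as follows; the toy module and the output file name below are illustrative, not taken from the repository's examples:

```python
import torch

class Poly(torch.nn.Module):
    """Toy model: y = x^2 + 2x + 1."""
    def forward(self, x):
        return x * x + 2.0 * x + 1.0

# Trace the model with a sample input and serialize it to TorchScript;
# the resulting .pt file is what torch_module%load expects.
traced = torch.jit.trace(Poly(), torch.tensor([1.0]))
traced.save("traced_model.pt")
```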
Members:
- `load(character(*) :: filename, integer :: flags)`: load the module from a file. The flag can be set to `module_use_device` to enable GPU processing.
- `forward(torch_tensor_wrap :: inputs, torch_tensor :: output, integer :: flags)`: run inference with PyTorch. The tensors and scalars from `inputs` are passed to PyTorch, and `output` will contain the result. `flags` is currently unused.
- `create_optimizer_sgd(real :: learning_rate)`: create an SGD optimizer to use in subsequent training.
- `train(torch_tensor_wrap :: inputs, torch_tensor :: target, real :: loss)`: perform a single training step, where `target` is the target result and `loss` is the squared L2 loss returned by the optimizer.
- `save(character(*) :: filename)`: save the trained model.

`torch_pymodule`
This class represents a PyTorch Python script and requires the Python interpreter to be called. Only one `torch_pymodule` can be open at a time due to a Python interpreter limitation. The overhead of calling this class is higher than with `torch_module`, but, contrary to `torch_module%train`, one can train the PyTorch model with any optimizer, dropouts, etc. The intended usage of this class is to run online training with a complex pipeline that cannot be expressed as TorchScript.
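The Python script loaded by this class defines the `ftn_pytorch_forward` and `ftn_pytorch_train` entry points described below. A hedged sketch of such a script might look like this; the model, optimizer, and stopping threshold are illustrative, not part of the library:

```python
import torch

# Illustrative model and optimizer; a real script can use any training pipeline.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

def ftn_pytorch_forward(x):
    # Accepts tensors/scalars passed from the Fortran side; returns one tensor.
    return model(x)

def ftn_pytorch_train(x, target):
    # The last argument is the target tensor; returns (is_completed, loss).
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), target)
    loss.backward()
    optimizer.step()
    # is_completed signals that a stopping criterion (here, a loss
    # threshold chosen for illustration) has been reached.
    return loss.item() < 1.0e-6, loss.item()
```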
Members:
- `load(character(*) :: filename)`: load the module from a Python script.
- `forward(torch_tensor_wrap :: inputs, torch_tensor :: output)`: execute the `ftn_pytorch_forward` function from the Python script. The function is expected to accept tensors and scalars and return one tensor. The tensors and scalars from `inputs` are passed as arguments, and `output` will contain the result.
- `train(torch_tensor_wrap :: inputs, torch_tensor :: target, real :: loss)`: execute the `ftn_pytorch_train` function from the Python script. The function is expected to accept tensors and scalars (with the last argument required to be the target tensor) and return a tuple of a bool `is_completed` and a float `loss`. `is_completed` is returned as the result of the `train` function, and `loss` is set according to the Python output. `is_completed` signifies that the training has completed due to some stopping criterion.
- `save(character(*) :: filename)`: save the trained model.

Known issues and ToDo:
- `target` attribute and `resnet_forward` crash with GNU
- Update `container.py` to work with more recent compilers
- `fatal error: torch/csrc/generic/Storage.h: No such file or directory`

Updates:
- We now use the `main` branch instead of `vX.X` branches and will use tags instead.
- The `forward` and `train` routines now accept `torch_tensor_wrap` instead of just `torch_tensor`. This allows a user to add multiple inputs consisting of tensors of different sizes and scalar values.