-
As a person who works a lot with recurrent networks and sequences, I wish it were easier to work with `PackedSequence`. I frequently find myself packing/unpacking sequences to perform some simple operat…
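For context, the pack/unpack round trip described above might look like the following sketch (plain PyTorch; the `_replace` shortcut relies on `PackedSequence` being a named tuple and is shown only as an illustration):

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Two sequences of lengths 3 and 2, padded to length 3 (batch_first layout).
padded = torch.tensor([[1., 2., 3.],
                       [4., 5., 0.]]).unsqueeze(-1)  # (batch, time, feature)
lengths = torch.tensor([3, 2])

packed = pack_padded_sequence(padded, lengths, batch_first=True)

# To apply even an elementwise op, one typically unpacks, operates, re-packs:
unpacked, lens = pad_packed_sequence(packed, batch_first=True)
result = pack_padded_sequence(unpacked * 2, lens, batch_first=True)

# For purely elementwise ops, operating on .data avoids the round trip:
shortcut = packed._replace(data=packed.data * 2)
```

Both paths produce the same packed data here; the point of the issue is that the round trip is the only officially convenient route for most operations.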
-
# Single-Device-Abstract DDP
## Motivation
In current PyTorch DDP, when training a model with Dropout operations, the final results obtained from distributed training will not be consistent with t…
-
This issue is to facilitate discussion of inplace handling, namely the "big" solution of having a static single assignment (SSA) representation.
For any handling of inplace, we want to make certain…
t-vi updated 5 months ago
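As a sketch of what an SSA representation buys for inplace handling, the toy pass below (all names hypothetical, not PyTorch internals) gives every mutated variable a fresh versioned name, so an inplace op becomes an ordinary functional op:

```python
# Hypothetical mini-IR: each op is (target, op_name, arg_names).
# Inplace ops carry a trailing underscore, as in PyTorch, and mutate
# their target; the SSA pass rewrites each mutation to a new version.

def to_ssa(ops):
    version = {}                       # current version number per variable
    def cur(name):
        return f"{name}.{version.get(name, 0)}"
    ssa = []
    for target, op, args in ops:
        read_args = [cur(a) for a in args]   # read old versions first
        version[target] = version.get(target, 0) + 1
        if op.endswith("_"):                 # inplace -> functional form
            op = op.rstrip("_")
        ssa.append((cur(target), op, read_args))
    return ssa

ops = [
    ("y", "mul", ["x", "x"]),
    ("y", "add_", ["y", "x"]),   # inplace add mutates y
    ("z", "relu", ["y"]),
]
ssa = to_ssa(ops)
```

After the pass, the aliasing between the two definitions of `y` is explicit (`y.1` vs `y.2`), which is exactly the property that makes autograd and transformation passes easier to reason about.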
-
### Describe the issue
ONNX opset 14 supports Add and Sub for int8, int16, uint8, and uint16, but these do not seem to be supported by the default CPU provider in ONNX Runtime.
I have created a …
-
# Training Performance Tuning
The goal is to decrease the training time by taking advantage of available GPU. The following [Performance Tuning Guide](https://pytorch.org/tutorials/recipes/recipes/…
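A minimal sketch of the kind of knobs the linked guide recommends (pinned memory, multiple loader workers, `set_to_none` zeroing, mixed precision), written so it degrades gracefully and still runs on a CPU-only machine:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(256, 8), torch.randn(256, 1))

# Guide recommendations: pin host memory for faster H2D copies and use
# worker processes for loading (workers=0 here so the sketch stays
# single-process and portable).
loader = DataLoader(dataset, batch_size=32, shuffle=True,
                    num_workers=0, pin_memory=torch.cuda.is_available())

model = torch.nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Mixed precision is enabled only when a GPU is present.
use_amp = torch.cuda.is_available()
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

for xb, yb in loader:
    opt.zero_grad(set_to_none=True)   # cheaper than writing zeros in place
    with torch.autocast("cuda", enabled=use_amp):
        loss = torch.nn.functional.mse_loss(model(xb), yb)
    scaler.scale(loss).backward()
    scaler.step(opt)
    scaler.update()
```

This is only a baseline skeleton; profiling (e.g. with `torch.profiler`) should decide which knob actually dominates the training time on a given setup.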
-
Reimplement upsample as trace_only and improve accuracy. In particular, check that `aten.upsample_trilinear3d.vec` is accurate.
Related: https://github.com/microsoft/onnxscript/issues/1159
-
### Description
The streaming executor does not preserve order by default; there is a global flag for preserving order.
However, the decision to preserve order should be made at the operator level, no…
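A toy sketch of the proposed operator-level decision (names are hypothetical, not the actual executor API): each operator declares whether its outputs must keep input order, instead of relying on one global flag.

```python
import random

# Simulate a streaming operator whose items complete out of order.
def run_operator(items, fn, preserve_order):
    indexed = list(enumerate(items))
    random.shuffle(indexed)                  # completion order is arbitrary
    results = [(i, fn(x)) for i, x in indexed]
    if preserve_order:                       # per-operator decision
        results.sort(key=lambda p: p[0])
    return [r for _, r in results]

# An order-sensitive operator (e.g. appending rows to a file) opts in;
# a commutative aggregation could opt out and skip the sorting cost.
ordered = run_operator([1, 2, 3, 4], lambda x: x * 10, preserve_order=True)
```

Hanging the flag on the operator lets order-insensitive stages avoid the buffering/sorting overhead while order-sensitive stages still get a deterministic stream.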
-
### Describe the issue
Doing PyTorch/Numpy operations on tensors obtained by `InferenceSession.run()` is **50x slower** than doing these operations from dummy inputs.
Doing `time.sleep()` after …
-
## Problem
The onnx_legalizer.py code is hard to understand; its readability needs to be improved.
## What to do
- [x] replace the general `transformer.make_node` method with specialized methods, like `m…
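A sketch of the intended refactor; `ModelTransformerHelper`, `make_add`, and `make_squeeze` are illustrative names, not the actual onnx_legalizer.py API:

```python
# Generic vs specialized node builders. The generic call site forces every
# caller to remember each operator's input layout; the specialized wrappers
# make call sites self-describing and easier to check.

class ModelTransformerHelper:
    def __init__(self):
        self.nodes = []

    def make_node(self, op_type, inputs, outputs, **attrs):
        node = {"op": op_type, "inputs": inputs, "outputs": outputs, **attrs}
        self.nodes.append(node)
        return node

    # Specialized builders wrap the generic one with named parameters.
    def make_add(self, a, b, out):
        return self.make_node("Add", [a, b], [out])

    def make_squeeze(self, data, out, axes):
        return self.make_node("Squeeze", [data], [out], axes=axes)

t = ModelTransformerHelper()
t.make_add("x", "y", "sum")
t.make_squeeze("sum", "z", axes=[0])
```

The behavior is unchanged (every specialized method still funnels through the generic builder), so the refactor is purely about readability at call sites.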
-
### 🐛 Describe the bug
```
#0 0x00007fb5a545ae50 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#1 0x00007fb5a51aa479 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007fb5a55…