-
Hi, I found that no DIRTY flag is set in the following code:
include/page_cache.h: 477
```CUDA
T& operator[](const size_t i) {
    if ((i < start) || (i >= end)) {
        update_page(i);
    }
    …
```
-
**What is your question?**
I've been studying example 27_ampere_3xtf32_fast_accurate_tensorop_gemm, which says the following:
> 1xTF32: FP32 in, converted to one TF32 internally, accumulated in FP32, FP…
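
To make the quoted 1xTF32 description concrete, here is a minimal Python sketch of the idea, contrasted with the 3xTF32 splitting that the example's name refers to. It is not CUTLASS code. The helper names (`to_tf32`, `mul_1xtf32`, `mul_3xtf32`) are hypothetical. The conversion is assumed to truncate the low 13 mantissa bits, whereas real hardware or library conversions may round instead. Products are accumulated in Python doubles rather than FP32.

```python
import struct


def fp32(x: float) -> float:
    """Round a Python float (double) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_tf32(x: float) -> float:
    """Emulate FP32 -> TF32 by keeping the sign, 8 exponent bits, and top 10 mantissa bits.

    Assumption: plain truncation; actual conversions may round.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # clear the low 13 of the 23 FP32 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def mul_1xtf32(a: float, b: float) -> float:
    """1xTF32: each FP32 input is converted to a single TF32 value."""
    return to_tf32(a) * to_tf32(b)


def mul_3xtf32(a: float, b: float) -> float:
    """3xTF32: split each input into a big and a small TF32 part, use three products."""
    a_big = to_tf32(a)
    a_small = to_tf32(fp32(a - a_big))  # residual lost by the first conversion
    b_big = to_tf32(b)
    b_small = to_tf32(fp32(b - b_big))
    # The small*small term is dropped; it is below the precision being recovered.
    return a_big * b_big + a_big * b_small + a_small * b_big


if __name__ == "__main__":
    a, b = 1.0 + 2**-13, 3.0
    print("exact  :", a * b)
    print("1xTF32 :", mul_1xtf32(a, b))
    print("3xTF32 :", mul_3xtf32(a, b))
```

With `a = 1 + 2**-13`, the single conversion drops the low-order bit entirely, while the three-product form recovers it through the `a_small * b_big` term.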
-
This is the unit test:
```python
import unittest
import torch
from aitemplate.compiler import compile_model, ops
from aitemplate.frontend import Tensor
from aitemplate.testing import detect_…
```
-
**Describe the bug**
For a problem with `--n=7 --h=1208 --w=1920 --c=4 --k=16 --r=4 --s=4 --stride_h=4 --stride_w=4`, DefaultConv2dFprop takes only 0.77 ms, but DefaultConv2dFpropWithBroadcast takes 5…
-
In the eval_forecasting() function, the authors split the data into training, validation, and test sets only after encoding all of it into representations. Considering that the convolutional network is used to construct…
-
While I can appreciate why RISC-V V 1.0 can't select among mask registers per instruction, I am troubled because I can't understand the rationale behind the dual-use v0 scheme chosen in lieu of a dedi…
-
**I have the following code for GEMM on A100 and I need to add the following feature to the code**
**1. adding a stream instead of the default stream**
Is this correct?
`cutlass::Status status…
-
(Disclaimer: I work at INTC, and I am an ISPC fan/advocate.)
Hey all, I have been experimenting with Highway for a few weeks now. Did you evaluate ISPC (https://ispc.github.io/) prior to developing …
-
### Background and motivation
Multiplication and division operators between floats and vectors are currently available. Addition and subtraction, however, are only supported between two vectors. For example, …
-
C# itself allows this, but it is "unreliable" because of how the GC works (https://stackoverflow.com/a/60948128):
> Important: you must not use the pointer once you've left the fixed scope; the pointer i…