-
## 🐛 Bug
Using the PyTorch 1.7.1 conda binary, the same operation gives the correct answer with CUDA 10.2 but returns all zeros with CUDA 11.0.
## To Reproduce
Steps to reproduc…
rmrao updated 3 years ago
-
# net forward code
def forward(self, X):
N = X.size()[0]
X = self.features(X) # extract features
X = X.view(N, 512, 1 ** 2)
X = torch.bmm(X, torch.transpose…
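A self-contained sketch of the bmm step (the transpose call is truncated above; a bilinear-pooling `X @ X^T` is assumed here), which can be run under both CUDA versions to compare outputs:

```python
import torch

# Standalone sketch; N and the 512-channel feature size follow the snippet,
# the input tensor itself is random placeholder data.
N = 2
X = torch.randn(N, 512, 1 * 1)  # features reshaped to (N, 512, 1**2)

# Bilinear pooling: batch matrix product of X with its own transpose
out = torch.bmm(X, torch.transpose(X, 1, 2))  # shape (N, 512, 512)
```

On an affected setup, `out` would come back as all zeros instead of the outer products.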
-
Hi, according to Eq. 19 in the paper, the linear transforms gT and gS are applied to the teacher and student, respectively, i.e., gT(t) and gS(s).
But in your code, the teacher transform gT is appli…
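For reference, a minimal sketch of what Eq. 19 as written would look like (the dimensions and module names are assumptions for illustration, not the repository's actual code):

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; Eq. 19 applies a separate linear map to each side
d_t, d_s, d = 64, 32, 16
g_T = nn.Linear(d_t, d)  # teacher transform gT
g_S = nn.Linear(d_s, d)  # student transform gS

t = torch.randn(4, d_t)  # teacher features
s = torch.randn(4, d_s)  # student features

z_t, z_s = g_T(t), g_S(s)  # gT(t) and gS(s), as in the paper
```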
-
I noticed that in the attention implementation, torch.baddbmm and torch.bmm are used for Q @ K and QK @ V, respectively. I wonder if fp8 tensor cores are not used for the computation here and if the com…
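For context, the two ops in question combine as follows in a toy attention step (shapes and dtype are assumptions; the issue's actual fp8 model is not reproduced here):

```python
import torch

# Toy shapes: batch B, sequence length T, head dimension D
B, T, D = 2, 4, 8
q = torch.randn(B, T, D)
k = torch.randn(B, T, D)
v = torch.randn(B, T, D)

scale = 1.0 / D ** 0.5
bias = torch.zeros(B, T, T)

# Q @ K^T via baddbmm: computes beta*bias + alpha*(q @ k^T) in one call
scores = torch.baddbmm(bias, q, k.transpose(1, 2), beta=1.0, alpha=scale)
attn = torch.softmax(scores, dim=-1)

# QK @ V via bmm
out = torch.bmm(attn, v)
```

Whether these calls hit tensor cores depends on dtype and backend dispatch, which is exactly what the question is asking about.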
-
Hi Julien, I am exploring mvMORPH to reconstruct the ancestral state with the best-fitted model. I noticed that the variance at the root node given by the "estim" function is always zero, whereas vari…
DPANN updated 4 months ago
-
https://github.com/pytorch/torchtitan/pull/161/files#diff-80b04fce2b861d9470c6160853441793678ca13904dae2a9b8b7145f29cd017aR254
In principle, the issue is that the PP model code traced the non-F…
-
In the source code, the author calculates the cosine distance as follows.
sum_support = torch.sum(torch.pow(support_image, 2), 1)
support_manitude = sum_support.clamp(eps,…
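To make the truncated snippet concrete, here is a self-contained sketch of that pattern (the tensor shapes, the query side, and the final scaling step are assumptions; the variable names follow the snippet, including its `support_manitude` spelling rendered as `support_magnitude` here):

```python
import torch

eps = 1e-10
support_image = torch.randn(5, 64)  # hypothetical support features
query_image = torch.randn(3, 64)    # hypothetical query features

# Squared magnitudes of each support vector, clamped away from zero
sum_support = torch.sum(torch.pow(support_image, 2), 1)
support_magnitude = sum_support.clamp(eps, float("inf")).rsqrt()  # 1 / ||s||

# Dot products scaled by the support norms (cosine up to the query norm)
similarities = (query_image @ support_image.t()) * support_magnitude.unsqueeze(0)
```

Note that only the support side is normalized here, so this is not a full cosine similarity; whether the author's code normalizes the query side as well is cut off above.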
-
1. layer_2_sims = F.softmax(sessions_represent.bmm(layer_2_current).squeeze(2) * 1.0/avg_distance, dim = 1). Which formula in the paper does this line of code correspond to? In particular, avg_distance?
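Mechanically, the quoted line is a scaled dot-product score followed by a softmax over items. A runnable sketch (all shapes are assumptions, and avg_distance is treated as a plain scalar temperature, which is exactly the part the question asks about):

```python
import torch
import torch.nn.functional as F

# Assumed shapes: B sessions, L items per session, D-dim representations
B, L, D = 3, 5, 8
sessions_represent = torch.randn(B, L, D)
layer_2_current = torch.randn(B, D, 1)
avg_distance = 2.0  # assumed scalar; its actual definition is what the question asks

layer_2_sims = F.softmax(
    sessions_represent.bmm(layer_2_current).squeeze(2) * 1.0 / avg_distance, dim=1
)
```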
WuYHH updated 2 years ago
-
## 🚀 Feature
There are many different matmul-like operations:
- mm
- mv
- bmm
- addmm
- baddbmm
- addbmm
- matmul
- linear
Tensor extension writers have to worry about all of these oper…
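As a quick illustration of the overlap (a sketch, not tied to any particular extension mechanism), torch.matmul already subsumes several entries in the list depending on input ranks:

```python
import torch

a = torch.randn(3, 4)
b = torch.randn(4, 5)
v = torch.randn(4)
ba = torch.randn(2, 3, 4)
bb = torch.randn(2, 4, 5)

# matmul dispatches based on the ranks of its arguments:
r_mm = torch.matmul(a, b)     # 2-D x 2-D: same as torch.mm(a, b)
r_mv = torch.matmul(a, v)     # 2-D x 1-D: same as torch.mv(a, v)
r_bmm = torch.matmul(ba, bb)  # 3-D x 3-D: same as torch.bmm(ba, bb)
```

This is one reason a single canonical entry point is attractive: an extension that intercepts matmul covers the unfused cases, leaving only the fused add-multiply variants to handle separately.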