bmm Search Results - Githubissues

1000+ results
for bmm

Best match


intel/intel-extension-for-pytorch #433

aten::bmm op is much more slower in float16 for llm rest tok…

### Describe the bug Hi all, when I did some profiling with open-llama-3b on Arc A770, I found that in float16, aten::bmm becomes extremely slow compared to float32 (111.4ms vs 22.5ms). I wonder is this …

rnwang04 updated 11 months ago
5
DriveNetTESTDRIVE/DriveNet #3

Sidechain deposits not recognized by sidechain client

After depositing from the mainchain via the sidechains tab to the sidechain address shown in the mainchain>transfer tab, the deposit does not become available/visible at all in the sidechain client, even …

MerlinB updated 5 years ago
1
wanglixilinx/DSRL #9

about similarity matrix

I tried to implement the FA loss following your answers, but I ran into some questions about the relation graph. My test code is x = np.random.random((256, 64, 64)) y = np.random.random((256, 64, 64)) y = torch.from…

RUC-wly updated 4 years ago
3
pytorch/pytorch #127062

T5 -small Dynamic quantization in graviton3

I am trying dynamic quantization for the Hugging Face T5-small model on Graviton3. I have used ``` torch.quantization.quantize_dynamic(model, qconfig_spec={torch.nn.Linear}, dtype=torch.qint8) ``` In…

akote123 updated 4 months ago
9
j96w/DenseFusion #162

Code problem

First question: I don't understand why points are added below: `pred = torch.add(torch.bmm(model_points, base), points + pred_t)` The second question is about iterative optimization. Why is the follow…

zuoligang1997 updated 4 years ago
1
harvardnlp/genbmm #6

[feature request] support log-bmm to context-free grammars

I found log-bmm very useful for linear-chain CRFs to save memory and speed things up, while in context-free grammars, A->BC requires large amounts of GPU memory, which is a more serious problem. So it is difficult to incre…

sustcsonglin updated 4 years ago
22
FMInference/FlexiGen #131

How do I match the results of profiling with the parameters …

The output of profile bandwidth is as follows: size: 0.25 MB, gpu-to-cpu bandwidth: 5.505 GB/s size: 32.00 MB, gpu-to-cpu bandwidth: 13.220 GB/s size: 128.00 MB, gpu-to-cpu bandwidth: 13.324 GB/…

xvanQ updated 5 months ago
1
JCruan519/VM-UNet #20

When I change gpu_ids=[1,2,3], an error is raised (is multi-GPU execution not supported?)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:1! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)

ShChen233 updated 6 months ago
1
tenstorrent/tt-metal #6984

[N300] TTNN Unit Test Failures: Compute Grid sizes

Failure: ``` - RuntimeError: TT_FATAL @ tt_eager/tt_dnn/op_library/sharded/sharded_op.cpp:42: this->grid_size.x grid_size.y

cfjchu updated 6 months ago
9
juho-lee/set_transformer #7

4-D equivalent?

What if I have a set of matrices instead of a set of vectors? Is it possible to extend the Set Transformer framework to cover that scenario? I played around with it a little (including making some …

zabzug-pfpt updated 2 years ago
3
