-
Hi.
I installed it using release-4.2.
Hive uses s3Compatible.
```
apiVersion: metering.openshift.io/v1
kind: MeteringConfig
metadata:
  name: "operator-metering"
spec:
  disableOCPFeatures: tru…
```
-
I'm curious: do I have to apply the linear layers to q, k, and v separately before passing them to FlaskAttention? And when I get the output from FlaskAttention, do I still need to apply a linear layer?
I look forward t…
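A minimal NumPy sketch, assuming FlaskAttention behaves like a standard fused attention kernel that only computes softmax(QKᵀ/√d)·V. In a typical multi-head attention block, the learned q/k/v projections are applied separately before such a kernel, and an output projection is applied to its result. The `attention` function is a plain stand-in for the kernel; the class and parameter names are hypothetical.

```python
import numpy as np


def attention(q, k, v):
    # Stand-in for a fused kernel: softmax(q k^T / sqrt(d)) v, per head.
    # q, k, v have shape (heads, seq, head_dim).
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


class MultiHeadSelfAttention:
    def __init__(self, d_model, n_heads, rng):
        assert d_model % n_heads == 0
        self.h, self.dh = n_heads, d_model // n_heads
        scale = 1.0 / np.sqrt(d_model)
        # Separate learned projections for q, k, v (applied BEFORE the kernel).
        self.w_q = rng.normal(0, scale, (d_model, d_model))
        self.w_k = rng.normal(0, scale, (d_model, d_model))
        self.w_v = rng.normal(0, scale, (d_model, d_model))
        # Output projection (applied AFTER the kernel).
        self.w_o = rng.normal(0, scale, (d_model, d_model))

    def _split(self, x):
        # (seq, d_model) -> (heads, seq, head_dim)
        seq = x.shape[0]
        return x.reshape(seq, self.h, self.dh).transpose(1, 0, 2)

    def __call__(self, x):
        q = self._split(x @ self.w_q)
        k = self._split(x @ self.w_k)
        v = self._split(x @ self.w_v)
        out = attention(q, k, v)  # (heads, seq, head_dim)
        seq = x.shape[0]
        out = out.transpose(1, 0, 2).reshape(seq, self.h * self.dh)  # merge heads
        return out @ self.w_o


rng = np.random.default_rng(0)
mha = MultiHeadSelfAttention(d_model=16, n_heads=4, rng=rng)
x = rng.normal(size=(5, 16))
y = mha(x)
print(y.shape)  # (5, 16)
```

Under this assumption, the kernel replaces only the `attention` call; the four projection matrices stay in the surrounding module either way.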
-
```
Starting Pretrained B_SGFormer Model SGFormer
Traceback (most recent call last):
  File "trainval.py", line 135, in <module>
    main()
  File "trainval.py", line 56, in main
    model = build_model(cfg)…
```
-
Thank you for your work! During my experiments, I found the following issues:
1. Loading gpt2 in model.py raises an error:
```python
try:
    self.gpt2 = GPT2LMHeadModel.from_pretrained(gpt2_path)
    logger.info('succeed to load pretrain gpt2 model')
    …
```
-
Does Horovod support the following parallelism setup: pipeline parallelism in which different stages of the pipeline have different numbers of data-parallel ranks?
For example, consider a model which…
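Whether Horovod handles this natively is the open question; the sketch below only illustrates the rank layout being described. All helpers are hypothetical, not Horovod APIs: each pipeline stage gets a contiguous block of global ranks sized by its data-parallel degree, and a simple round-robin rule decides which ranks of the next stage receive a given rank's activations when the two stages have different sizes.

```python
from typing import Dict, List


def assign_stage_ranks(dp_sizes: List[int]) -> List[List[int]]:
    """Give pipeline stage i a contiguous block of dp_sizes[i] global ranks."""
    groups: List[List[int]] = []
    next_rank = 0
    for size in dp_sizes:
        if size < 1:
            raise ValueError("every stage needs at least one rank")
        groups.append(list(range(next_rank, next_rank + size)))
        next_rank += size
    return groups


def stage_of(groups: List[List[int]]) -> Dict[int, int]:
    """Map each global rank to the pipeline stage that owns it."""
    return {rank: s for s, ranks in enumerate(groups) for rank in ranks}


def send_targets(groups: List[List[int]], stage: int, local_idx: int) -> List[int]:
    """Global ranks in stage + 1 that receive activations from this rank.

    Many-to-few (m >= n): sender i feeds receiver i % n.
    Few-to-many (m < n): sender i feeds every receiver j with j % m == i.
    """
    if stage + 1 >= len(groups):
        return []  # last stage: nothing downstream
    senders, receivers = groups[stage], groups[stage + 1]
    m, n = len(senders), len(receivers)
    if m >= n:
        return [receivers[local_idx % n]]
    return [receivers[j] for j in range(n) if j % m == local_idx]


# Example: 3 stages with 4, 2 and 2 data-parallel ranks (8 ranks total).
groups = assign_stage_ranks([4, 2, 2])
print(groups)                      # [[0, 1, 2, 3], [4, 5], [6, 7]]
print(stage_of(groups)[5])         # 1
print(send_targets(groups, 0, 2))  # [4]
```

Under this layout, gradient averaging would be scoped to each stage's own rank list (4 replicas for stage 0, 2 each for stages 1 and 2). The round-robin routing is only one possible choice for the uneven stage boundaries.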
-
I want to continue pretraining my custom model on another dataset, so I only changed initial_checkpoint_dir in training.yaml to the checkpoint directory path of the latest run, but it seems the model can't be l…
-
### Motivation
When doing W8A8 quantization in the PyTorch engine, I found that the InternLM2 modeling code is structured like this: it uses self.attention, self.feed_forward...
```python
class InternLM2DecoderLayer(nn.Module)…
```
-
Hello team,
Please, I need help solving this issue. The test fails when I run:
```
python lm_inference_test.py --meliad_path=$MELIAD_PATH --data_path=$DATA
I0130 03:37:30.642391 139830854076224 nn_comp…
```
-
Hello,
I have been trying to run the script `exp/exps_Numbers.sh` to reproduce the results on the MNIST-SVHN-based Numbers dataset. However, I have been running into a few issues and would a…
-
Sorry to bother you. I see that distill_loss in distill.py is computed as:
```python
distill_loss = F.kl_div(
    F.log_softmax(distill_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=…
```
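For reference, PyTorch's `F.kl_div(input, target)` expects `input` as log-probabilities and, by default, `target` as probabilities, and sums target · (log target − input) per element. So the call above computes KL(teacher ‖ student) between temperature-softened distributions. A dependency-free sketch for a single example follows; the function names are hypothetical, and because the reduction and any T² rescaling in distill.py are cut off in the snippet, the T² factor (a common convention from Hinton et al.'s knowledge-distillation setup) is only an optional flag here.

```python
import math
from typing import List


def log_softmax(logits: List[float], T: float) -> List[float]:
    """log softmax(logits / T), computed stably by subtracting the max."""
    z = [x / T for x in logits]
    m = max(z)
    log_norm = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - log_norm for v in z]


def distill_kl(student_logits: List[float], teacher_logits: List[float],
               T: float = 2.0, scale_by_t2: bool = False) -> float:
    """KL(teacher_T || student_T) for one example, summed over classes.

    Mirrors F.kl_div(log_softmax(student / T), softmax(teacher / T)) with the
    per-class terms summed; scale_by_t2 applies the optional T**2 factor.
    """
    log_p_student = log_softmax(student_logits, T)
    log_p_teacher = log_softmax(teacher_logits, T)
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))
    return kl * T * T if scale_by_t2 else kl


print(distill_kl([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))      # 0.0 (identical logits)
print(distill_kl([0.0, 0.0, 3.0], [3.0, 0.0, 0.0]) > 0)  # True
```

Dividing both logit vectors by T flattens the two distributions, so the student is also pushed to match the teacher's relative probabilities on non-target classes.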