-
Any thoughts on how to test without spinning up a ton of EC2 instances? Posting a link to the demo on HN or Reddit probably works at least once if you've got enough karma, but it would be cool if there w…
-
**Describe the bug**
I have a 2.7B model checkpoint that was trained without any model parallelism. I now need to continue training from this checkpoint with a model parallelism of 2 on 4 GPUs. I…
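
To make the resharding problem above concrete, here is a minimal sketch of what converting a non-parallel checkpoint to a model parallelism of 2 involves conceptually. It is not any framework's actual conversion tool: the function names, the parameter names in the example, and the choice to split along columns are all assumptions for illustration. Which tensors get split, and along which axis, depends on the training framework.

```python
from typing import Dict, List, Set

Matrix = List[List[float]]


def split_columns(weight: Matrix, parts: int) -> List[Matrix]:
    """Split a 2-D weight into `parts` equal column blocks, one per rank."""
    cols = len(weight[0])
    if cols % parts != 0:
        raise ValueError(f"{cols} columns cannot be split evenly into {parts} parts")
    width = cols // parts
    return [[row[r * width:(r + 1) * width] for row in weight] for r in range(parts)]


def reshard(state: Dict[str, Matrix], parallel_keys: Set[str], parts: int) -> List[Dict[str, Matrix]]:
    """Build one state dict per model-parallel rank.

    Tensors named in `parallel_keys` (hypothetical names) are split by column;
    everything else is replicated unchanged on every rank.
    """
    shards: List[Dict[str, Matrix]] = [{} for _ in range(parts)]
    for name, weight in state.items():
        if name in parallel_keys:
            pieces = split_columns(weight, parts)
        else:
            pieces = [weight] * parts  # replicated tensor, same on every rank
        for rank, piece in enumerate(pieces):
            shards[rank][name] = piece
    return shards


# Toy example: one split tensor and one replicated tensor, model parallelism of 2.
state = {
    "dense.weight": [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
    "norm.weight": [[1.0, 1.0]],
}
rank_states = reshard(state, {"dense.weight"}, parts=2)
```

With 4 GPUs and a model parallelism of 2, each of the two resulting shards would presumably be loaded by two data-parallel replicas.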
-
```
# one month of data may have 20 instances
# another month might have 30 instances
# we record model predictions based on the N data points
# micro average and macro average
# micro over batches (6 classes, then find the average of…
-
### What would you like to happen?
Add the following metrics definitions to RunInference documentation.
num_inferences
- The cumulative count of all samples being passed to RunInference. I.e. tot…
-
Let's define the checkpoint format for onert/onert-micro.
Here, the term checkpoint means any serialization format for resuming the training process afterward (note that a checkpoint in Keras is a se…
-
Dear @matang28,
I am looking for a library for my [job executor](https://github.com/weldpua2008/supraworker/) that delivers command output (`stdout` and `stderr`) to a remote API or Redis.
I found y…
-
Dockerfile:
```txt
FROM huggingface/transformers-pytorch-gpu:latest
RUN pip install deepspeed -i https://pypi.tuna.tsinghua.edu.cn/simple
```
Code source:
[https://github.com/microsoft/DeepSpeedExamp…
-
I noticed that new batches are triggered even if all shards are empty and `avoidEmptyBatches` is set to true. This leads to `ProvisionedThroughputExceededException`s (I don't understand why at the mom…
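
For reference, the behavior I would expect from `avoidEmptyBatches` can be sketched as below. This is only an illustration of the expected decision, not the connector's actual implementation; the function name and the shape of the per-shard counts are hypothetical.

```python
from typing import Dict


def should_trigger_batch(new_records_per_shard: Dict[str, int], avoid_empty_batches: bool) -> bool:
    """Decide whether a new batch should start.

    `new_records_per_shard` maps a shard id to the number of unread records
    (a hypothetical input; the real connector tracks this differently).
    """
    if not avoid_empty_batches:
        return True  # without the flag, every trigger starts a batch
    # With the flag set, start a batch only if at least one shard has data.
    return any(count > 0 for count in new_records_per_shard.values())


all_empty = should_trigger_batch({"shard-0": 0, "shard-1": 0}, avoid_empty_batches=True)
one_busy = should_trigger_batch({"shard-0": 0, "shard-1": 3}, avoid_empty_batches=True)
```

Here `all_empty` is `False` and `one_busy` is `True`, which is the behavior the flag name suggests.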
-
If not, how can I do it with the current connector? Any thoughts?
-
per https://discord.com/channels/1104757954588196865/1111279858136383509/1116644094484164609
SyLM — 06/09/2023 4:24 AM
Yeah, I did that
I wondered if there was a reason I was not aware of, and t…