-
Hi, I'm using the `get_cosine_schedule_with_warmup` function from torchtune, and according to my LR monitor the first epoch is training with LR=0. https://github.com/pytorch/torchtune/blob/main/torchtun…
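
To illustrate what I think may be happening, here is a minimal sketch. It assumes the schedule follows the common linear-warmup-then-cosine rule; it is not copied from torchtune, and the name `lr_multiplier` and the step counts are hypothetical. Under that assumption the multiplier at step 0 is exactly 0, so if the scheduler were stepped once per epoch instead of once per optimizer step, the whole first epoch would run at LR=0.

```python
import math


def lr_multiplier(step, num_warmup_steps, num_training_steps, num_cycles=0.5):
    """Hypothetical LR multiplier: linear warmup followed by cosine decay.

    This mirrors the commonly used formula; it is an assumption about the
    schedule's shape, not the library's actual source.
    """
    if step < num_warmup_steps:
        # Linear warmup: the multiplier is 0 at step 0 and rises toward 1.
        return step / max(1, num_warmup_steps)
    # Cosine decay over the remaining steps.
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress)))


if __name__ == "__main__":
    base_lr = 1e-3  # illustrative value only
    # Treat each "step" as one scheduler.step() call. If that call happens
    # once per epoch, index 0 covers the entire first epoch.
    for step in range(5):
        print(step, base_lr * lr_multiplier(step, num_warmup_steps=3, num_training_steps=10))
```

If this matches the real behavior, stepping the scheduler per optimizer step (with the warmup and total step counts expressed in optimizer steps) would limit the zero LR to the very first update.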
-
Hi! Is it possible to monitor "epoch-by-epoch" the training progress of a given algorithm (neural nets, particularly)?
Thanks a lot in advance.
-
For model training and execution, several design patterns are effective in managing workflows, code structure, and flexibility. Here are a few common ones used in machine learning and data processing …
-
### **Issue Title**: Training process fails due to Out of Memory (OOM) error
#### **Description**
I encountered an **Out of Memory (OOM)** issue while running a training script on my AlmaLinux 9 s…
-
Hi,
Thank you for creating this really interesting project. I'm eager to use it in my work. One thing that would be really useful is if there were some way to easily examine training history to actu…
-
### Feature request
Right now `include_tokens_per_second=True` in `Trainer` only reports the tokens per second metric at [the end of training](https://github.com/huggingface/transformers/blob/c175343…
-
# **AI-Powered Image Processing for Monitoring Counter Services**
The AI system will analyze real-time CCTV footage to detect various events such as customer congestion, idle counters, long waiting times…
-
Hello!
Is there a plan to implement monitoring of the head training phase as well?
Currently, it seems that logging (Wandb, Neptune, etc.) stops after the body training, but it can be useful to see th…
-
Automated Pipeline for Autoencoder Models on GitHub
1. Data Preprocessing Automation
What:
Implement a CI/CD workflow that automatically processes and augments datasets when new data is available. T…
-
**[@mmguero](https://github.com/mmguero)** cloned issue [idaholab/Malcolm#353](https://github.com/idaholab/Malcolm/issues/353) on 2024-01-15:
> **For what topic would you like to see training devel…