Oxen-AI / mamba-dive

This is the code that went into our practical dive using Mamba for information extraction.

zero loss when training #2

Open · jcrangel opened this issue 7 months ago

jcrangel commented 7 months ago

I have executed

python train_mamba_with_context.py --model state-spaces/mamba-130m \
   --data_path data/Mamba-Fine-Tune/squad_train.jsonl \
   --output models/mamba-130m-context \
   --num_epochs 10

But soon after, the loss drops to zero:

{'loss': 2.9325, 'learning_rate': 0.0004995433789954337, 'epoch': 0.01}                                            
{'loss': 0.0, 'learning_rate': 0.0004990867579908676, 'epoch': 0.02}                                               
{'loss': 0.0, 'learning_rate': 0.0004986301369863013, 'epoch': 0.03}                                               
{'loss': 0.0, 'learning_rate': 0.0004981735159817352, 'epoch': 0.04}    

After that, the model does not train at all. I have also experimented with a smaller learning rate, with the same result.
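
For anyone debugging this: a loss of exactly 0.0 usually points at one of two things, either the fp16 logits have overflowed to NaN/inf, or every label in the batch is masked to -100 so the cross-entropy has nothing to average over. Below is a minimal sanity-check sketch along those lines; the inspect_batch helper is illustrative and not code from this repo, and it assumes mamba_ssm's MambaLMHeadModel plus the GPT-NeoX tokenizer that the state-spaces checkpoints use:

    import torch
    from transformers import AutoTokenizer
    from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

    # Illustrative sketch: run one batch and check the two usual causes
    # of a reported loss of exactly 0.0.
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
    model = MambaLMHeadModel.from_pretrained("state-spaces/mamba-130m", device="cuda")

    def inspect_batch(input_ids: torch.Tensor, labels: torch.Tensor) -> None:
        # 1) If every label is -100, the cross-entropy averages over zero
        #    tokens, and many training loops then report the loss as 0.0.
        n_supervised = (labels != -100).sum().item()
        print(f"supervised tokens in batch: {n_supervised}")

        # 2) fp16 overflow shows up as non-finite logits, which the loss
        #    computation can collapse to 0.0 or NaN.
        with torch.no_grad():
            logits = model(input_ids.to("cuda")).logits
        print("all logits finite:", torch.isfinite(logits).all().item())

If the supervised-token count is 0, the label masking in the data preparation is the place to look; if the logits are non-finite, switching the trainer from fp16 to bf16 (or full fp32) is a common workaround.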

lzw-lzw commented 6 months ago

Hi, I also encountered the same problem. Have you found a solution? Thank you.