AI-Hypercomputer maxtext issues

A simple, performant and scalable Jax LLM!

Apache License 2.0

1.47k stars, 275 forks

issues

Enable expert parallelism for dropping strategy

#869 RissyRan closed 2 weeks ago
0
Unable to recover after checkpoint saving

#868 peregilk opened 2 weeks ago
2
Make running preflight optional in model scripts

#867 raymondzouu closed 2 weeks ago
0
test code to produce Lab Notes - 2024-09-07.ipynb

#866 bernardhan33 opened 2 weeks ago
0
Cannot see multiple GPUs when using Slurm (with proposed fix)

#865 gabeweisz opened 2 weeks ago
0
Converting LLama3.1 405B checkpoint - Requesting multipass checkpoint conversion

#864 shivajid closed 1 week ago
3
Add MaxText run name to TensorBoard file directory

#863 bvandermoon closed 3 weeks ago
0
Improve tfds perf in multihost env

#862 aireenmei closed 2 weeks ago
0
Fix circ storage check for delayed case

#861 gobbleturk closed 2 weeks ago
0
Add load balance loss

#860 RissyRan closed 3 weeks ago
0
RA update works for all axes orders

#859 patemotter closed 3 weeks ago
0
Add simple MLP decoder block

#858 gobbleturk closed 3 weeks ago
0
Delay Activation Forwarding

#857 gobbleturk closed 3 weeks ago
1
added run_name_prefix to tensorboard

#856 kyle-google closed 2 weeks ago
1
Temporarily pin google-cloud-aiplatform to 1.61.0

#855 bvandermoon closed 3 weeks ago
0
[DRAFT] Add In Memory Changes for Pathways

#854 SujeethJinesh opened 3 weeks ago
0
Fix kernel imports

#853 gobbleturk closed 3 weeks ago
0
Add node attributes to the training benchmark

#852 bernardhan33 closed 3 weeks ago
0
Fix kernel imports

#851 gobbleturk closed 2 weeks ago
1
Add node attributes; Fix GCS upload; Add checkpointID to checkpointing workload

#850 bernardhan33 closed 4 weeks ago
1
aqtp release 0.8.0 breaking dependencies

#849 bernardhan33 closed 4 weeks ago
1
documenting XLA flags used by MaxText

#848 nhira closed 2 weeks ago
1
mlperf gpt3 ckpt permission issues

#847 gramesh-amd closed 2 weeks ago
11
Add Llama2 config for v5p

#846 raymondzouu closed 3 weeks ago
0
Adding Mixtral-8x22b

#845 rdyro closed 2 weeks ago
2
How to load tfrecords from local file system for Mlperf training?

#844 gramesh-amd closed 4 weeks ago
3
Add Gemma2-27b

#843 ZhaoyueCheng closed 3 weeks ago
0
Optimize overhead right before the first train_step

#842 ZhiyuLi-goog closed 1 month ago
0
Add dispatch and combine masks for dropping

#841 RissyRan closed 3 weeks ago
1
Mlperf/4.1 grain

#840 aireenmei opened 1 month ago
1
[mlperf/4.1] enable shard_in_read for large scaling training

#839 ZhiyuLi-goog closed 1 month ago
1
Llama3.1 (8B,70B,405B) 🦙

#838 khatwanimohit opened 1 month ago
3
script to convert llama, mistral, mixtral checkpoints to huggingface format

#837 jwyang-google closed 1 week ago
2
[gcs-team] GCS Checkpointing benchmark feature updates

#836 MattIrv closed 1 month ago
1
Adds ragged attention.

#835 patemotter closed 4 weeks ago
0
Integrate Badput monitoring with MaxText

#834 dipannita08 closed 6 days ago
0
Add dropping strategy

#833 RissyRan closed 1 month ago
3
add kl divergence for forward_pass_logit_checker

#832 ZhaoyueCheng closed 1 month ago
1
Standalone checkpoint write seems to have memory leak

#831 bernardhan33 opened 1 month ago
1
Add support for local sliding window attention in TPU splash_attention

#830 gagika closed 1 month ago
0
converting Gemma maxtext compatible checkpoint to Hugging Face format

#829 salrowili opened 1 month ago
3
Report hyperparameters from the distributed training benchmark workload

#828 bernardhan33 closed 1 month ago
0
<Do not merge> Update and rename 1024b.sh to v5p-12288.sh

#827 Obliviour opened 1 month ago
0
Support AoT in 16-vm GPU Llama2 train script

#826 jonb377 closed 1 month ago
0
Removing the registration of the proxy backend used by Pathways.

#825 lukebaumann closed 1 month ago
0
Update NCCL flags for A3 Mega with the network release of 6/27.

#824 yangyuwei opened 1 month ago
0
[MLPerf][GPT3] Bypass setting eval_interval in using synthetic dataset

#823 ZhiyuLi-goog closed 1 month ago
0
Add instruction for Mixtral

#822 RissyRan closed 1 month ago
0
new features with distributed training framework

#821 bernardhan33 closed 1 month ago
0
chore: format the README table

#820 DemoYeti opened 1 month ago
0
