GoogleCloudPlatform/llm-pipeline-examples
Apache License 2.0
107 stars
26 forks
issues
Stuck at creating H100 instances
#87
jiangwei221
opened
2 months ago
1
Update README.md
#86
dumb-programmer
closed
7 months ago
1
Add Megatron-DeepSpeed GPT 176B instructions
#85
abdallag
closed
7 months ago
0
Megatron updates
#84
abdallag
closed
8 months ago
0
Add preprocessing step
#83
abdallag
closed
8 months ago
0
Deployment container and container centric README
#82
abdallag
closed
8 months ago
0
Add space
#81
abdallag
closed
9 months ago
0
Fix triton predictions
#80
abdallag
closed
9 months ago
0
Add cluster clean up instructions
#79
abdallag
closed
9 months ago
0
Increase training timeout
#78
abdallag
closed
9 months ago
0
Update README.md
#77
abdallag
closed
9 months ago
0
Minor syntax changes to intro part of Readme
#76
sdlin
closed
9 months ago
0
Update cluster provisioning tool
#75
abdallag
closed
9 months ago
0
Update documentation to make A3 H100 training the default
#74
abdallag
closed
9 months ago
0
Configure Renovate
#73
renovate-bot
closed
9 months ago
1
Disable CPU offloading
#72
abdallag
closed
9 months ago
0
A3 config
#71
abdallag
closed
10 months ago
0
Pass project parameter to gcloud
#70
abdallag
closed
10 months ago
0
Unable to reuse previously provisioned cluster
#69
gkcng
opened
10 months ago
0
Support A3 cluster provisioning
#68
abdallag
closed
10 months ago
0
Fix triton test results
#67
abdallag
closed
10 months ago
0
TCPX support
#66
abdallag
closed
10 months ago
0
Support for AutoModelForCausalLM
#65
sagar-deepscribe
opened
11 months ago
0
Add autoclass and causal lm class to predict.py
#64
Chris113113
closed
11 months ago
0
Add 'latest' image push to nightly build
#63
Chris113113
closed
12 months ago
0
Swap region for test
#62
Chris113113
closed
12 months ago
1
Fix readme, Triton train version, training quota
#61
Chris113113
closed
12 months ago
0
Trainer parameters seem wrong.
#60
Keloo
closed
12 months ago
1
Remove deepspeed from default inferencing image
#59
Chris113113
closed
12 months ago
0
Migrate default Inferencing container to Transformers from DeepSpeed+Transformers
#58
Chris113113
closed
12 months ago
2
Benchmark script with k8s yaml
#57
Chris113113
opened
1 year ago
1
Add --quiet flag to gke cleanup
#56
Chris113113
closed
1 year ago
0
Add GKE Inferencing quickstart guide, and clarify some docs
#55
Chris113113
closed
1 year ago
0
Add test for GKE cluster provision -> convert -> deploy
#54
Chris113113
closed
1 year ago
0
Add test for GKE cluster provision -> convert -> deploy
#53
Chris113113
closed
1 year ago
0
Add test for GKE cluster provision -> convert -> deploy
#52
Chris113113
closed
1 year ago
0
Update DL container
#51
abdallag
closed
10 months ago
0
Fix cluster and bucket creation for GKE
#50
Chris113113
closed
1 year ago
0
Fix pipeline to be compatible with new kfp
#49
abdallag
closed
1 year ago
0
Deployments fail on GKE with kubernetes images > 1.26.3
#48
Chris113113
opened
1 year ago
0
update provisioning tool
#47
stevenBorisko
closed
10 months ago
3
Allow launching in existing clusters from batch container
#46
abdallag
closed
1 year ago
0
Allow cluster reuse from batch container
#45
abdallag
closed
1 year ago
0
Fix training logs collection
#44
abdallag
closed
1 year ago
0
GKE cluster creation fails
#43
abdallag
closed
1 year ago
1
GKE inference fails with bucket creation error
#42
abdallag
closed
1 year ago
2
GKE Inference instructions improvements
#41
abdallag
closed
1 year ago
1
Refactor training and cluster configurations
#40
abdallag
closed
1 year ago
0
Fix instructions and yml for gcloud run
#39
Chris113113
closed
1 year ago
1
Instructions don't mention gcloud run deploy interaction
#38
abdallag
closed
1 year ago
0