alpa-projects/alpa
Training and serving large-scale neural networks with auto parallelization.
https://alpa.ai
Apache License 2.0
3.08k stars, 360 forks
Issues
#920 [Simple] Remove deprecated numpy.int/float (ymwangg, closed, 1 year ago, 0 comments)
#919 Install llm_serving without having to clone the repo (gianlucadetommaso, open, 1 year ago, 0 comments)
#918 Failed With Installation Check (szxiangjn, open, 1 year ago, 3 comments)
#917 feat(compiler): implement last-seen heuristic for resharding source (jon-chuang, open, 1 year ago, 0 comments)
#916 feat(build): Update alpa's build of XLA and PyTorch (jon-chuang, open, 1 year ago, 0 comments)
#915 Allow users to specify a different cluster address when initializing Ray (gjoliver, closed, 1 year ago, 1 comment)
#914 Moved _compute_one_replica_ids from DistributedArray to PhysicalDeviceMesh (akhmedsakip, closed, 1 year ago, 0 comments)
#913 Specify Ray instance port (akhmedsakip, closed, 1 year ago, 1 comment)
#912 Fix CI docker build (gjoliver, closed, 1 year ago, 0 comments)
#911 Why do not use the latest `jaxlib` and `jax`? (chaoming0625, closed, 1 year ago, 2 comments)
#910 [FIX] clean code in runtime emitter (ZYHowell, closed, 1 year ago, 0 comments)
#909 Use shell script to drive jaxlib build (yhtang, open, 1 year ago, 0 comments)
#908 Let NCCL ID store fate share with CollectiveGroup. (gjoliver, closed, 1 year ago, 3 comments)
#907 Rendezvous crashes while trying to access already killed NCCLUniqueIDStore (gjoliver, closed, 1 year ago, 0 comments)
#906 [Discussion]On the pillar of alpa's two-level hierarchical space of parallelism (GHGmc2, open, 1 year ago, 6 comments)
#905 Fix DistributedArray.block_until_ready AttributeError (chaokunyang, closed, 1 year ago, 0 comments)
#904 AttributeError: 'DistributedArray' object has no attribute 'uuid' (chaokunyang, closed, 1 year ago, 0 comments)
#903 Rebase ALPA onto JAX v0.4.6 (yhtang, closed, 1 year ago, 1 comment)
#902 [DOC] fix typo in resharding/README.md (eltociear, closed, 1 year ago, 0 comments)
#901 F external/org_tensorflow/tensorflow/compiler/xla/python/xla.cc:227] Check failed: stream_device != nullptr (0 vs. nullptr) (chaokunyang, open, 1 year ago, 1 comment)
#900 Jax dispatch fail at `TypeError: No device_put handler ` when passing DistributedArray (chaokunyang, open, 1 year ago, 0 comments)
#899 Create devcontainer.json (Spina7demon, open, 1 year ago, 0 comments)
#898 Sync global_env.global_config across Ray cluster. (gjoliver, closed, 1 year ago, 1 comment)
#897 alpa.test_install error (vectercyg, closed, 1 year ago, 3 comments)
#896 error run model.init (vectercyg, closed, 1 year ago, 1 comment)
#895 Update setup.py to depend on latest Ray (gjoliver, closed, 1 year ago, 0 comments)
#894 Add example OPT-IML configs (dlzou, closed, 1 year ago, 4 comments)
#893 downloading weights automatically with an NFS (pascalwhoop, open, 1 year ago, 0 comments)
#892 [Question] Benchmarking computation on each GPU (jaywonchung, closed, 1 year ago, 3 comments)
#891 [Question] Introspecting Alpa's sharding/parallelization decision (jaywonchung, closed, 1 year ago, 3 comments)
#890 Can you provide convert OPT-xx weights into Alpa formats? (liguodongiot, closed, 1 year ago, 3 comments)
#889 Acquiring sentence probability when serving on OPT. (TZWwww, closed, 1 year ago, 1 comment)
#888 Add path argument (AetherPrior, closed, 1 year ago, 2 comments)
#887 Add explaination on the Alpa output (zhanyuanucb, closed, 1 year ago, 0 comments)
#886 Alpa doesn't work with remote Ray cluster (zhanyuanucb, open, 1 year ago, 5 comments)
#885 Update publications (merrymercy, closed, 1 year ago, 0 comments)
#884 UnboundLocalError: local variable 'state' referenced before assignment (liguodongiot, open, 1 year ago, 1 comment)
#883 Invalid argument passed in nccl_all_reduce_thunk.cc to ncclReduceScatter and ncclAllReduce with bfloat16 (samblouir, open, 1 year ago, 1 comment)
#882 Encounter a 'no solution in auto stage construction' assertion error when running benchmarks with specified configurations. (DicardoX, open, 1 year ago, 2 comments)
#881 Using a bfloat16 causes Double Free Exception and Crash (samblouir, open, 1 year ago, 2 comments)
#880 Clip by Global Norm causes Pipeshard Parallel crash (samblouir, open, 1 year ago, 1 comment)
#879 [DOC] Add guidance on strategy inspection (merrymercy, closed, 1 year ago, 0 comments)
#878 [FIX] explicit stage num with uniform stage divided by flops (ZYHowell, closed, 1 year ago, 0 comments)
#877 Question about the alpa serving (lambda7xx, closed, 1 year ago, 8 comments)
#876 [FIX] misc fix for t5x (ZYHowell, closed, 1 year ago, 0 comments)
#875 AssertionError when using PipeshardParallel for inference (leiteg, closed, 1 year ago, 6 comments)
#874 Does alpa-0.2.2 support auto search for serving like training? (lambda7xx, closed, 1 year ago, 0 comments)
#873 [FIX] Fix output sharding for create_state_parallel (merrymercy, closed, 1 year ago, 0 comments)
#872 will alpa-0.2.2 support inference auto search? (lambda7xx, closed, 1 year ago, 0 comments)
#871 Could you provide more info about MeshDriverDataLoader for clm fine-tune? (dumpmemory, closed, 1 year ago, 0 comments)