CentaurusInfra/alnair
Intelligent platform for AI workloads
Apache License 2.0
37 stars
12 forks
issues
Add a new ddp training script
#150
YHDING23
closed
1 year ago
0
GDS traffic monitor
#147
pint1022
opened
2 years ago
0
Major updates to the README files - restructured and separated them, …
#146
np-ftrwei
closed
2 years ago
0
fix notebook errors
#145
YHDING23
closed
2 years ago
0
rewrite pod metadata and utils data collection and storing
#144
Fizzbb
closed
2 years ago
0
[alnair device plugin] feature request -- support GPU selection
#143
Fizzbb
opened
2 years ago
0
Create nerf_ddp.py
#142
nwangfw
closed
2 years ago
0
Add Neural Avatar as use-case
#141
YHDING23
closed
2 years ago
0
remove log.Fatalf from exiting programs
#140
Fizzbb
closed
2 years ago
0
Alluxio data orchestration
#139
np-ftrwei
closed
2 years ago
0
Update mnist-distributed.py
#138
nwangfw
closed
2 years ago
0
Intercept hook
#137
pint1022
closed
2 years ago
0
GPUDirect to local SSD
#136
Fizzbb
opened
2 years ago
0
removed a space; changed memory size type long long
#135
pint1022
closed
2 years ago
0
Exporter dev
#134
Fizzbb
closed
2 years ago
0
Cuda met
#133
pint1022
closed
2 years ago
0
intercept-lib test instruction doesn't work.
#132
awang088
opened
2 years ago
1
Add prometheus export to report process-level GPU utilization and memory used size
#131
Fizzbb
opened
2 years ago
0
vgpu-server get cgroup pid from docker top instead of copy file, and …
#130
Fizzbb
closed
2 years ago
0
fix bug convert timestamp to float unexpected
#129
Fizzbb
closed
2 years ago
0
scheduling needs
#128
Fizzbb
opened
2 years ago
0
A bad case for dlsym real func acquirement.
#127
CalvinXKY
closed
2 years ago
3
Add IsSharingGPU function
#126
YHDING23
closed
2 years ago
0
vGPU scheduler assume all the nodes have GPU information annotation. Cannot handle cpu node or the period before annotation got patched
#125
Fizzbb
opened
2 years ago
1
remove potential .so directory in /opt/alnair to avoid Init:crashloop…
#124
Fizzbb
closed
2 years ago
0
Containerize vGPU server leads cgroup.procs content invisible (leads to process util inquiry always 0, compute control failed)
#123
Fizzbb
closed
2 years ago
4
device-plugin installation error, Init:crashloopback
#122
Fizzbb
opened
2 years ago
1
Add binpack and spread policy
#121
YHDING23
closed
2 years ago
0
change alnair socket path, so it does not need to mount /run causing …
#120
Fizzbb
closed
2 years ago
0
vgpu-server container failed to start, "run/nvidia-persistenced/socket" no such device or address
#119
Fizzbb
opened
2 years ago
0
add max memory bandwidth utils to pod metrics
#118
Fizzbb
closed
2 years ago
0
comment out remove annotations
#116
Fizzbb
closed
2 years ago
0
profiler add mem-copy-utils from DCGM to reflect application's io requests
#115
Fizzbb
closed
2 years ago
1
intercept lib launched through LD_PRELOAD cannot intercept cuda driver API calls with pytorch version >=1.10
#114
Fizzbb
opened
2 years ago
1
profiler remove all pod annotation under ai.centaurus.io domain after gpu process is done, which affects scheduler and device plugin
#113
Fizzbb
closed
2 years ago
1
use nsight system inside containers
#112
Fizzbb
opened
2 years ago
1
update single file deployment, mount /run, require no nvidia-docker2 …
#111
Fizzbb
closed
2 years ago
0
Add pre-start hook to all containers in container runtime to support GPU access
#110
Fizzbb
opened
2 years ago
0
same node pods communication through unix socket
#109
Fizzbb
opened
2 years ago
2
create an exporter to export burst, overuse and window-size metrics to prometheus.
#108
pint1022
opened
2 years ago
0
setup multiple nodes cluster for kubeshare performance testing
#107
pint1022
opened
2 years ago
1
setup tf-serving testing environment for kubeshare throughput testing
#106
pint1022
opened
2 years ago
0
horovod mnist.py has higher utilization number. what does it do?
#105
pint1022
opened
2 years ago
0
Add the vGPUScheduler to support Alnair Virtual GPUs
#104
YHDING23
closed
2 years ago
0
Revert "Add the vGPUScheduler to support Alnair Virtual GPUs"
#103
YHDING23
closed
2 years ago
0
Add the vGPUScheduler to support Alnair Virtual GPUs
#102
YHDING23
closed
2 years ago
0
Revert "Add the vGPUScheduler to support alnair virtual gpu"
#101
YHDING23
closed
2 years ago
0
Add the vGPUScheduler to support alnair virtual gpu
#100
YHDING23
closed
2 years ago
0
modify getPreferredDeviceIDs function to make sure vGPU IDs are all f…
#99
Fizzbb
closed
2 years ago
1
GPU sharing corner case: vGPUs spread to two or more physical GPUs
#98
Fizzbb
opened
2 years ago
0