dyweb / papers-notebook

:page_facing_up: :cn: :page_with_curl: Papers Notebook: paper reading notes (Distributed Systems, Virtualization, Machine Learning)
https://github.com/dyweb/papers-notebook/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aopen+-label%3ATODO-%E6%9C%AA%E8%AF%BB
Apache License 2.0

An Empirical Study on Program Failures of Deep Learning Jobs #210

Open gaocegege opened 4 years ago

gaocegege commented 4 years ago

http://hongyujohn.github.io/icse20-main-199.pdf

Source: a Google Scholar alert; this is a new paper by Wencong Xiao, a senior schoolmate of mine.

gaocegege commented 4 years ago

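Strictly speaking, this is a software engineering paper rather than a deep learning one. It studies the various failures encountered on Microsoft's own GPU cluster, summarizes the failure types, and analyzes them.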

[Image: Screenshot from 2020-04-27 11-24-28]