createamind / busyplan


predict VIDEO #10

Open fzd9752 opened 6 years ago

fzd9752 commented 6 years ago

Purpose: ?? (I don't see how this relates to the company's development, nor what it could be used for once it is built...)

zdx: Sequence prediction is a very important capability of intelligence and essential for AI; it fits the company's goal of general intelligence exactly, and building it would strengthen the intelligence of our existing neural networks. Concrete scenarios: everyone brainstorm together! Obstacle avoidance, predicting other vehicles' intentions, validation in the TORCS game, a robot predicting its own actions, common-sense learning. Once the prototype is validated and being polished, we will keep looking for application scenarios and concrete product improvements.


Goal: build a video-generation network. Requirements: pix2pix framework, based on GAN techniques.

(The above are subjective considerations.)


Basic architecture:

G: a simple 3D U-Net, initial resolution 64 × 64, target resolution 128 × 128. D: a C3D-like discriminator.

Result: take 10 video frames as input and output 5 video frames.


Estimated time: 8 weeks in total.


Latest update on the 15th: per Mr. Zhang's (zdx) request, switch to the PyTorch framework and adapt the original pix2pix code. References:

pix2pix PyTorch source code: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix

Some examples of 3D applications in PyTorch: https://github.com/shiba24/3d-unet https://github.com/kenshohara/video-classification-3d-cnn-pytorch https://github.com/kenshohara/3D-ResNets-PyTorch

Official PyTorch documentation: http://pytorch.org/docs/master/nn.html Key operations: 3D deconvolution - torch.nn.ConvTranspose3d; 3D convolution - torch.nn.Conv3d; 3D max pooling - torch.nn.MaxPool3d; 3D dropout - torch.nn.Dropout3d
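For orientation, a minimal sketch of these four operations in PyTorch, using a toy 3-channel, 10-frame, 64 × 64 input; the layer sizes here are illustrative only, not the project's final configuration:

```python
import torch
import torch.nn as nn

# Toy input: batch of 1, 3 channels, 10 frames, 64 x 64 pixels (N, C, T, H, W).
x = torch.randn(1, 3, 10, 64, 64)

conv = nn.Conv3d(3, 16, kernel_size=3, padding=1)            # 3D convolution
pool = nn.MaxPool3d(kernel_size=2, stride=2)                 # 3D max pooling
drop = nn.Dropout3d(p=0.5)                                    # 3D (channel-wise) dropout
deconv = nn.ConvTranspose3d(16, 3, kernel_size=2, stride=2)   # 3D deconvolution

h = drop(pool(conv(x)))   # -> (1, 16, 5, 32, 32)
y = deconv(h)             # -> (1, 3, 10, 64, 64)
print(h.shape, y.shape)
```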

Keras implementation: cancelled.

Notes on the plan:

  1. The times listed in the plan are minimums; the schedule may slip depending on progress.

Possible causes of failure:

fzd9752 commented 6 years ago

Required reading and related requirements:

  1. Image-to-Image Translation with Conditional Adversarial Nets — the conditional GAN base framework. https://arxiv.org/pdf/1611.07004v1.pdf https://github.com/costapt/vess2ret https://github.com/createamind/pytorch-CycleGAN-and-pix2pix

  2. Learning Spatiotemporal Features with 3D Convolutional Networks — 3D convolution. arXiv:1412.0767 https://gist.github.com/albertomontesg/d8b21a179c1e6cca0480ebdf292c34d2 https://github.com/harvitronix/five-video-classification-methods/blob/master/models.py We will use the popular UCF101 dataset. It strikes a good balance between classes and training data, and it has plenty of well-documented benchmarks to measure ourselves against. Unlike some newer video datasets (see YouTube-8M), the amount of data is manageable on modern systems. UCF sums up their dataset well: UCF101 gives 13,320 videos across 101 action categories, with the largest diversity in terms of actions and large variations in camera motion, object appearance and pose, object scale, viewpoint, cluttered background, illumination conditions, etc., making it the most challenging dataset to date.

  3. Generating Videos with Scene Dynamics — VideoGAN, 3D-convolutional video generation. arXiv:1609.02612

  4. U-Net: Convolutional Networks for Biomedical Image Segmentation — base network architecture. arXiv:1505.04597

  5. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation — the 3D version of U-Net; code reference: https://github.com/ellisdg/3DUnetCNN/blob/master/unet3d/model.py

  6. (added by zdx) MoCoGAN https://github.com/sergeytulyakov/mocogan — the hardware needed to train this one is available. https://github.com/akuzeee/MoCoGAN/blob/master/models.py

  7. Dual Motion GAN for Future-Flow Embedded Video Prediction

yushenxiang commented 6 years ago

2017-11-10
1) Image-to-Image Translation with Conditional Adversarial Nets — reading summary:
Training input: an image plus random Gaussian noise (via dropout). Training output: a realistic image related to the input image.
Test input: an image plus random noise (dropout). Test output: a realistic image related to the input image.
G: U-Net, an encoder-decoder with skip connections (encoder: C64-C128-C256-C512-C512-C512-C512-C512; U-Net decoder: CD512-CD1024-CD1024-C1024-C1024-C512-C256-C128. After the last layer in the decoder, a convolution is applied to map to the number of output channels (3 in general, except in colorization, where it is 2), followed by a Tanh function. As an exception to the above notation, BatchNorm is not applied to the first C64 layer in the encoder. All ReLUs in the encoder are leaky, with slope 0.2, while ReLUs in the decoder are not leaky.)
D: the paper uses a 70×70 PatchGAN (C64-C128-C256-C512. After the last layer, a convolution is applied to map to a 1-dimensional output, followed by a Sigmoid function. As an exception to the above notation, BatchNorm is not applied to the first C64 layer. All ReLUs are leaky, with slope 0.2.)
Advantage: the PatchGAN can be applied to arbitrarily large images.
Loss function: G* = arg min_G max_D L_cGAN(G, D) + lambda * L_L1(G). (Adding both terms together, with lambda = 100, reduces artifacts.)
Training details: 1) Weights were initialized from a Gaussian distribution with mean 0 and standard deviation 0.02. 2) Batch normalization is applied; batch size 1 is used for certain experiments and 4 for others, with little difference noted.
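As a rough illustration of the combined objective above, here is a minimal PyTorch sketch of the generator loss, assuming hypothetical `netG` and `netD` modules and a discriminator that ends in a Sigmoid as in the paper (this is a sketch, not the actual pix2pix repository code):

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()   # adversarial term (D ends with a Sigmoid, as in the paper)
l1 = nn.L1Loss()     # reconstruction term
lambda_l1 = 100.0    # weight used in the paper

def generator_loss(netG, netD, real_input, real_target):
    """cGAN term plus lambda * L1, i.e. G* = arg min_G max_D L_cGAN(G, D) + lambda * L_L1(G)."""
    fake = netG(real_input)
    # The conditional discriminator sees the input image concatenated with the output.
    pred_fake = netD(torch.cat([real_input, fake], dim=1))
    # The generator tries to make D label the fake pair as real (all ones).
    loss_gan = bce(pred_fake, torch.ones_like(pred_fake))
    loss_l1 = l1(fake, real_target)
    return loss_gan + lambda_l1 * loss_l1
```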

yushenxiang commented 6 years ago

2017-11-10 VideoGAN — reading summary:
Capabilities: 1) generate videos from scratch (not conditioned on the past); 2) generate a sequence of frames (32 frames).
Training input: a large amount of unlabeled video plus noise. Training output: video. Test input: noise. Test output: video.
Data preprocessing: 1) to reduce camera jitter, "we extract SIFT keypoints [22], use RANSAC to estimate a homography (rotation, translation, scale) between adjacent frames, and warp frames to minimize background motion"; 2) the only other pre-processing is normalizing the videos to be in the range [−1, 1]; 3) frames are extracted at the native frame rate (25 fps), using 32-frame videos of spatial resolution 64 × 64.
Architecture: G: two independent streams that take a 100-dimensional Gaussian noise vector as the latent code; one generates the moving foreground and the other a static background, and the two are combined through a mask to produce the video. D: "We design the architecture to be the reverse of the foreground stream in the generator, replacing fractionally strided convolutions with strided convolutions (to down-sample instead of up-sample), and replacing the last layer to output a binary classification (real or not)."
Hyperparameters: 1) Adam optimizer with a fixed learning rate of 0.0002 and momentum term of 0.5; 2) the latent code has 100 dimensions, sampled from a normal distribution; 3) batch size 64; 4) all weights initialized with zero-mean Gaussian noise with standard deviation 0.01.
Results and limitations: the model usually learns to put motion on the right objects; one common failure mode is that the objects lack resolution.
Evaluation: 1) generation is evaluated quantitatively with a psychophysical two-alternative forced choice using workers on Amazon Mechanical Turk — a worker is shown two random videos and asked "Which video is more realistic?"; 2) baseline: an autoencoder trained on the same data for comparison — the encoder is similar to the discriminator network (except producing a 100-dimensional code), while the decoder follows the two-stream generator network.

Extended application: generating a video from a single static image.
Architectural change: "We utilize the same model as our two-stream model, however we must make one change in order to input the static image instead of the latent code. We can do this by attaching a five-layer convolutional network to the front of the generator which encodes the image into the latent space, similar to a conditional generative adversarial network. The rest of the generator and discriminator networks remain the same."
Loss change: "we add an additional loss term that minimizes the L1 distance between the input and the first frame of the generated image."
Results: 1) although the extrapolations are rarely correct, they often have fairly plausible motions; 2) the most common failure is that the generated video has a scene similar but not identical to the input image, such as by changing colors or dropping/hallucinating objects.
Directions for improvement: the former could be addressed by color histogram normalization in post-processing; the latter will require building more powerful generative models.
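A minimal sketch of the two-stream composition described above — a mask-weighted sum of a moving foreground and a replicated static background — with random tensors standing in for the generator streams (shapes follow the paper's 32-frame, 64 × 64 setup):

```python
import torch

# Stand-ins for the two generator streams.
foreground = torch.randn(1, 3, 32, 64, 64)           # moving foreground stream f(z)
mask = torch.sigmoid(torch.randn(1, 1, 32, 64, 64))  # per-pixel mask m(z) in [0, 1]
background = torch.randn(1, 3, 1, 64, 64)             # static background stream b(z), a single frame

# Combine the streams: video = m * f + (1 - m) * b, with the background replicated across time.
video = mask * foreground + (1 - mask) * background.expand(-1, -1, 32, -1, -1)
print(video.shape)  # torch.Size([1, 3, 32, 64, 64])
```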

yushenxiang commented 6 years ago

2017-11-11 C3D — reading summary: A 3D convolutional network can process both temporal and spatial information, whereas a 2D convolutional network handles only spatial information; a 2D network therefore produces a single image and cannot generate video, while the convolutional layers of a 3D network preserve temporal information and so can generate video.
Contributions: 1) C3D models appearance and motion simultaneously; 2) experiments on UCF101 suggest that using 3 × 3 × 3 kernels in all layers learns best.
Input: videos are split into non-overlapped 16-frame clips; all video frames are resized to 128 × 171.
Experiments exploring the best architecture: 1) all layers share the same temporal depth of 1, 3, 5 or 7; 2) varying temporal depths, in two variants: 3-3-5-5-7 or 7-5-5-3-3. The settings used in these experiments are described under "Common network settings" in the paper.

The chosen C3D architecture: "we design our 3D ConvNet to have 8 convolution layers, 5 pooling layers, followed by two fully connected layers, and a softmax output layer."
Details: all 3D convolution filters are 3 × 3 × 3 with stride 1 × 1 × 1; all 3D pooling layers are 2 × 2 × 2 with stride 2 × 2 × 2 except for pool1, which has kernel size 1 × 2 × 2 and stride 1 × 2 × 2 with the intention of preserving temporal information in the early phase; each fully connected layer has 4096 output units.
Training set: the Sports-1M dataset, consisting of 1.1 million sports videos, each belonging to one of 487 sports categories.
Training input: 1) 5 two-second clips are randomly sampled from each video and resized to a frame size of 128 × 171; 2) inputs are also randomly cropped to 16 × 112 × 112 to add jitter; 3) clips are horizontally flipped with 50% probability.
Hyperparameters: training is done by SGD with a mini-batch size of 30 examples; the initial learning rate is 0.003 and is divided by 2 every 150K iterations.

Finally, comparison with other methods shows that C3D has advantages in action recognition, scene and object recognition, and runtime speed.
Extended application: after training, C3D can be used as a feature extractor for other video analysis tasks. Input: to extract C3D features, a video is split into 16-frame-long clips with an 8-frame overlap between two consecutive clips.
Observation: visualizing C3D with deconvolution shows that it first attends to object appearance in the first few frames and then to object motion in the subsequent frames.
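A minimal PyTorch sketch of the C3D layout described above (8 convolution layers, 5 pooling layers, two fully connected layers, class output); the pool5 padding is an assumption borrowed from the reference implementation so that the fc6 input size works out to 8192:

```python
import torch
import torch.nn as nn

def conv3x3x3(cin, cout):
    # All C3D convolution filters are 3x3x3 with stride 1 and padding 1, followed by ReLU.
    return nn.Sequential(nn.Conv3d(cin, cout, kernel_size=3, padding=1), nn.ReLU(inplace=True))

c3d = nn.Sequential(
    conv3x3x3(3, 64),
    nn.MaxPool3d(kernel_size=(1, 2, 2), stride=(1, 2, 2)),   # pool1 keeps early temporal info
    conv3x3x3(64, 128),
    nn.MaxPool3d(kernel_size=2, stride=2),                    # pool2
    conv3x3x3(128, 256), conv3x3x3(256, 256),
    nn.MaxPool3d(kernel_size=2, stride=2),                    # pool3
    conv3x3x3(256, 512), conv3x3x3(512, 512),
    nn.MaxPool3d(kernel_size=2, stride=2),                    # pool4
    conv3x3x3(512, 512), conv3x3x3(512, 512),
    nn.MaxPool3d(kernel_size=2, stride=2, padding=(0, 1, 1)), # pool5, padded (assumption, see lead-in)
    nn.Flatten(),
    nn.Linear(8192, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),  # fc6
    nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),  # fc7
    nn.Linear(4096, 487),                                      # logits over 487 Sports-1M classes
)

x = torch.randn(1, 3, 16, 112, 112)   # one 16-frame clip at 112 x 112
print(c3d(x).shape)                    # torch.Size([1, 487])
```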

yushenxiang commented 6 years ago

2017-11-11 3D U-Net — reading summary: 1) replace every 2D operation in the 2D U-Net with its 3D counterpart;

Architecture: 1) two parts: an analysis and a synthesis path, each with four resolution steps; 2) in the analysis path, each layer contains two 3 × 3 × 3 convolutions each followed by a rectified linear unit (ReLU), and then a 2 × 2 × 2 max pooling with strides of two in each dimension; 3) in the synthesis path, each layer consists of an up-convolution of 2 × 2 × 2 with strides of two in each dimension, followed by two 3 × 3 × 3 convolutions each followed by a ReLU; 4) in the last layer, a 1 × 1 × 1 convolution reduces the number of output channels to the number of labels, which is 3 in the paper's case.
Input: a 132 × 132 × 116 voxel tile of the image with 3 channels. Output: 44 × 44 × 28 voxels in the x, y, and z directions respectively.
Advantage of the architecture: the weighted softmax loss function allows training on sparse annotations; setting the weights of unlabeled pixels to zero makes it possible to learn from only the labelled ones and hence to generalize to the whole volume.
Training: besides rotation, scaling, and gray-value augmentation, a smooth dense deformation field is applied to both the data and the ground-truth labels.
Evaluation: Intersection over Union (IoU) is used as the accuracy measure, comparing dropped-out ground-truth slices to the predicted 3D volume.
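A minimal PyTorch sketch of one analysis step and one synthesis step with a skip connection; unlike the paper, padded convolutions are used here so the spatial sizes stay simple, and the channel counts are illustrative only:

```python
import torch
import torch.nn as nn

def double_conv(cin, cout):
    # Two 3x3x3 convolutions, each followed by a ReLU (one resolution step of the 3D U-Net).
    return nn.Sequential(
        nn.Conv3d(cin, cout, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv3d(cout, cout, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

enc = double_conv(3, 32)                                   # analysis step
down = nn.MaxPool3d(kernel_size=2, stride=2)               # 2x2x2 max pooling, stride 2
mid = nn.Conv3d(32, 64, kernel_size=3, padding=1)          # stand-in for the deeper part of the net
up = nn.ConvTranspose3d(64, 64, kernel_size=2, stride=2)   # 2x2x2 up-convolution, stride 2
dec = double_conv(64 + 32, 32)                             # synthesis step on concatenated skip features
head = nn.Conv3d(32, 3, kernel_size=1)                     # 1x1x1 conv maps to the number of labels

x = torch.randn(1, 3, 16, 64, 64)
skip = enc(x)                                   # (1, 32, 16, 64, 64)
bottom = mid(down(skip))                        # (1, 64, 8, 32, 32)
y = dec(torch.cat([up(bottom), skip], dim=1))   # upsample, concatenate skip, double conv
print(head(y).shape)                            # torch.Size([1, 3, 16, 64, 64])
```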

yushenxiang commented 6 years ago

2017-11-13 3D U-Net generator framework code (Keras): https://github.com/ellisdg/3DUnetCNN/blob/master/unet3d/model.py

pix2pix discriminator code (TensorFlow): https://github.com/yenchenlin/pix2pix-tensorflow/blob/master/model.py

C3D code (Keras): https://gist.github.com/albertomontesg/d8b21a179c1e6cca0480ebdf292c34d2

VideoGAN generator and discriminator code (Torch): https://github.com/cvondrick/videogan/blob/master/main.lua VideoGAN generator code (TensorFlow): https://github.com/Yuliang-Zou/tf_videogan/blob/master/main.py VideoGAN generator and discriminator code (TensorFlow): https://github.com/wxh1996/VideoGAN-tensorflow/blob/master/model.py

MoCoGAN generator and discriminator code (PyTorch): https://github.com/DLHacks/mocogan/blob/master/models.py

fzd9752 commented 6 years ago

Deconv reference:

https://github.com/wxh1996/VideoGAN-tensorflow/blob/master/model.py

Conv reference:

https://github.com/hx173149/C3D-tensorflow/blob/master/train_c3d_ucf101.py

fzd9752 commented 6 years ago

Image Pool — consider removing it entirely. https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/issues/75
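For reference, a minimal sketch of what such an image pool does, in the spirit of the pix2pix/CycleGAN codebase rather than its exact code: it keeps a buffer of previously generated fakes and sometimes feeds the discriminator an older fake instead of the newest one; removing it amounts to always returning the incoming image.

```python
import random
import torch

class ImagePool:
    """Minimal image-history buffer sketch (illustrative, not the repository's implementation)."""
    def __init__(self, pool_size=50):
        self.pool_size = pool_size
        self.images = []

    def query(self, image):
        # A zero-size pool is a no-op, which is what deleting the pool amounts to.
        if self.pool_size == 0:
            return image
        if len(self.images) < self.pool_size:
            self.images.append(image.detach().clone())
            return image
        if random.random() < 0.5:
            # Return an older fake and store the new one in its place.
            idx = random.randrange(self.pool_size)
            old = self.images[idx]
            self.images[idx] = image.detach().clone()
            return old
        return image
```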

zdx3578 commented 6 years ago

Results showcase: https://mp.weixin.qq.com/s?__biz=MzA5MDMwMTIyNQ==&mid=2649292700&idx=1&sn=d23a891b5836b43cd452595c7f62ba52&chksm=8811e9dabf6660cc38d307531c67ad243300e1998a179aed94f34db74cbf06d9e2c0320d2c9c&scene=21#wechat_redirect