PFCCLab / Camp

PaddlePaddle Escort Program Training Camp (飞桨护航计划集训营)

[WeeklyReports] 2024.04.19~2024.05.10 Weekly Report Summary #252

Closed AndSonder closed 4 months ago

AndSonder commented 6 months ago

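[WeeklyReports] 2024.04.19~2024.05.10 Weekly Report Summary

All participants, please post your weekly report summary as a comment on this issue. The deadline is this Friday evening. Example format:

### Name

xxx

### This week's work

1. xxx
2. xxx

### Next week's work

1. xxx
2. xxx

### Detailed weekly report link: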

- https://github.com/PFCCLab/Camp/pull/xxx

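Project 2: PIR control flow

[x] @huangjiyi (@winter-wang)

Project 3: PIR Python API upgrade and mechanism development

[x] @zrr1999 (@YuanRisheng, @0x45f)

Project 4: Python 3.12 support for the dynamic-to-static SOT module

[x] @gouzil (@SigureMo)

Project 5: Complex-number support for operators

[x] @zyt1024 (@GGBond8488)
[x] @zbt78 (@GGBond8488)

Project 6: Model migration tooling

[x] @RedContritio (@zhwesky2010)

Project 7: Framework API usability improvements

[x] @NKNaN (@zhwesky2010)

Project 8: Composite-mechanism operators (operator decomposition) and mechanism development

[x] @zeroRains (@cyber-pioneer)
[x] @Eddie-Wang1120 (@cyber-pioneer)

Project 9: Performance analysis and optimization of higher-order differentiation

[x] @YibinLiu666 (@HydrogenSulfate)

Project 10: Performance optimization of static-graph semi-automatic parallel training

[x] @AndSonder (@From00)

Project 11: Fully automatic parallelism architecture upgrade

[ ] @Corle-hyz (@Caozhou1995)

Project 12: Scientific computing expansion (DeePMD-kit, optics cases)

[x] @xusuyong (@HydrogenSulfate)

Project 13: Scientific computing expansion (super-resolution reconstruction)

[x] @zhaojiameng (@lijialin03)

Project 14: Scientific computing expansion (fluid dynamics)

[x] @DUCH714 (@wangguan1995)

Project 15: Scientific computing expansion (meteorology)

[x] @Yang-Changhui (@zhiminzhang0830)

Project 16: CINN dynamic shape support (frontend)

[x] @lshpku (@zyfncg, @jiahy0825)
[x] @ZelinMa557 (@zyfncg, @jiahy0825)

Project 18: CINN robustness and performance optimization under static shape

[x] @yulangz (@feifei-111, @2742195759)
[x] @WintersMontagne10335 (@feifei-111, @2742195759)

Project 19: CINN dynamic shape support (backend rule upgrade)

[x] @Austin-00 (@BiynXu)

Project 20: CINN dynamic shape support (backend model coverage expansion)

[x] @unseenme (@zhhsplendid)

Project 21: Paddle CMake cleanup and build optimization

[x] @silverling (@risemeup1)

Project 22: PaddleMIX suite capability building (text-to-image)

[x] @Tsaiyue (@westfish)

Project 23: PaddleMIX suite capability building (image-text)

[x] @cocoshe (@LokeZhou)

Project 24: Large-model training stability and fast convergence of efficient, low-cost small models

[x] @lishuai-97 (@GuoxiaWang)

Project 26: Inference pass and fused operator optimization

[x] @yinfan98 (@yuanlehome)

Project 27: PaddleSpeech suite capability building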

[x] @kk-2000 (@zxcdlucy)
[x] @mattheliu (@zxcdlucy)

AndSonder commented 6 months ago

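Name

卢畅

Internship project

Performance optimization of static-graph semi-automatic parallel training

This week's work

This week I mainly wrote the ZBV scheduling code, tested it under different memory limits, and compared the resulting schedules with the official implementation. I also investigated problems in the GPU memory estimation tool and tested it under different configurations.

1. Fixed the deviation in the estimated peak GPU memory of the backward stage

2. Estimated GPU memory under different configurations

Related PR:

https://github.com/PaddlePaddle/Paddle/pull/63402

3. Wrote the ZBV scheduling code

Completed a first version of the ZBV scheduling code.

Related PR:

https://github.com/PaddlePaddle/Paddle/pull/63800

4. Compared the scheduling results with the official implementation

Next week's work

Support the vpp_degree > 2 case and finish the ZBV scheduling code. Modify the distributed annotations to fit ZBV's V-shaped schedule. Run preliminary performance tests on Llama2.

Detailed weekly report link: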

https://github.com/PFCCLab/Camp/pull/253

DUCH714 commented 6 months ago

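Name

This week's work

Successfully replaced the GNO module in GINO with a Transolver submodule, reducing the relative error of the pressure to 0.0260.

Next week's work

Try applying DrivAerNet

Detailed weekly report link: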

https://github.com/PFCCLab/Camp/pull/254

yulangz commented 6 months ago

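Name

Github ID: yulangz

Internship project

CINN subgraph robustness and performance optimization

This week's work

CINN bug fixes
1. Added DistributeFpnProposalsOpInferSymbolicShape and RoiAlignOpInferSymbolicShape to support symbolic shape inference for FPN model subgraphs. PR: https://github.com/PaddlePaddle/Paddle/pull/63947
2. Used the gather operator as a case study to understand the full CINN flow from symbolic inference to code generation.

Next week's work

Starting from the gather operator, understand CINN's overall flow.

Detailed weekly report link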

https://github.com/PFCCLab/Camp/pull/256

zhaojiameng commented 6 months ago

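Name

赵加孟

This week's work

Got the training code running end to end
Aligned the backward-pass precision of the PaddleScience code

Next week's work

Align the precision of the residual loss
Clean up the repository and submit a PR

Detailed weekly report link: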

https://github.com/PFCCLab/Camp/pull/257

mattheliu commented 6 months ago

Name

刘卓鑫

Internship project

PaddleSpeech suite capability building

This week's work

Ran PaddleSpeech/Demos

| Name | Result | Screenshot |
| --- | --- | --- |
| TTSAndroid | | |
| TTSArmLinux | | |
| TTSCppFrontend | | |
| asr_deployment | Running `source path.sh` fails with `Error: 'build/speechx' directory not found. please ensure that the project build successfully` | |
| audio_content_search | Error: `[2024-05-06 22:41:47,376] [ INFO] - acs http client start [2024-05-06 22:41:47,377] [ INFO] - endpoint: http://127.0.0.1:8090/paddlespeech/asr/search [2024-05-06 22:41:47,382] [ ERROR] - Failed to speech recognition. [2024-05-06 22:41:47,383] [ ERROR] - HTTPConnectionPool(host='127.0.0.1', port=8090): Max retries exceeded with url: /paddlespeech/asr/search (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fbaf10e6830>: Failed to establish a new connection: [Errno 111] Connection refused')) $ python test.py [2024-05-06 22:48:23,017] [ INFO] - acs http client start [2024-05-06 22:48:23,017] [ INFO] - endpoint: http://127.0.0.1:8490/paddlespeech/asr/search Traceback (most recent call last): File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/urllib3/connection.py", line 200, in _new_conn sock = connection.create_connection( File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection raise err File "/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/urllib3/util/connection.py", line 73, in create_connection sock.connect(sa) ConnectionRefusedError: [Errno 111] Connection refused The above exception was the direct cause of the following exception:` | |
| audio_searching | Requires Docker | |
| audio_tagging | Success | |
| automatic_video_subtitiles | Success | |
| custom_streaming_asr | Requires Docker | |
| keyword_spotting | Success | |
| metaverse | Error | |
| punctuation_restoration | Success | |
| speaker_verification | Success | |
| speech_recognition | Success | |
| speech_server | Error | |
| speech_ssl | The "recognize text" snippet (`text = ssl_executor( model='wav2vec2,`) is missing its closing single quote. The command line succeeds. With the Python API, text recognition succeeds but getting the acoustic representation fails with: `ASR Result: i knocked at the door on the ancient side of the building [2024-05-03 23:56:05,298] [ ERROR] - forward() missing 3 required positional arguments: 'wavs_lens_rate', 'target', and 'target_lens' Traceback (most recent call last): File "/home/aistudio/external-libraries/tools/venv/lib/python3.8/site-packages/paddlespeech/cli/ssl/infer.py", line 290, in infer out_feature = self.model(audio[:, :, 0]) File "/home/aistudio/external-libraries/tools/venv/lib/python3.8/site-packages/paddle/nn/layer/layers.py", line 1426, in call return self.forward(inputs, *kwargs) TypeError: forward() missing 3 required positional arguments: 'wavs_lens_rate', 'target', and 'target_lens' Representation: i knocked at the door on the ancient side of the building` | |
| speech_translation | Success | |
| speech_web | Success | |
| story_talker | Error | |
| streaming_asr_server | Success | |
| streaming_tts_server | | |
| streaming_tts_serving_fastdeploy | Requires Docker | |
| style_fs2 | Error | |
| text_to_speech | Error | |
| whisper | Error: `PaddlePaddle version 2.3.0 or higher is required, but 0.0.0 installed, Maybe you are using a develop version, please make sure the version is good with your code.` | |

Next week's work

Continue running PaddleSpeech/Demos

Detailed weekly report link:

https://github.com/PFCCLab/Camp/pull/258

zeroRains commented 6 months ago

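Name

卢林军

This week's work

Added complex64/128 and int8 support to the reduce_as op

https://github.com/PaddlePaddle/Paddle/pull/63782

Added Chinese documentation for the reduce_as op

https://github.com/PaddlePaddle/docs/pull/6621

Backward decomposition of the reduce_as op

https://github.com/PaddlePaddle/Paddle/pull/64016

Next week's work

Investigate why, in the sigmoid_cross_entropy_with_logits op, the decomposed backward computation and the kernel's backward computation give different results when pos_weight is not an all-ones Tensor
Revise the Chinese documentation for reduce_as
Finish up earlier PRs that have not been merged yet.

Detailed weekly report link: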

https://github.com/PFCCLab/Camp/pull/259

lshpku commented 6 months ago

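Name

梁书豪

This week's work

Proposed a new Tile procedure aimed at contiguous memory access within CUDA warps, and verified its correctness with TVM

Next week's work

Implement the new Tile procedure in the CINN backend's TileFirstGeneralTactic

Detailed weekly report link: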

https://github.com/PFCCLab/Camp/pull/255

Yang-Changhui commented 6 months ago

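Name

Yang-Changhui

This week's work

Finished writing the technical documentation for earthformer
Implemented training, validation, and inference visualization for the tfno, uno, and sfno networks

Next week's work

Continue improving the earthformer technical documentation based on the mentor's feedback
Revise the neuraloperator model code based on the mentor's feedback and write the related documentation

Related PRs:

https://github.com/PaddlePaddle/PaddleScience/pull/867
https://github.com/PaddlePaddle/PaddleScience/pull/877

Detailed weekly report link: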
https://github.com/PFCCLab/Camp/pull/262

huangjiyi commented 6 months ago

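Name

黄济懿

Internship project

PIR control flow

This week's work

Analyzed the GPU memory leak that occurs under PIR when training the ppyoloe_plus_crn_l_80e_coco model from PaddleDetection, and implemented a preliminary fix
- https://github.com/PaddlePaddle/Paddle/pull/64130

Next week's work

Implement an instruction for the yield op

Detailed weekly report link: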

https://github.com/PFCCLab/Camp/pull/264

gouzil commented 6 months ago

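Name

田川

This week's work

Upgraded pybind11
Fixed the long-standing destruction issue with the backward Program

Next week's work

Migrate PyFuncOp

Detailed weekly report link: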

263

NKNaN commented 6 months ago

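Name

李睿文

This week's work

Enhanced paddle.nn.functional.group_norm and paddle.nn.GroupNorm
Analyzed the differences between paddle.distribution.Categorical and torch.distributions.Categorical
Enhanced paddle.nn.initializer.TruncatedNormal

Next week's work

Plan to modify the state_dict method of paddle.nn.Layer: add a keep_vars parameter
Plan to modify paddle.io.BatchSampler: upgrade the sampler parameter to accept any iterable type
The task of upgrading binary APIs such as paddle.add/sub/div/mul to support Python numbers needs further changes that follow the type promotion work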
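
As a rough illustration of the planned BatchSampler change, the sketch below batches indices drawn from any iterable, including a one-shot generator. `IterableBatchSampler` is a hypothetical stand-in written for this note, not the `paddle.io.BatchSampler` implementation, and its parameter names are assumptions.

```python
from itertools import islice
from typing import Iterable, Iterator, List


class IterableBatchSampler:
    """Hypothetical sketch: group indices from any iterable into batches."""

    def __init__(self, sampler: Iterable[int], batch_size: int,
                 drop_last: bool = False) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be a positive integer")
        self.sampler = sampler
        self.batch_size = batch_size
        self.drop_last = drop_last

    def __iter__(self) -> Iterator[List[int]]:
        # iter() works for lists, ranges, generators, and custom samplers alike.
        it = iter(self.sampler)
        while True:
            batch = list(islice(it, self.batch_size))
            if not batch:
                return
            if len(batch) < self.batch_size and self.drop_last:
                return
            yield batch


if __name__ == "__main__":
    print(list(IterableBatchSampler(range(5), batch_size=2)))
```

Relying only on `iter()` is what lets the sampler argument be "any iterable". One consequence is that a generator can be consumed only once, so a second pass over the same object yields no batches.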

Detailed weekly report link:

https://github.com/PFCCLab/Camp/pull/265

Eddie-Wang1120 commented 6 months ago

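Name

王晋恒

Internship project

Composite-mechanism operators (operator decomposition) and mechanism development

This week's work

Studied dynamic shape
- Learned how dynamic shape is handled in operator decomposition
Worked on operator decomposition
- support hardsigmoid op forward in prim pir (merged)
- support floor_divide op forward in prim pir (merged)
- support bce_loss op forward in prim pir (merged)
- support bmm op forward in prim pir (merged)
- support masked_select op backward in prim pir (done)
- support clip op forward in prim pir (in progress)
- support dynamic one_hot op forward in prim pir (in progress)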
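
For readers unfamiliar with operator decomposition, here is a minimal sketch of how a forward op such as hardsigmoid can be expressed with simpler primitives (scale-and-shift, maximum, minimum). It assumes the usual definition clip(slope * x + offset, 0, 1) with the defaults slope=0.1666667 and offset=0.5 documented for `paddle.nn.functional.hardsigmoid`. The helper names are hypothetical, and this is plain Python over lists, not the prim PIR rule from the PRs listed above.

```python
from typing import List


def scale(xs: List[float], factor: float, bias: float) -> List[float]:
    """Primitive: elementwise factor * x + bias."""
    return [factor * x + bias for x in xs]


def maximum(xs: List[float], bound: float) -> List[float]:
    """Primitive: elementwise max(x, bound)."""
    return [max(x, bound) for x in xs]


def minimum(xs: List[float], bound: float) -> List[float]:
    """Primitive: elementwise min(x, bound)."""
    return [min(x, bound) for x in xs]


def hardsigmoid(xs: List[float], slope: float = 0.1666667,
                offset: float = 0.5) -> List[float]:
    """hardsigmoid composed from the three primitives above."""
    shifted = scale(xs, slope, offset)
    return minimum(maximum(shifted, 0.0), 1.0)


if __name__ == "__main__":
    print(hardsigmoid([-4.0, 0.0, 4.0]))
```

Because every step is an elementwise primitive, the same composition works for inputs of any shape, which is part of why decomposition and dynamic shape come up together.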

Next week's work

Continue the operator decomposition work

Detailed weekly report link

https://github.com/PFCCLab/Camp/pull/266

CJ77Qi commented 6 months ago

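Name

Github-ID: Austin-00

Internship project

CINN dynamic shape support (backend rule upgrade)

This week's work

Worked through the entire CINN pipeline covering LowertoAST and Schedule
Worked through the dynamic-shape ScheduleConfig in dy_shape_group_scheduler, tile_config, and tile_first_general_tactic, along with the logic that adjusts the code skeleton, and designed the follow-up ScheduleConfig
Understood and settled the development requirements and rough technical approach for the upcoming Searcher

Next week's work

Fill in the interfaces of the upcoming Searcher, continue iterating on development, and get the PR merged
Adjust as requirements change and further optimize performance

Detailed weekly report link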

https://github.com/PFCCLab/Camp/pull/267

yinfan98 commented 6 months ago

Name

尹帆

This week's work

Aligned kernel precision
Reimplemented the int4 weight-only kernel https://github.com/PaddlePaddle/Paddle/pull/64094
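
To make "int4 weight-only" concrete, the following is a generic sketch of the idea: weights are stored as 4-bit integers with one floating-point scale per row, and two values are packed into each byte. The symmetric [-7, 7] range, the per-row scale, and the low-nibble-first packing order are all assumptions chosen for illustration; the kernel in the PR above may use a different layout.

```python
from typing import List, Tuple


def quantize_int4(row: List[float]) -> Tuple[List[int], float]:
    """Symmetric quantization of one weight row to integers in [-7, 7]."""
    max_abs = max(abs(v) for v in row)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = [max(-7, min(7, round(v / scale))) for v in row]
    return q, scale


def pack_int4(q: List[int]) -> bytes:
    """Pack two signed 4-bit values per byte, low nibble first."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count
    packed = bytearray()
    for lo, hi in zip(q[0::2], q[1::2]):
        packed.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(packed)


def unpack_int4(packed: bytes, count: int) -> List[int]:
    """Inverse of pack_int4: recover `count` signed 4-bit values."""
    out = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out[:count]


def dequantize(q: List[int], scale: float) -> List[float]:
    """Approximate original weights from quantized values and the scale."""
    return [v * scale for v in q]


if __name__ == "__main__":
    q, s = quantize_int4([1.4, -0.6, 0.0, 0.2, -1.4])
    print(q, s, dequantize(unpack_int4(pack_int4(q), len(q)), s))
```

Only the weights are quantized here; activations stay in floating point and the weights are dequantized on the fly, which is what "weight-only" refers to.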

Next week's work

Llama3 RoPE kernel
Llama3 GQA kernel

Detailed weekly report link:

https://github.com/PFCCLab/Camp/pull/268

unseenme commented 6 months ago

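Name

Github ID: unseenme

This week's work

Improved the offline testing pipeline code
Ran offline tests and filtered the results
Submitted the valuable test cases to the Paddle repository

Next week's work

Continue offline testing
Keep improving the testing pipeline code to further raise testing efficiency

Detailed weekly report link: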

https://github.com/PFCCLab/Camp/pull/269

silverling commented 6 months ago

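Name

凌少鹏

This week's work

Wrapped up the "separate the PHI module" work:
- https://github.com/PaddlePaddle/Paddle/pull/63712

Next week's work

Improve the tests related to the module extraction work to keep its impact as small as possible

Detailed weekly report link: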

https://github.com/PFCCLab/Camp/pull/270

YibinLiu666 commented 6 months ago

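Name

YibinLiu666

This week's work

Upgraded cumprod to support two new parameters, reverse and exclusive https://github.com/PaddlePaddle/Paddle/pull/64022
Used a bidirectional cumprod to fix the bug where prod_grad produces nan in x_grad when x contains 0: https://github.com/PaddlePaddle/Paddle/pull/64127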
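
The idea behind the bidirectional cumprod fix can be sketched as follows. The gradient of prod(x) with respect to x_i is the product of all other elements; computing it as prod(x) / x_i gives 0/0 when x_i is 0, whereas multiplying an exclusive prefix product by an exclusive suffix product needs no division. This is a plain-Python sketch on 1-D lists, and the semantics assumed for `exclusive` and `reverse` (the current element is skipped, the scan runs from the end) are my reading of the parameter names, not taken from the PRs.

```python
from typing import List


def cumprod(xs: List[float], exclusive: bool = False,
            reverse: bool = False) -> List[float]:
    """Cumulative product with assumed exclusive/reverse semantics."""
    seq = list(reversed(xs)) if reverse else list(xs)
    out, acc = [], 1.0
    for v in seq:
        if exclusive:
            out.append(acc)  # product of elements strictly before v
            acc *= v
        else:
            acc *= v
            out.append(acc)  # product up to and including v
    return list(reversed(out)) if reverse else out


def prod_grad(xs: List[float], out_grad: float = 1.0) -> List[float]:
    """Division-free gradient of prod(xs): prefix product times suffix product."""
    left = cumprod(xs, exclusive=True)
    right = cumprod(xs, exclusive=True, reverse=True)
    return [out_grad * l * r for l, r in zip(left, right)]


if __name__ == "__main__":
    print(prod_grad([2.0, 0.0, 4.0]))
```

With a single zero in the input, only the position of the zero gets a nonzero gradient (the product of the remaining elements). With two or more zeros, every gradient entry is zero.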

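Next week's work

Wrap up the cumprod upgrade PR.
Fix the bugs in the bidirectional-cumprod implementation of prod_grad
Then add complex type support to bmm

Detailed weekly report link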

https://github.com/PFCCLab/Camp/pull/271

lishuai-97 commented 6 months ago

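Name

李帅

Github ID: lishuai-97

This week's work

This week's main work was completing the experiments, building the outline of the paper on the XXX gradient clipping algorithm, and finishing a first draft of the paper

Next week's work

Continue improving and polishing the paper on the XXX gradient clipping algorithm and prepare it for submission

Detailed weekly report link: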

https://github.com/PFCCLab/Camp/pull/272

ZelinMa557 commented 6 months ago

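Name

马欣楷

This week's work

Discussed with the mentor and settled on the reduce mean optimization plan; implemented middle-end code generation for the generate shape operator
Fixed some cases where inferSymbolic and inferMeta produced inconsistent results

Next week's work

Wrap up the work above
Start testing the performance difference between control flow such as if/else in the computation graph and in a CUDA kernel

Detailed weekly report link: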

https://github.com/PFCCLab/Camp/pull/261

zbt78 commented 6 months ago

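Name

周波涛

Internship project

Complex-number support for operators

This week's work

Continued working on last week's pow issue. There is still no good solution for the case where the operator's attr is a Scalar. The other part concerns the gradient in powfunctor: it is computed by splitting the input into its real and imaginary parts, but the functor's arguments are Eigen-typed data. Local tests pass, yet CI likewise reports many errors, so this work continues.
Surveyed Hackathon tasks

Next week's work

Continue resolving the problems in pow
Fix the other PRs already submitted and wrap up this part

Detailed weekly report link: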

https://github.com/PFCCLab/Camp/pull/273

cocoshe commented 6 months ago

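Name

叶柯

Internship project

PaddleMIX suite capability building (image-text)

This week's work

This week's main work:

Helped participants in the Qihang (启航) program resolve some problems they ran into

Pushed forward the merge of the PR that integrates internlm-xcomposer2 into PaddleMIX

https://github.com/PaddlePaddle/PaddleMIX/pull/511

Next week's work

Catch up on my graduation project and the upcoming NIPS deadline

Detailed weekly report link: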

https://github.com/PFCCLab/Camp/pull/274

xusuyong commented 6 months ago

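Name

xusuyong

This week's work

Surveyed the literature to understand the algorithm that adjusts loss weights using NTK theory
Added a general NTK optimization strategy based on the allen_cahn_default case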
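
One commonly described NTK-based weighting scheme sets the weight of each loss term to Tr(K) / Tr(K_ii), where K_ii = J_i J_i^T is the NTK block of term i, J_i is the Jacobian of that term's residuals with respect to the network parameters, and Tr(K) is the sum of the block traces. The sketch below assumes that scheme and uses the identity Tr(J J^T) = sum of squared Jacobian entries. The function and term names are hypothetical, and the strategy in the actual PR may differ.

```python
from typing import Dict, List

Matrix = List[List[float]]


def ntk_block_trace(jacobian: Matrix) -> float:
    """Trace of J @ J.T, which equals the squared Frobenius norm of J."""
    return sum(entry * entry for row in jacobian for entry in row)


def ntk_loss_weights(jacobians: Dict[str, Matrix]) -> Dict[str, float]:
    """Weight each loss term by Tr(K) / Tr(K_ii) (assumed scheme)."""
    traces = {name: ntk_block_trace(jac) for name, jac in jacobians.items()}
    if any(t == 0.0 for t in traces.values()):
        raise ValueError("a loss term has an all-zero Jacobian")
    total = sum(traces.values())
    return {name: total / t for name, t in traces.items()}


if __name__ == "__main__":
    # Toy Jacobians: 2 PDE residuals and 1 boundary residual, 2 parameters.
    toy = {"pde": [[1.0, 0.0], [0.0, 1.0]], "bc": [[2.0, 0.0]]}
    print(ntk_loss_weights(toy))
```

Terms whose NTK block has a small trace, and which would therefore be learned slowly, receive larger weights. In practice the Jacobians come from automatic differentiation and the weights are refreshed periodically during training.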

Next week's work

Revise the NTK weight PR

Detailed weekly report link:

https://github.com/PFCCLab/Camp/pull/275

kk-2000 commented 6 months ago

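Name

张千芊

This week's work

Tested 14 of the 25 demos in demos using paddlepaddle-gpu==2.6.1 and paddleSpeech==develop, and recorded the results

Next week's work

Continue the PaddleSpeech testing and set up a Docker environment

Detailed weekly report link: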

https://github.com/PFCCLab/Camp/pull/260

Tsaiyue commented 6 months ago

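Name

蔡越

Internship project

PaddleMIX suite capability building (text-to-image)

This week's work

Added forward inference support for Open-Sora, including text-to-video and other advanced use cases;
- Related PR: https://github.com/PaddlePaddle/PaddleMIX/pull/537

Next week's work

Push the Open-Sora forward inference PR toward merging;
Begin the Open-Sora training alignment work.

Detailed weekly report link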

https://github.com/PFCCLab/Camp/pull/276

zrr1999 commented 6 months ago

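Name

詹荣瑞

Internship project

PIR Python API upgrade and mechanism development

This week's work

Completed part of the preparatory work for dynamic shape support in PaddleSOT

Related PR:

Next week's work

Continue improving dynamic shape support in PaddleSOT

Detailed weekly report link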

https://github.com/PFCCLab/Camp/pull/277

zyt1024 commented 6 months ago

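Name

张玉涛

Internship project

Complex-number support for operators

This week's work

Continued improving the fill_constant_op operator. After its attribute was changed to support Scalar, many problems appeared when running CI. Issues such as PR-CE-Framework have been resolved, but some CI jobs still fail.

Next week's work

Continue fixing issues in the fill_constant_op implementation

Detailed weekly report link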

https://github.com/PFCCLab/Camp/pull/278

WintersMontagne10335 commented 6 months ago

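Name

马贺达

This week's work

Added a conversion rule from cinn_op.reshape to pd_op.reshape (related PR: https://github.com/PaddlePaddle/Paddle/pull/64303 )
Added "pd_op.add" to ALLOW_DYNAMIC_SHAPE_VJP_OPS; added handling for external inputs during global inference (related PR: https://github.com/PaddlePaddle/Paddle/pull/64342 )
Gave a code walkthrough titled "A Brief Analysis of the Symbolic Inference Mechanism in CINN" (《浅析 cinn 中的符号推导机制》) (related link: https://github.com/WintersMontagne10335/Paddle-Code-Camp/blob/master/code%20reading/%E6%B5%85%E6%9E%90%20cinn%20%E4%B8%AD%E7%9A%84%E7%AC%A6%E5%8F%B7%E6%8E%A8%E5%AF%BC%E6%9C%BA%E5%88%B6.md )

Next week's work

Continue fixing bugs
Improve "A Brief Analysis of the Symbolic Inference Mechanism in CINN" (《浅析 cinn 中的符号推导机制》)
Write reading notes on VerticalLoopFusion

Detailed weekly report link: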

https://github.com/PFCCLab/Camp/pull/279

Corle-hyz commented 6 months ago

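Name

何咏哲

This week's work

Explored the properties of heterogeneous clusters
Built a preliminary fully automatic parallelism scheme for heterogeneous systems

Next week's work

Use the GPU memory formula to complete the dynamically balanced partitioning algorithm, refine the fully automatic parallelism scheme for heterogeneous systems, and parameterize it.

Detailed weekly report link: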

https://github.com/PFCCLab/Camp/pull/295