Store all data crawled from Lagou in the database.

JustForFunnnn / webspider

A website of IT position data & analysis, helps you to get a better understanding of the requirements and trends of the IT job market

MIT License

370 stars 130 forks source link

Store all data crawled from Lagou in the database. #14

Closed arjenzhou closed 5 years ago

arjenzhou commented 5 years ago

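Which command should I use to store all of the data crawled from Lagou in the database?

# Start the scheduled-task dispatcher
env/bin/celery_beat

Does this start all the spiders periodically?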

fishkao commented 5 years ago

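A new feature request: extract the requirements for a specific occupation

Extract the tags from job requirements in a given industry, and compute how the tags for a given position change from year to year

Tags that stay at the top over the long term are the basic-skill tags
Tags newly added in the last 2 years

The distribution of title types. For example, product manager positions are further split into front-end, back-end, B-side (business-facing), C-side (consumer-facing), commercial, growth, and so on; output a distribution of titles

Identify the general functional requirements for product managers
The technical-requirement tags that each type of product manager leans toward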
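
One way the request above could be approached is sketched below. This is only an illustration, not code from webspider: the function names `tag_trends` and `title_distribution`, the input shape (a list of `(year, tags)` pairs), and all sample data are hypothetical. It treats a tag as a "basic skill" if it appears in every observed year, and as "new" if it first appears within the most recent two years.

```python
from collections import Counter, defaultdict


def tag_trends(postings, recent_years=2):
    """Split tags into long-standing ones and recently introduced ones.

    postings: iterable of (year, list_of_tags) pairs (hypothetical shape).
    recent_years: size of the "recent" window, must be >= 1.
    """
    tags_by_year = defaultdict(set)
    for year, tags in postings:
        tags_by_year[year].update(tags)

    years = sorted(tags_by_year)
    if not years:
        return set(), set()

    # Tags seen in every observed year: candidates for basic-skill tags.
    core = set.intersection(*(tags_by_year[y] for y in years))

    # Tags seen in the recent window but in no earlier year.
    earlier = set().union(*(tags_by_year[y] for y in years[:-recent_years]))
    recent = set().union(*(tags_by_year[y] for y in years[-recent_years:]))
    return core, recent - earlier


def title_distribution(titles, keywords):
    """Count how many titles mention each keyword (e.g. 'growth', 'B-side')."""
    counts = Counter()
    for title in titles:
        for keyword in keywords:
            if keyword in title:
                counts[keyword] += 1
    return counts


# Made-up sample data, for illustration only.
sample_postings = [
    (2016, ["axure", "prd"]),
    (2017, ["axure", "prd", "sql"]),
    (2018, ["axure", "prd", "sql", "growth"]),
    (2019, ["axure", "prd", "growth", "b2b"]),
]
core_tags, new_tags = tag_trends(sample_postings)

sample_titles = [
    "growth product manager",
    "B-side product manager",
    "senior growth product manager",
]
title_counts = title_distribution(sample_titles, ["growth", "B-side", "C-side"])
```

Simple substring matching on titles is a deliberate simplification; real titles would likely need normalization or a keyword dictionary per category.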

JustForFunnnn commented 5 years ago

I don't have time for this right now~~~


JustForFunnnn commented 5 years ago

Yes. This is Celery's scheduled-task dispatcher, and it needs a running worker alongside it.
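
To illustrate why the dispatcher alone stores nothing, here is a toy model of the beat/worker split using only the standard library. It is not Celery's implementation and not webspider's code: the `broker` deque, `beat_tick`, `worker_drain`, and the `crawl_jobs` task are all hypothetical stand-ins. The point it shows is that the scheduler only enqueues task names, and nothing runs until a worker consumes them.

```python
from collections import deque

# Stand-in for the message broker that sits between beat and the workers.
broker = deque()


def beat_tick(schedule):
    """Scheduler role: enqueue the name of each due task. Runs nothing."""
    for task_name in schedule:
        broker.append(task_name)


def worker_drain(registry):
    """Worker role: pop queued task names and actually execute them."""
    results = []
    while broker:
        task_name = broker.popleft()
        results.append(registry[task_name]())
    return results


def crawl_jobs():
    """Hypothetical task; a real one would crawl and write to the database."""
    return "jobs saved"


registry = {"crawl_jobs": crawl_jobs}

# Two scheduler ticks with no worker running: tasks only pile up.
beat_tick(["crawl_jobs"])
beat_tick(["crawl_jobs"])
pending_without_worker = len(broker)

# Once a worker runs, the queued tasks are executed.
results = worker_drain(registry)
```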
