How do I switch the crawl to a different city?

fangxuanhao commented 5 years ago

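Hi, I'd like to use your code to crawl a different city, or the whole country, and to cover every job category as well. Which part of the code should I modify?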

Chauncey2 commented 5 years ago

Among the fields used to build the URL, there seems to be a parameter that selects the city.

Chauncey2 commented 5 years ago

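Pick a city on the site and click search, then watch the API path the page requests in the background and compare how it differs between cities. As I recall, one of the parameters selects the city.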
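
A minimal sketch of that comparison, assuming you have copied two request URLs from the browser's network panel, one per city. The endpoint and the parameter names (`cityId`, `kw`, `pageSize`) are made up for illustration; the real ones have to be read from the actual requests.

```python
from urllib.parse import urlsplit, parse_qs


def differing_params(url_a, url_b):
    """Return {name: (values_in_a, values_in_b)} for query parameters that differ."""
    query_a = parse_qs(urlsplit(url_a).query, keep_blank_values=True)
    query_b = parse_qs(urlsplit(url_b).query, keep_blank_values=True)
    diff = {}
    for name in sorted(set(query_a) | set(query_b)):
        if query_a.get(name) != query_b.get(name):
            diff[name] = (query_a.get(name), query_b.get(name))
    return diff


# Hypothetical URLs captured after searching two different cities.
url_city_1 = "https://example.com/api/search?cityId=530&kw=python&pageSize=90"
url_city_2 = "https://example.com/api/search?cityId=538&kw=python&pageSize=90"

print(differing_params(url_city_1, url_city_2))
```

Whichever parameter shows up in the result is the likely candidate for the city selector.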

fangxuanhao commented 5 years ago

thank you

------------------ Original message ------------------ From: "Chauncey2" notifications@github.com; Sent: Friday, August 2, 2019, 1:30 PM; To: "Chauncey2/zhaopin_spider" zhaopin_spider@noreply.github.com; Cc: "麋鹿麋鹿不迷路" 3241644639@qq.com; "Author" author@noreply.github.com; Subject: Re: [Chauncey2/zhaopin_spider] 我想换个城市的爬取怎么弄 (#1)

Chauncey2 commented 5 years ago

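As for crawling all positions: I believe this crawler already collects the data for every position on zhilian. It first fetches the job categories in the home page navigation bar together with their keywords and stores them in a JSON file. A second spider then reads the keywords from that file, builds the URLs, sends the requests, and collects the data.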
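
The second step might look roughly like the sketch below, which also loops over several cities. This is not the repository's actual code: `BASE_URL`, the parameter names, the city IDs, and the JSON layout (category mapped to a list of keywords) are all assumptions for illustration.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; the real one comes from the site's own requests.
BASE_URL = "https://example.com/api/search"


def build_search_urls(keywords, city_ids, page_size=90):
    """Build one search URL per (city, keyword) pair."""
    urls = []
    for city_id in city_ids:
        for kw in keywords:
            query = urlencode({"cityId": city_id, "kw": kw, "pageSize": page_size})
            urls.append(f"{BASE_URL}?{query}")
    return urls


# Stand-in for the JSON file written by the first spider (assumed layout).
categories_json = '{"IT": ["python", "java"], "Finance": ["accountant"]}'
categories = json.loads(categories_json)
keywords = [kw for kws in categories.values() for kw in kws]

for url in build_search_urls(keywords, city_ids=["530", "538"]):
    print(url)
```

Crawling the whole country then comes down to passing a longer list of city IDs, assuming the site exposes one ID per city.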

fangxuanhao commented 5 years ago

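One more thing: I want to crawl the second-level page, since that is where I can get the number-of-openings field. Does that mean I need to find the link to that second-level page?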

Chauncey2 commented 5 years ago

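The logic of writing a spider with the Scrapy framework is much the same everywhere, and there are plenty of tutorials and materials online that you can refer to. I wrote this crawler while I was still a student, so it is not very mature.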

fangxuanhao commented 5 years ago

Mm-hm, thanks all the same.

Chauncey2 commented 5 years ago

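Yes. To get the second-level page, you need to extract the detail page link, send a request to it, and write a parse function for the detail page in the spider.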
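
Here is a rough, standard-library-only sketch of those two parsing steps. The JSON field names (`results`, `detailUrl`) and the "招N人" pattern on the detail page are assumptions, not taken from the real site. In a Scrapy spider, the first function would instead yield `scrapy.Request(url, callback=self.parse_detail)` for each link.

```python
import json
import re

# Assumed pattern for the openings text on a detail page, e.g. "招3人".
HEADCOUNT_RE = re.compile(r"招(\d+)人")


def parse_list(list_json):
    """Extract detail page URLs from a search result payload."""
    payload = json.loads(list_json)
    return [item["detailUrl"] for item in payload["results"] if item.get("detailUrl")]


def parse_detail(detail_html):
    """Return the number of openings found on a detail page, or None if absent."""
    match = HEADCOUNT_RE.search(detail_html)
    return int(match.group(1)) if match else None


# Hypothetical sample data standing in for real responses.
sample_list = '{"results": [{"detailUrl": "https://example.com/jobs/1.htm"}, {"name": "no link"}]}'
sample_detail = "<span>招3人</span>"

print(parse_list(sample_list))
print(parse_detail(sample_detail))
```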

Chauncey2 commented 5 years ago

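However, fetching pages one at a time will be slow and will not show off the advantage of a crawler. You could consider bringing in Redis as a URL cache, and building a distributed crawler to improve crawl throughput.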
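
To illustrate the URL cache idea, here is a small sketch that uses an in-memory set as a stand-in for Redis, so it runs without a server. The class and function names are invented for this example. With the redis-py client, the same check could rely on `sadd`, which returns the number of members newly added to the set (1 for a new URL, 0 for one already seen), and a shared Redis instance would let several crawler processes skip each other's URLs.

```python
class SeenUrls:
    """In-memory stand-in for a Redis set of already scheduled URLs."""

    def __init__(self):
        self._seen = set()

    def add(self, url):
        """Record the URL; return True if it is new, False if already recorded."""
        if url in self._seen:
            return False
        self._seen.add(url)
        return True


def schedule(urls, seen):
    """Keep only URLs that have not been scheduled before, preserving order."""
    return [url for url in urls if seen.add(url)]


seen = SeenUrls()
first_batch = schedule(["/jobs/1", "/jobs/2", "/jobs/1"], seen)
second_batch = schedule(["/jobs/2", "/jobs/3"], seen)
print(first_batch)
print(second_batch)
```

For Scrapy specifically, the scrapy-redis project packages this kind of shared scheduling and de-duplication, if you would rather not write it by hand.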

fangxuanhao commented 5 years ago

Mm-hm. Reading your code, sir, has at least gotten me started with the basics.

Chauncey2 commented 5 years ago

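You flatter me. Please drop the formalities; I only just graduated and am still learning myself, so we can discuss any questions as they come up. When I was learning web scraping I read the book 《Python3 网络爬虫开发实战》, edited by 崔庆才 (Cui Qingcai). I found it quite good, so have a look if you're interested. Cui's projects are also on GitHub.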

fangxuanhao commented 5 years ago

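Okay, okay. My company has some web scraping needs, so I want to learn it. I work on data analysis, modeling, and algorithms, and sometimes we have to ask our company's backend developers to build the crawlers, which is a bit of a hassle, so I'd like to handle it myself, haha.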

Chauncey2 commented 5 years ago

Sounds good. You're the real expert here.

guanxijing commented 4 years ago

Hi, does the crawler no longer work? I get this error: AttributeError: 'str' object has no attribute 'insert'
