anniehuang921 closed 9 years ago
README.md left unmodified ... XD
README.md is kept as-is XD. Finished task, please review!
@c3h3 @adrianliaw @ChihChengLiang
When I ran it, there seemed to be some problems (my database already had some data in it).
I followed the README and ran: YT_CHANNEL_ID="TWuseRGroup" MONGO_URI="mongodb://localhost:27017/agilearning" python youtube_crawler.py
/home/chihchengliang/.pyenv/versions/2.7.8/lib/python2.7/site-packages/setuptools-5.6-py2.7.egg/pkg_resources.py:1049: UserWarning: /home/chihchengliang/.python-eggs is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
next_page_list = ['https://gdata.youtube.com/feeds/users/TWuseRGroup/uploads/']
next_page_list = [u'https://gdata.youtube.com/feeds/users/TWuseRGroup/uploads?alt=json&start-index=26&max-results=25']
next_page_list = [u'https://gdata.youtube.com/feeds/users/TWuseRGroup/uploads?alt=json&start-index=51&max-results=25']
next_page_list = [u'https://gdata.youtube.com/feeds/users/TWuseRGroup/uploads?alt=json&start-index=76&max-results=25']
next_page_list = [u'https://gdata.youtube.com/feeds/users/TWuseRGroup/uploads?alt=json&start-index=101&max-results=25']
next_page_list = [u'https://gdata.youtube.com/feeds/users/TWuseRGroup/uploads?alt=json&start-index=126&max-results=25']
next_page_list = [u'https://gdata.youtube.com/feeds/users/TWuseRGroup/uploads?alt=json&start-index=151&max-results=25']
next_page_list = [u'https://gdata.youtube.com/feeds/users/TWuseRGroup/uploads?alt=json&start-index=176&max-results=25']
next_page_list = [u'https://gdata.youtube.com/feeds/users/TWuseRGroup/uploads?alt=json&start-index=201&max-results=25']
next_page_list = [u'https://gdata.youtube.com/feeds/users/TWuseRGroup/uploads?alt=json&start-index=226&max-results=25']
next_page_list = [u'https://gdata.youtube.com/feeds/users/TWuseRGroup/uploads?alt=json&start-index=251&max-results=25']
next_page_list = [u'https://gdata.youtube.com/feeds/users/TWuseRGroup/uploads?alt=json&start-index=276&max-results=25']
next_page_list = []
Traceback (most recent call last):
File "youtube_crawler.py", line 29, in <module>
if ddt !=[]:learning_resources_collection.insert(ddt)
File "build/bdist.linux-x86_64/egg/pymongo/collection.py", line 410, in insert
File "build/bdist.linux-x86_64/egg/pymongo/helpers.py", line 202, in _check_write_command_response
pymongo.errors.DuplicateKeyError: insertDocument :: caused by :: 11000 E11000 duplicate key error index: agilearning.learningResources.$_id_ dup key: { : "YTV_NqXyh1rOy-s" }
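The traceback above suggests the insert fails because a document with the same _id ("YTV_NqXyh1rOy-s") is already in the collection. One possible workaround, sketched below, is to drop already-stored documents from the batch before inserting. This is only an illustration, not the actual code in youtube_crawler.py: the helper name filter_new_documents and the sample data are made up, and ddt is assumed to be a list of dicts that each carry an "_id".

```python
def filter_new_documents(docs, existing_ids):
    """Return only the documents whose _id is not already stored.

    Hypothetical helper for illustration; not part of youtube_crawler.py.
    """
    seen = set(existing_ids)
    fresh = []
    for doc in docs:
        if doc["_id"] not in seen:
            fresh.append(doc)
            # Remember the id so duplicates inside the same batch are skipped too.
            seen.add(doc["_id"])
    return fresh


# Made-up sample batch: the first _id is the one from the traceback,
# the second is an invented placeholder.
crawled = [
    {"_id": "YTV_NqXyh1rOy-s", "title": "already stored"},
    {"_id": "YTV_example", "title": "new video"},
]
already_in_db = ["YTV_NqXyh1rOy-s"]

ddt = filter_new_documents(crawled, already_in_db)
print([doc["_id"] for doc in ddt])
```

With a real collection, the existing ids could probably be collected from a query such as learning_resources_collection.find({"_id": {"$in": ids}}, {"_id": 1}) before calling the helper. An upsert (for example replace_one(..., upsert=True) in PyMongo 3 and later) would be another option if the stored documents should be refreshed rather than skipped.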
@ChihChengLiang I tried filling the database and then removing some of the data, but I did not run into the same problem...
youtube_crawler.ipynb can be used for testing.
Finished task again, please review. @c3h3 @adrianliaw @ChihChengLiang
finished task, please review!
@c3h3 @adrianliaw @ChihChengLiang