Closed prajwalkkr closed 5 years ago
You can change the first parameter to find the total number of pages in the image above.
Sorry, this method does not solve the problem, because the URL has changed.
Sorry, is there a more detailed explanation?
Crawling the descriptions is an engineering problem. I will not keep the crawling script updated with Shutterstock.
I've been trying to run crawl_description.py but I am getting an error.
Traceback (most recent call last):
  File "crawl_descriptions1.py", line 92, in <module>
    app.run(main)
  File "C:\Users\asus\Anaconda3\envs\tensorflow_gpu\lib\site-packages\absl\app.py", line 300, in run
    _run_main(main, args)
  File "C:\Users\asus\Anaconda3\envs\tensorflow_gpu\lib\site-packages\absl\app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "crawl_descriptions1.py", line 88, in main
    download(FLAGS.data_dir, FLAGS.num_pages, i, c)
  File "crawl_descriptions1.py", line 68, in download
    all_pages = get_num_pages(label)
  File "crawl_descriptions1.py", line 60, in get_num_pages
    num_pages = int(obj.group(1))
AttributeError: 'NoneType' object has no attribute 'group'
obj = re.search('data-max="(\d*)"', page)
This is most likely because obj is None: re.search cannot find a match for 'data-max="(\d*)"' in the page source. Shutterstock's page markup might have changed.
Can anyone help, or update the Python files?
Have you solved this problem? I got the same error.
In the file crawl_description.py, change
obj = re.search('data-max="(\d*)"', page)
to
obj = re.search('max="(\d*)"', page)
You may want to give it a try. It works for me.
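Since the markup may change again, here is a minimal sketch of a more defensive version of that lookup. It is not the code from the repository: the helper name parse_num_pages, the default argument, and the choice to fall back to a single page are all assumptions made for illustration. It uses \d+ instead of \d*, so an empty max="" attribute is treated as "not found" rather than crashing in int('').

```python
import re

# Hypothetical helper, not part of the original crawl script.
# The relaxed pattern matches both data-max="N" and max="N".
_MAX_PAGE_RE = re.compile(r'max="(\d+)"')


def parse_num_pages(page, default=1):
    """Return the page count found in the HTML string, or `default` if absent."""
    obj = _MAX_PAGE_RE.search(page)
    if obj is None:
        # re.search returns None when nothing matches; calling .group()
        # on it is what raises the AttributeError in the traceback above.
        return default
    return int(obj.group(1))
```

With this in place, get_num_pages could call parse_num_pages(page) and log a warning when the default is used, which makes a future markup change show up as a clear message instead of an AttributeError.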