crawl_description.py - GitHub Issues

prajwalkkr commented 5 years ago

I've been trying to run crawl_description.py but I am getting an error.

Traceback (most recent call last):
  File "crawl_descriptions1.py", line 92, in <module>
    app.run(main)
  File "C:\Users\asus\Anaconda3\envs\tensorflow_gpu\lib\site-packages\absl\app.py", line 300, in run
    _run_main(main, args)
  File "C:\Users\asus\Anaconda3\envs\tensorflow_gpu\lib\site-packages\absl\app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "crawl_descriptions1.py", line 88, in main
    download(FLAGS.data_dir, FLAGS.num_pages, i, c)
  File "crawl_descriptions1.py", line 68, in download
    all_pages = get_num_pages(label)
  File "crawl_descriptions1.py", line 60, in get_num_pages
    num_pages = int(obj.group(1))
AttributeError: 'NoneType' object has no attribute 'group'

obj = re.search('data-max="(\d*)"', page)

This is most likely because obj is None: re.search cannot find a match for data-max="(\d*)" in the page source. Shutterstock's page markup has probably changed.

Can anyone help, or update the Python files?
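re.search returns None when the pattern finds no match, so calling .group(1) on that result raises exactly the AttributeError above. A minimal sketch of a more defensive page-count parser, assuming it receives the page HTML as a string (parse_num_pages is a hypothetical helper, not the script's own get_num_pages, and the sample markup is made up). It also uses \d+ instead of \d* so an empty attribute value counts as "no match" rather than crashing later in int('').

```python
import re

# Same attribute the original script looks for, but requiring at least one digit.
PAGE_COUNT_PATTERN = r'data-max="(\d+)"'


def parse_num_pages(page: str) -> int:
    """Hypothetical helper: extract the total page count from page HTML.

    Raises a descriptive ValueError instead of the opaque
    AttributeError you get from calling .group() on None.
    """
    match = re.search(PAGE_COUNT_PATTERN, page)
    if match is None:
        raise ValueError(
            "Could not find the page count in the HTML; "
            "the site's markup may have changed."
        )
    return int(match.group(1))


if __name__ == "__main__":
    old_markup = '<input type="number" data-max="57">'  # made-up example
    new_markup = '<input type="number" max="57">'       # made-up example
    print(parse_num_pages(old_markup))  # 57
    try:
        parse_num_pages(new_markup)
    except ValueError as err:
        print("error:", err)
```

Failing loudly with a message that points at the markup makes the next site change much quicker to diagnose than a NoneType traceback.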

fengyang0317 commented 5 years ago

You can change the first argument of re.search (the pattern) so that it matches the total page count shown in the image above.

IrectionD commented 5 years ago

Sorry, this approach doesn't solve the problem; the reason is that the URL has changed.

tom285 commented 5 years ago

Sorry, is there a more detailed explanation?

fengyang0317 commented 5 years ago

Crawling the descriptions is an engineering problem. I will not keep the crawling script updated with Shutterstock.

fresh382227905 commented 4 years ago


Have you solved this problem yet? I'm getting the same error.

fresh382227905 commented 4 years ago

Change the line `obj = re.search('data-max="(\d*)"', page)` in crawl_description.py to `obj = re.search('max="(\d*)"', page)` and give it a try. It works for me.
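Why the shorter max="…" pattern helps: re.search matches substrings, so it finds both the old data-max="…" attribute and a plain max="…" attribute. A small sketch comparing the patterns on made-up markup (the current Shutterstock HTML is not shown in this thread, so the samples are assumptions). It also shows a pitfall of \d*: an empty attribute value still matches, with an empty group, and int('') would then raise ValueError; the hypothetical SAFER_PATTERN with \d+ avoids that.

```python
import re
from typing import Optional

OLD_PATTERN = r'data-max="(\d*)"'  # pattern in the original script
LOOSE_PATTERN = r'max="(\d*)"'     # looser pattern along the lines suggested here
SAFER_PATTERN = r'max="(\d+)"'     # hypothetical variant: at least one digit

# Made-up markup samples; the real Shutterstock HTML is not shown here.
SAMPLES = {
    "old markup": '<input class="pager" data-max="120">',
    "new markup": '<input class="pager" max="120">',
    "empty value": '<input class="pager" max="">',
}


def extract(pattern: str, html: str) -> Optional[str]:
    """Return the first captured group, or None if the pattern does not match."""
    match = re.search(pattern, html)
    return None if match is None else match.group(1)


if __name__ == "__main__":
    for name, html in SAMPLES.items():
        found = [extract(p, html) for p in (OLD_PATTERN, LOOSE_PATTERN, SAFER_PATTERN)]
        print(f"{name:12} {found}")
    # old markup   ['120', '120', '120']
    # new markup   [None, '120', '120']
    # empty value  [None, '', None]
```

One caveat of the looser pattern: re.search returns the first match in the page, so if another element with a max attribute appears before the pager, the script would read the wrong number. Printing the extracted value for a single query before starting a full crawl is a cheap sanity check.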

fengyang0317 / unsupervised_captioning

crawl_description.py #8