Question in ch2 - Githubissues

REMitchell / python-scraping

Code samples from the book Web Scraping with Python http://shop.oreilly.com/product/0636920034391.do

That is because of casing. Your pattern only matches 'the prince' and misses 'The prince' :) I got 11 with a similar approach, using requests to fetch the page. You can also just swap the find_prince below into your original code and it will work too.

import re

import requests
from bs4 import BeautifulSoup

URL = "http://www.pythonscraping.com/pages/warandpeace.html"

# ignoring casing
find_prince = re.compile(r'the prince', re.IGNORECASE)

s = requests.Session()
r = s.get(URL)

soup = BeautifulSoup(r.content, 'html5lib')

# find_all returns every text node that matches the compiled pattern
prince_found = soup.find_all(text=find_prince)

print(len(prince_found))  # 11
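
To see the casing effect in isolation, without fetching the page, here is a minimal sketch that runs the same pattern with and without re.IGNORECASE. The sample string is made up for illustration and is not taken from the War and Peace page, so the counts here say nothing about the 11 above.

```python
import re

# Hypothetical sample text (an assumption, not from the actual page)
sample = "The prince bowed. Later, the prince left. THE PRINCE returned."

# Same pattern, compiled with and without the case-insensitive flag
case_sensitive = re.compile(r'the prince')
case_insensitive = re.compile(r'the prince', re.IGNORECASE)

sensitive_hits = case_sensitive.findall(sample)
insensitive_hits = case_insensitive.findall(sample)

print(len(sensitive_hits))    # 1, only the all-lowercase occurrence
print(len(insensitive_hits))  # 3, every casing variant
```

The case-sensitive pattern skips any sentence that starts with 'The prince', which is likely where the missing matches in your count come from.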

Question in ch2 #76