Open shufanzhang opened 5 years ago
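That is because of casing. A plain string passed to text= only matches text that is exactly 'the prince', so you have captured 'the prince' but left out 'The prince' :) I got 11 with a similar approach, using requests and a case-insensitive regex. You can also just pass find_prince as the text argument in your original code instead of the string 'the prince', and it will work too.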
import re
import requests
from bs4 import BeautifulSoup
URL = "http://www.pythonscraping.com/pages/warandpeace.html"
# ignoring casing
find_prince = re.compile(r'the prince', re.IGNORECASE)
s = requests.Session()
r = s.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')
prince_found = soup.find_all(text=find_prince)
print(len(prince_found))  # 11
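To see the difference without hitting the network, here is a minimal sketch that uses only the standard library. It assumes a made-up snippet (SAMPLE_HTML is hypothetical, not the real page) and a small helper class, TextCollector, that is also just for illustration. It roughly mimics the two filters: a plain string has to equal the whole text node, while a compiled regex is applied with search(), so with re.IGNORECASE it also picks up 'The prince'.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample markup for illustration only, not the real page.
SAMPLE_HTML = """
<p><span class="green">the prince</span> spoke first.
<span class="green">The prince</span> then left, and
<span class="green">the prince</span> returned later.</p>
"""


class TextCollector(HTMLParser):
    """Collect every text node, i.e. the strings a text filter is tested against."""

    def __init__(self):
        super().__init__()
        self.text_nodes = []

    def handle_data(self, data):
        self.text_nodes.append(data)


collector = TextCollector()
collector.feed(SAMPLE_HTML)
collector.close()

# Plain string filter: the whole node must equal the string, case-sensitively.
exact = [t for t in collector.text_nodes if t == "the prince"]

# Regex filter with IGNORECASE: any node containing a match counts.
pattern = re.compile(r"the prince", re.IGNORECASE)
insensitive = [t for t in collector.text_nodes if pattern.search(t)]

print(len(exact), len(insensitive))
```

On this sample the exact filter misses the capitalised node and the regex filter catches it. Note that both filters count matching text nodes, not individual occurrences, so they can still differ from a browser's Ctrl+F count if one node contains the phrase more than once.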
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://www.pythonscraping.com/pages/warandpeace.html")
bs = BeautifulSoup(html, "html.parser")
nameList = bs.find_all(text='the prince')
print(len(nameList))
I ran the code above and the result is 7. However, when I use Ctrl+F to search for 'the prince' in the browser, the result is 11. I'm confused about why the results are inconsistent.