Open chentian418 opened 2 years ago
You are correct. pTag.findAll('a', href=re.compile('/wiki/'), class_=False) returns a list of tags that (1) are <a tags, (2) contain the substring /wiki/ in the href attribute, and (3) have no class attribute (that is what class_=False means).
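As a rough illustration of what that call returns, here is a minimal sketch. The HTML string is a made-up paragraph (only the first link comes from the question), and the variable names hrefs and texts are just for the example:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical paragraph for illustration; the first link is the one from the question.
html = (
    '<p>'
    '<a href="/wiki/Mass_communication" title="Mass communication">mass communication</a> '
    '<a href="/wiki/Media" class="mw-redirect">media</a> '
    '<a href="https://example.com/">external</a>'
    '</p>'
)
pTag = BeautifulSoup(html, 'html.parser').p

# href must contain "/wiki/"; class_=False keeps only tags with no class attribute.
tagLinks = pTag.findAll('a', href=re.compile('/wiki/'), class_=False)

# Each result is a Tag object, so attributes and text can be read directly.
hrefs = [a['href'] for a in tagLinks]
texts = [a.get_text() for a in tagLinks]
print(hrefs)
print(texts)
```

In this sketch only the first link should come back: the second has a class attribute and the third has no /wiki/ in its href.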
If you use pTag.findAll('a', href=re.compile('https://www.nytimes.com'), class_=False), this returns a list of <a tags whose href contains the substring https://www.nytimes.com (e.g., href="https://www.nytimes.com/2021/07/23/technology/silicon-valleys-pandemic-profits.html"), but only those that have no class attribute. The tag in your example has class="css-1g7m0tk", so class_=False filters it out; remove class_=False if you want to match it.
If your code still does not return any result, make sure that pTag actually contains the tags that you are looking for.
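One way to check this is to count the matches with and without each filter. The sketch below is only an illustration: the HTML string stands in for your real pTag, the link text "story" is invented, and the variable names are hypothetical.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical stand-in for the paragraph being searched; the link is the one from the question.
html = (
    '<p><a class="css-1g7m0tk" '
    'href="https://www.nytimes.com/2021/07/23/technology/silicon-valleys-pandemic-profits.html" '
    'title="">story</a></p>'
)
pTag = BeautifulSoup(html, 'html.parser').p
pattern = re.compile('https://www.nytimes.com')

# Every <a> inside pTag, regardless of attributes.
all_links = pTag.findAll('a')
# href must match and the tag must have no class attribute.
with_filter = pTag.findAll('a', href=pattern, class_=False)
# href must match; class is ignored.
without_filter = pTag.findAll('a', href=pattern)

print(len(all_links), len(with_filter), len(without_filter))
```

If the first count is zero, the links are not inside pTag at all; if only the class_=False count is zero, the class filter is what excludes them.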
Would you try this code:
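import re
import bs4
import requests
url = "address of webpage that includes <a class..."
req = requests.get(url)
# Search the whole page rather than a single paragraph tag
soup = bs4.BeautifulSoup(req.text, 'html.parser')
# class_=False is left out here because it would skip every <a> tag that has a class attribute
print(soup.findAll('a', href=re.compile('https://www.nytimes.com')))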
Hi, in the second example, I find there is one line of code:
tagLinks = pTag.findAll('a', href=re.compile('/wiki/'), class_=False)
I want to confirm whether this line finds the tags that start with <a and have an href containing '/wiki/', for example:
<a href="/wiki/Mass_communication" title="Mass communication">mass communication</a>
However, when I use
pTag.findAll('a', href=re.compile('https://www.nytimes.com'), class_=False)
with no base URL to extract
<a class="css-1g7m0tk" href="https://www.nytimes.com/2021/07/23/technology/silicon-valleys-pandemic-profits.html" title="">
it doesn't return anything. Would you mind explaining what the code means and why my version fails? Thank you!