Closed TeonaEcon closed 4 years ago
I did try that as well, but could not find the links I wanted.
The link of the webpage: "https://reportal.ge/Forms.aspx?payerCode=204935400&SystemID=6160&show=1&np=1&cid=IV&prd=show"
From the table I need to get the links to the company pages. The following code, which scrapes links (hrefs):
```python
# pip install beautifulsoup4 lxml
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

req = Request("https://reportal.ge/Forms.aspx?payerCode=204935400&SystemID=6160&show=1&np=1&cid=IV&prd=show")
html_page = urlopen(req)
soup = BeautifulSoup(html_page, "lxml")

# Collect the href attribute of every anchor tag on the page
links = []
for link in soup.find_all('a'):
    links.append(link.get('href'))

print(links)
```
does not print the right links (hrefs) of the company pages, though it works for other websites.
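A likely explanation (an assumption based on the symptom, not confirmed against reportal.ge's actual markup): ASP.NET WebForms pages often render table "links" as JavaScript postbacks rather than plain hrefs, so `link.get('href')` returns `javascript:__doPostBack(...)` strings or `None` instead of URLs. A minimal sketch of telling the two apart, using an illustrative snippet rather than the real page source:

```python
import re

# Illustrative HTML only -- the real reportal.ge markup may differ.
sample_html = """
<a href="javascript:__doPostBack('gvReports','Select$0')">Company A</a>
<a href="Forms.aspx?payerCode=123">Company B</a>
<a href="#">skip</a>
"""

# Plain hrefs that actually point somewhere.
hrefs = [h for h in re.findall(r'href="([^"]+)"', sample_html)
         if not h.startswith(("javascript:", "#"))]

# Targets hidden inside __doPostBack calls -- these never appear as
# normal URLs, which is why a naive href scrape misses them.
postbacks = re.findall(r"__doPostBack\('([^']*)','([^']*)'\)", sample_html)

print(hrefs)      # ['Forms.aspx?payerCode=123']
print(postbacks)  # [('gvReports', 'Select$0')]
```

If the scraped list is full of `javascript:` values (or `None`), that confirms the links are postbacks and a static href scrape cannot follow them.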
Code behind the website:
`<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">`
https://stackoverflow.com/a/16323809/3033613
Also see this; scroll down to `find_all()`:
http://www.compjour.org/warmups/govt-text-releases/intro-to-bs4-lxml-parsing-wh-press-briefings/
I ended up using ScrapeStorm, an AI-based program :) I found it faster.
Description
I am trying to get all the linked pages listed in the table on this .aspx page: https://reportal.ge/BannersMenu/Detailed-search-for-reports.aspx?lang=en-US
What I Did
Would you have any suggestions on how to get the linked pages? (They are individual pages for each company.)
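One hedged suggestion: if the company links turn out to be ASP.NET postbacks, each "link" is really a form submission. It can be replayed by POSTing the page's hidden fields (conventionally `__VIEWSTATE` and `__EVENTVALIDATION`) plus the event target back to the same URL. A sketch of building that payload with the standard library; the field names follow the usual ASP.NET convention, and the sample values below are made up:

```python
import re

# Illustrative page source; a real page embeds much longer values.
page = '''
<input type="hidden" name="__VIEWSTATE" value="dDwtMTIzNDU2Nzg5Ow==" />
<input type="hidden" name="__EVENTVALIDATION" value="/wEWAgLB=" />
'''

def build_postback_payload(html, event_target, event_argument=""):
    """Collect ASP.NET hidden fields and add the simulated postback event."""
    hidden = dict(re.findall(r'name="(__[A-Z]+)"\s+value="([^"]*)"', html))
    hidden["__EVENTTARGET"] = event_target
    hidden["__EVENTARGUMENT"] = event_argument
    return hidden

# 'gvReports' / 'Select$0' are hypothetical names for illustration.
payload = build_postback_payload(page, "gvReports", "Select$0")

# This payload would then be POSTed back to the same .aspx URL
# (e.g. with urllib.request or requests.Session, keeping cookies)
# to receive the company page the JavaScript link would have loaded.
print(payload)
```

If replaying postbacks proves too fiddly, a browser-driving tool such as Selenium, which executes the page's JavaScript, is the usual fallback.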