Tearever / ScrapingTheeWeb


Starting #1

Open Brandon-7-Sharp opened 7 months ago

Brandon-7-Sharp commented 7 months ago

We need to figure out how to differentiate between our two websites. One option is to read the address as a string and check the characters at a fixed position to determine whether it is CNN or Space.
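A rough sketch of that check, for reference. It compares the URL's hostname rather than characters at a fixed index, since a fixed index breaks on variations such as `http` vs `https` or a missing `www.`. The hostnames `cnn.com` and `space.com` and the helper name `detect_site` are assumptions for illustration; nothing here is taken from the repo.

```python
from urllib.parse import urlparse

# Assumed hostnames: the thread only refers to the sites as "CNN" and "Space".
SITE_BY_DOMAIN = {
    "cnn.com": "cnn",
    "space.com": "space",
}


def detect_site(url):
    """Return a short site label for `url`, or None if the site is not recognized.

    Hypothetical helper, not part of the existing code.
    """
    # hostname is None when the URL has no scheme, so fall back to "".
    host = (urlparse(url).hostname or "").lower()
    for domain, label in SITE_BY_DOMAIN.items():
        # Match the bare domain and any subdomain such as "www." or "edition.".
        if host == domain or host.endswith("." + domain):
            return label
    return None
```

Returning `None` for anything unrecognized lets the caller decide whether to skip the article or raise an error.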

Hope you have a good day!

Brandon-7-Sharp commented 7 months ago

We can determine which article type it is in the run.py file, pass that as a variable to 'process_article_from_file', and use it to decide how to parse the HTML with BeautifulSoup.
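Roughly what that could look like, as a sketch only. The `(path, site)` signature, the returned dictionary, and every CSS selector below are placeholders made up for illustration; the real tags have to be read off each site's markup.

```python
# Hypothetical selectors: placeholders, not the sites' actual markup.
SELECTORS = {
    "cnn": {"title": "h1", "body": "div.article__content p"},
    "space": {"title": "h1", "body": "div#article-body p"},
}


def process_article_from_file(path, site):
    """Parse a saved HTML file using the selectors registered for `site`."""
    if site not in SELECTORS:
        raise ValueError(f"unsupported site: {site!r}")

    # Imported here so the selector table can be used without bs4 installed.
    from bs4 import BeautifulSoup

    with open(path, encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")

    rules = SELECTORS[site]
    title = soup.select_one(rules["title"])
    paragraphs = soup.select(rules["body"])
    return {
        "title": title.get_text(strip=True) if title else None,
        "text": "\n".join(p.get_text(strip=True) for p in paragraphs),
    }
```

Keeping the per-site differences in one table means adding a third site is a new entry in `SELECTORS` rather than another `if` branch in the parser.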

Brandon-7-Sharp commented 7 months ago

Changed the run.py file to determine whether the website is CNN or Space, and changed the 'process_article_from_file' function to take an additional value that determines which HTML tags BeautifulSoup will use.