Closed hanzalajamash closed 1 year ago
I'm just trying to think if there are any cases where you might not want this. Have you tested it on some websites? If so, what are your results like?
With the article tag set to 'p' in the config, I was not able to scrape more than one paragraph from any of the following websites:
https://www.motointegrator.de/blog/fragen-zum-thema-e-autos-und-bremsenservice/
scraped_output: 'Inhaltsverzeichnis' (just a single word)
output after changes: 1st url.txt
https://www.motointegrator.de/blog/der-beruf-des-kfz-mechatroniker/
scraped_output: 'Heute beantwortet uns Andreas die wichtigsten Fragen zum Beruf des KFZ-Mechatronikers. Erfahren Sie interessante Berufsperspektiven, schmunzeln Sie über Kundenwünsche und lesen Sie wissenswertes über die Situation auf dem Arbeitsmarkt.'
output after changes: 2nd url.txt
https://www.healthline.com/migraine
scraped_output: 'Tips, tools, and support for living and thriving with migraine. '
output after changes: 3rd url.txt
After making the change in the code, I was able to scrape more content from all three pages.
Currently, the code scrapes only the first element it finds that matches the tag given by the user.
Ideally, it should scrape every element that matches the user-provided tag.
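To make the difference concrete, here is a minimal sketch of the two behaviours. It is not the project's actual code: it uses only Python's standard-library `html.parser`, and the names `TagTextCollector`, `scrape_first`, and `scrape_all` are hypothetical, chosen just for this illustration. It also assumes the matching elements are properly closed.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Hypothetical helper: collect the text of every element named `tag`."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0    # greater than 0 while inside a matching element
        self.chunks = []  # text pieces of the element currently being read
        self.texts = []   # one cleaned string per matching element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                self.chunks = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Collapse runs of whitespace and skip empty elements.
                text = " ".join("".join(self.chunks).split())
                if text:
                    self.texts.append(text)

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def scrape_all(html, tag="p"):
    """Proposed behaviour: return the text of every matching element."""
    parser = TagTextCollector(tag)
    parser.feed(html)
    parser.close()
    return parser.texts


def scrape_first(html, tag="p"):
    """Current behaviour as described above: keep only the first match."""
    texts = scrape_all(html, tag)
    return texts[0] if texts else ""
```

On a page whose first `<p>` is just a table-of-contents label, `scrape_first` returns only that label, while `scrape_all` also returns the paragraphs that follow it, which matches what I saw on the URLs above.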
Any thoughts?