Scrape text from all matching tags

hanzalajamash commented 1 year ago

Currently, the code scrapes only the first tag it finds that matches the tag given by the user.

Ideally, it should scrape every tag that matches the one the user provided.

Any thoughts?

CyberPunkMetalHead commented 1 year ago

I'm just trying to think of any cases where you might not want this. Have you tested it on some websites? If so, what were your results like?

hanzalajamash commented 1 year ago

With the article tag set to 'p' in the config, I was not able to scrape more than one paragraph from the following websites:

https://www.motointegrator.de/blog/fragen-zum-thema-e-autos-und-bremsenservice/
scraped_output: 'Inhaltsverzeichnis' (just a single word, German for "table of contents")
output after changes: 1st url.txt

https://www.motointegrator.de/blog/der-beruf-des-kfz-mechatroniker/
scraped_output: 'Heute beantwortet uns Andreas die wichtigsten Fragen zum Beruf des KFZ-Mechatronikers. Erfahren Sie interessante Berufsperspektiven, schmunzeln Sie über Kundenwünsche und lesen Sie wissenswertes über die Situation auf dem Arbeitsmarkt.'
output after changes: 2nd url.txt

https://www.healthline.com/migraine
scraped_output: 'Tips, tools, and support for living and thriving with migraine. '
output after changes: 3rd url.txt

After making the change in the code, I was able to scrape more content.
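
For anyone following along, here is a minimal sketch of the difference between the two behaviours. It is not the project's actual code: it uses only Python's standard-library `html.parser`, and the names `TagTextCollector`, `scrape_first`, and `scrape_all` as well as the sample HTML are made up for illustration. It also assumes every matching tag is properly closed.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Hypothetical helper: collect the text of every element named `tag`."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0    # > 0 while we are inside a matching element
        self.parts = []   # text fragments of the element being read
        self.texts = []   # one entry per non-empty matching element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                self.parts = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join the fragments and collapse runs of whitespace.
                text = " ".join("".join(self.parts).split())
                if text:
                    self.texts.append(text)

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def scrape_all(html, tag):
    """Proposed behaviour: return the text of all matching tags."""
    collector = TagTextCollector(tag)
    collector.feed(html)
    collector.close()
    return collector.texts


def scrape_first(html, tag):
    """Behaviour described in this issue: only the first matching tag."""
    texts = scrape_all(html, tag)
    return texts[0] if texts else ""


if __name__ == "__main__":
    # Made-up page: the first <p> is only a short heading-like line.
    sample = (
        "<div><p>Contents</p>"
        "<p>First <b>real</b> paragraph.</p>"
        "<p>Second paragraph.</p></div>"
    )
    print(scrape_first(sample, "p"))
    print("\n".join(scrape_all(sample, "p")))
```

Joining the list returned by `scrape_all` with newlines keeps paragraph boundaries, which may matter if the text is later passed on for further processing.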

CyberPunkMetalHead / seo-gpt

Scrape text from all matching tags #4