Update the documentation based on the comments from susanna

Weyaaron commented 1 year ago

[x] Sometimes the text says we’ll crawl 5 articles, but the code only crawls two.
[x] In Tutorial 2 about the Article class, maybe state your definitions of title, summary, section, subheadline and so on more clearly. That makes it easier to follow and to try out the different parts. Also: What is DOM?
[x] Tutorial 3 on filtering articles, e.g. by author: As much as I like Donald Duck, maybe one could choose a different, more realistic example so users actually get good results instead of waiting and wondering whether the system is still running 😅 Maybe filter for a specific publishing time range, language or topic?
[x] More info or examples on how to find the specific URLs for Sitemap, NewsMap and RSSFeed, as well as what their differences and purposes are, and how/why they need to be differentiated. At least for me, this was new and I found myself a bit puzzled at first.
[x] Most importantly, I would have liked more help with the basics of using the CSSSelect and XPath selectors, which is obviously the most work when adding a parser. I think a small list of examples with HTML, some fitting example selectors and explanations of what they do would be very helpful!
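As an illustration of the kind of HTML/selector pairs requested in the last point, here is a minimal sketch. It assumes a hypothetical, well-formed article snippet and uses Python's standard-library `xml.etree.ElementTree`, which understands only a subset of XPath, rather than the lxml-based selectors a real parser would more likely use. The CSS selectors appear only in comments, as rough equivalents.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed article markup. Real pages are usually messier,
# which is why a forgiving HTML parser (e.g. lxml.html) is used in practice;
# ElementTree is used here only because it ships with Python.
HTML = """
<html>
  <head>
    <meta name="author" content="Jane Doe"/>
  </head>
  <body>
    <article>
      <h1>Example headline</h1>
      <p class="summary">A one-sentence summary.</p>
      <h2 class="subheadline">First section</h2>
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
    </article>
  </body>
</html>
"""

root = ET.fromstring(HTML.strip())

# XPath-style expressions (ElementTree supports a subset of XPath 1.0).
# The comment after each one gives a roughly equivalent CSS selector.
TITLE = ".//article/h1"                        # CSS: article > h1
SUMMARY = ".//p[@class='summary']"             # CSS: p.summary
SUBHEADLINES = ".//h2[@class='subheadline']"   # CSS: h2.subheadline
AUTHOR = ".//meta[@name='author']"             # CSS: meta[name="author"]
PARAGRAPHS = ".//article/p"                    # CSS: article > p

title = root.find(TITLE).text
summary = root.find(SUMMARY).text
subheadlines = [h2.text for h2 in root.findall(SUBHEADLINES)]
# The author is stored in an attribute, not in the element's text.
author = root.find(AUTHOR).get("content")
# Body paragraphs: all <p> children of <article> except the summary.
body = [p.text for p in root.findall(PARAGRAPHS) if p.get("class") != "summary"]

print(title, summary, subheadlines, author, body, sep="\n")
```

One caveat: `[@class='summary']` matches the attribute value exactly, while the CSS selector `p.summary` also matches elements carrying several classes, such as `class="summary lead"`.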
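For the Sitemap/NewsMap/RSSFeed question, the following sketch shows how the three kinds of sources differ structurally. It uses only the standard library and hypothetical example.com documents; it is not fundus's own `Sitemap`, `NewsMap` or `RSSFeed` implementation. Roughly: a sitemap (sitemaps.org protocol) lists page URLs of a site, not necessarily articles; a Google News sitemap uses the same `urlset` format plus a `news:` extension carrying publication metadata and is meant for recently published articles; an RSS feed is a separate format whose `item` elements link to recent entries. Sitemap URLs are often announced in a site's `robots.txt`, which `urllib.robotparser` can read (Python 3.8+).

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespaces of the sitemaps.org protocol and Google's news extension.
SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
NEWS = "{http://www.google.com/schemas/sitemap-news/0.9}"

# Hypothetical robots.txt announcing two sitemaps.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml"""

# Plain sitemap: just URLs, which may include non-article pages.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/about</loc></url>
  <url><loc>https://example.com/2023/05/01/some-article</loc></url>
</urlset>"""

# News sitemap: same urlset format, plus news metadata per URL.
NEWSMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://example.com/2023/05/01/some-article</loc>
    <news:news>
      <news:publication>
        <news:name>Example News</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2023-05-01</news:publication_date>
      <news:title>Some article</news:title>
    </news:news>
  </url>
</urlset>"""

# RSS 2.0 feed: a different format, with recent items and their links.
RSS = """<rss version="2.0"><channel>
  <title>Example News</title>
  <item><title>Some article</title><link>https://example.com/2023/05/01/some-article</link></item>
</channel></rss>"""


def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs declared via 'Sitemap:' lines."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None if there are none


def urls_from_sitemap(xml):
    """Collect every <loc> URL from a sitemap."""
    root = ET.fromstring(xml)
    return [loc.text for loc in root.iter(f"{SM}loc")]


def news_entries(xml):
    """Collect (URL, publication date) pairs from a news sitemap."""
    root = ET.fromstring(xml)
    return [
        (url.findtext(f"{SM}loc"), url.findtext(f"{NEWS}news/{NEWS}publication_date"))
        for url in root.iter(f"{SM}url")
    ]


def urls_from_rss(xml):
    """Collect the <link> of every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml)
    return [item.findtext("link") for item in root.iter("item")]


print(sitemaps_from_robots(ROBOTS_TXT))
print(urls_from_sitemap(SITEMAP))
print(news_entries(NEWSMAP))
print(urls_from_rss(RSS))
```

The news sitemap carries a publication date directly, whereas a plain sitemap only yields URLs; this difference in available metadata is one plausible reason a crawler would treat the source types separately.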

Edit: I cut the original text down to bullet points.

Weyaaron commented 1 year ago

I think points 1 and 3 are not controversial at all, so I will start working on them. The others may require some more discussion.

Weyaaron commented 1 year ago

All of these points have been addressed, and the contribution they are based on is done (#305), so this can be closed.

flairNLP / fundus