Alhajras/webscraper
Configurable search engine written in Python and Angular. It supports indexing as well.
crawler-test benchmark #14
Open
Alhajras opened 1 year ago
Alhajras commented 1 year ago
Test coverage:
[ ] Test that it has found all links on the site (links on the site = 415).
[ ] Test that it has visited all links on the site (links on the site = 415).
[ ] The crawl should not fail.
[ ] Contents
  [ ] Content is Empty
  [ ] Title Missing (element does not exist)
  [ ] Title Duplicates (I do not care about the content alone; since the URL should also be unique, a fingerprint can be used)
  [ ] Title Too Long (the DB should handle long text)
[ ] Encoding
  [ ] URL with Foreign Characters: Hebrew / Polish / German / Spanish
  [ ] Arabic and Japanese are not working.
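One possible cause of the foreign-character failures is that URLs are requested without percent-encoding. The sketch below shows one way to normalize a URL to ASCII using only the standard library. The helper name `to_ascii_url` and the `example.com` URLs are illustrative assumptions, not part of the webscraper code, and the host name is left untouched.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Percent-encode non-ASCII characters in the path, query and fragment.

    Hypothetical helper for illustration. "%" is kept as a safe character
    so that already-encoded URLs are not encoded a second time.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


# Illustrative URLs; the paths are made up for this example.
german = to_ascii_url("https://example.com/straße")
arabic = to_ascii_url("https://example.com/مرحبا")

# The encoded form round-trips back to the original path.
original_path = unquote(urlsplit(arabic).path)
```

A test for the encoding items could assert that every URL handed to the driver is ASCII-only and decodes back to the original path.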
[ ] Robots Protocol (the following URLs should be excluded)
  [ ] https://crawler-test.com/robots_protocol/robots_excluded
  [ ] https://crawler-test.com/robots_protocol/deepcrawl_excluded
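The robots exclusion check can be sketched with the standard library's `urllib.robotparser`. The rules in `SAMPLE_ROBOTS` are an assumption made up for this example (the site's real robots.txt may differ), and the user agent name `webscraper` is likewise hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules for illustration only; not the site's actual robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /robots_protocol/robots_excluded
Disallow: /robots_protocol/deepcrawl_excluded
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS)

excluded_urls = [
    "https://crawler-test.com/robots_protocol/robots_excluded",
    "https://crawler-test.com/robots_protocol/deepcrawl_excluded",
]
# Under the assumed rules, neither URL may be fetched.
blocked = [not parser.can_fetch("webscraper", url) for url in excluded_urls]
home_allowed = parser.can_fetch("webscraper", "https://crawler-test.com/")
```

In a real crawl the parser would be fed the site's own robots.txt, and the test would assert that neither excluded URL appears among the visited links.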
[ ] Redirects (do not seem to be working!)
  [ ] After calling driver.get(), I should check that the host after the redirect equals the original one; this prevents the crawler from leaving the original website.
  [ ] Redirect 301
  [ ] Redirect 302
  [ ] Infinite Redirect
  [ ] javascript:window.location external
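The host check described for redirects could look roughly like this. It is a minimal sketch: `same_host` is a hypothetical helper, and it assumes the final URL is read from Selenium's `driver.current_url` after `driver.get()` returns.

```python
from urllib.parse import urlsplit


def same_host(requested_url: str, final_url: str) -> bool:
    """Return True if the final URL is on the same host as the requested one.

    `hostname` is lowercased by urlsplit, so the comparison is
    case-insensitive. Subdomains count as different hosts here.
    """
    return urlsplit(requested_url).hostname == urlsplit(final_url).hostname


# With Selenium this would be: same_host(url, driver.current_url)
stayed = same_host("https://crawler-test.com/", "https://Crawler-Test.com/other")
left = same_host("https://crawler-test.com/", "https://example.org/")
```

Comparing hosts after navigation also covers the `javascript:window.location` case, since the browser has already followed the script-driven redirect by the time `current_url` is read. An infinite redirect needs a separate guard, such as a page load timeout.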
[ ] Links
  [ ] Broken Links Internal
  [ ] Broken Links External
  [ ] Page with External Links
  [ ] Relative Link
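For the relative and external link cases, one approach is to resolve every `href` against the page URL and then split the results by host. This is a sketch under assumptions: `classify_links` is a hypothetical helper, and the page URL and hrefs are made up for the example. Detecting broken links is not covered, since that needs the HTTP status of each target.

```python
from urllib.parse import urldefrag, urljoin, urlsplit


def classify_links(page_url: str, hrefs: list[str]) -> tuple[set[str], set[str]]:
    """Resolve hrefs against page_url and split them into internal/external.

    Fragments are dropped so "#top" variants count as one link, and
    non-HTTP schemes (mailto:, javascript:, ...) are skipped.
    """
    origin_host = urlsplit(page_url).hostname
    internal: set[str] = set()
    external: set[str] = set()
    for href in hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue
        if parts.hostname == origin_host:
            internal.add(absolute)
        else:
            external.add(absolute)
    return internal, external


# Illustrative input; the paths are hypothetical.
internal, external = classify_links(
    "https://crawler-test.com/links/page",
    ["../about", "/contact#top", "https://example.org/x",
     "mailto:a@b.c", "javascript:void(0)"],
)
```

Only the `internal` set would be queued for crawling; the size of the accumulated set is what the "found all links" test would compare against 415.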