Niz97 / Software-Engineering


Downloading News Content - Not Always Successful #1

Closed ArvinZJC closed 5 years ago

ArvinZJC commented 5 years ago

Keyword extractor (V2.0.1.20191105): The news content must be downloaded before it can be analysed to extract keywords. However, an exception may be raised for some URLs that respond with HTTP status code 403 or 404. These URLs need to be handled.

ArvinZJC commented 5 years ago

Keyword extractor (V2.1.0.20191105): Code that avoids the Newspaper3k 403 Client Error for some URLs has been added and has passed an initial test.
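
The comment does not show the code that was added. One common cause of a 403 with Newspaper3k is that the site rejects the library's default user agent, so a minimal sketch of one possible workaround is to send a browser-like user agent through `newspaper.Config`. The helper name `build_article`, the user agent string, and the timeout value are assumptions for illustration, not taken from the project.

```python
# Hypothetical browser-like user agent; any current browser string could be used.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/78.0.3904.97 Safari/537.36"
)


def build_article(url, user_agent=BROWSER_USER_AGENT):
    """Return a newspaper Article that sends a browser-like user agent."""
    # Imported here so the module loads even where newspaper3k is not installed.
    from newspaper import Article, Config

    config = Config()
    config.browser_user_agent = user_agent  # replaces the default "newspaper/<version>" agent
    config.request_timeout = 10  # seconds; an assumed value
    return Article(url, config=config)
```

A caller would then run `article = build_article(url)`, followed by `article.download()` and `article.parse()`. Some sites may still refuse the request, so this reduces 403 responses rather than guaranteeing access.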

ArvinZJC commented 5 years ago

Keyword extractor (V2.5.0.20191108): After adding a try...except...else statement and "stripping" each URL, the Newspaper3k 404 Client Error and some other connection errors are now avoided.
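
The project's actual code is not shown in this thread, so the following is only a sketch of the pattern described: strip each URL, attempt the download inside `try`, skip the URL in `except`, and continue with the content in `else`. It assumes that "stripping" means removing surrounding whitespace with `str.strip()`. The names `download_text` and `fetch_with_newspaper` are hypothetical.

```python
def fetch_with_newspaper(url):
    """Download and parse one article with newspaper3k, returning its text."""
    # Imported here so the module loads even where newspaper3k is not installed.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()  # raises if the download failed (for example on a 404)
    return article.text


def download_text(raw_url, fetch=fetch_with_newspaper):
    """Return the article text for raw_url, or None if it cannot be fetched."""
    # Assumption: "stripping" removes stray spaces and newlines around the URL.
    url = raw_url.strip()
    try:
        text = fetch(url)
    except Exception as error:  # broad on purpose: HTTP and connection errors alike
        print(f"Skipping {url!r}: {error}")
        return None
    else:
        # Reached only when the download and parse succeeded.
        return text
```

Passing the fetcher as a parameter keeps the error handling separate from the network call, so the skip logic can be exercised with a stand-in function. Catching `Exception` is a deliberate simplification; a narrower `except` clause would avoid hiding unrelated bugs.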