Open Aadityaa2606 opened 8 months ago
claim
The fix proposed by @rnavaneeth992 is a good approach, but it does not catch every irrelevant link, so I am reopening the issue for other contributors to make additional improvements to the detection system on top of the existing approach!
Explanation of the Fix:
The previous fix added a try-catch block after querying the URL; it raises an HTTP error if the request returned an unsuccessful status code. This means that if the link does not produce an HTTP error, it is never flagged as irrelevant.
A second check was also added: if the extracted summary content is empty, the URL is treated as invalid.
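The two existing checks could be sketched roughly as follows. This is a minimal sketch, not the actual code from the repo; the function name and signature are hypothetical, assuming the checks boil down to the HTTP status code and the extracted summary text:

```python
def passes_existing_checks(status_code: int, summary: str) -> bool:
    """Hypothetical sketch of the current detection logic.

    Check 1: the HTTP request must have succeeded (this is what the
             try-catch around the raised HTTP error amounts to).
    Check 2: the extracted summary content must be non-empty.
    """
    # Check 1: anything outside the 2xx/3xx range counts as an HTTP error
    if not (200 <= status_code < 400):
        return False
    # Check 2: empty summary content means the URL is treated as invalid
    return len(summary.strip()) > 0
```

As the examples below show, a page like https://github.com/ passes both checks: it returns 200 and has plenty of text to "summarise", which is exactly why these checks alone are not enough.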
Additional improvements that can be made:
Right now the existing approach catches a few links, like www.google.com, and prevents them from being summarised. However, sites like https://www.linkedin.com/feed/, https://github.com/, https://www.udemy.com/, and many more still slip through.
We need a concrete method that separates news articles from ordinary websites, to prevent irrelevant results and make the web application more reliable.
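One possible heuristic (not part of the existing code; all names below are hypothetical) is to inspect the fetched HTML for article signals that most news sites emit, such as an `<article>` element or an Open Graph `og:type="article"` meta tag. A stdlib-only sketch:

```python
from html.parser import HTMLParser

class ArticleSignalParser(HTMLParser):
    """Scans HTML for common signals that a page is a news article:
    an <article> element, or an Open Graph og:type="article" meta tag."""

    def __init__(self):
        super().__init__()
        self.has_article_tag = False
        self.og_type_article = False

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.has_article_tag = True
        if tag == "meta":
            d = dict(attrs)
            if d.get("property") == "og:type" and d.get("content") == "article":
                self.og_type_article = True

def looks_like_article(html: str) -> bool:
    # Returns True if either signal is present in the page source
    parser = ArticleSignalParser()
    parser.feed(html)
    return parser.has_article_tag or parser.og_type_article
```

This would reject landing pages and feeds (e.g. https://www.linkedin.com/feed/) that lack these markers, while most published news articles carry at least one of them. It is only a heuristic, so it could be combined with the existing status-code and empty-summary checks rather than replacing them.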
bug Description: If the user inputs any link (e.g. www.google.com), the summariser thinks it is an article link and summarises it.
To Reproduce Steps to reproduce the behavior:
Expected behavior: Prevent accepting irrelevant links; if the user tries to submit an irrelevant link, show them an error similar to this
Bug Screenshots
Possible approaches