@JulesGM I have implemented all the architectural and stylistic suggestions you requested. This new pull request adds Bing Search, since that is what was used in the ParlAI Blenderbot2 paper. It also lets you limit the text returned per URL, since Blenderbot currently uses only the first 512 characters, and it lets you strip out HTML menus. You can also return a clean summary of each web page 10X faster, since that mode does not need to fetch each URL. I have updated the README with examples so you can quickly test these options. Overall, these changes enable the search engine to return significantly higher-quality text to Blenderbot2. I will send you a separate private email with the URLs of the test deployments, which I have set up as Docker containers on Google Cloud in case you do not have a Bing Search subscription key and want to test them. Thank you again for your time.
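
For illustration only, here is a minimal sketch of the menu stripping and per-URL text limit described above. It is not the code in the pull request: the names `clean_page_text`, `SKIP_TAGS`, and `max_chars` are hypothetical, and treating `<nav>`, `<header>`, and `<footer>` as "menus" is an assumption made for this example. It uses only the Python standard library.

```python
from html.parser import HTMLParser

# Assumption for this sketch: content inside these tags counts as menu/boilerplate.
SKIP_TAGS = {"nav", "header", "footer", "script", "style"}


class _TextExtractor(HTMLParser):
    """Collects visible text while skipping everything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def clean_page_text(html, max_chars=512):
    """Return page text without menus, truncated to max_chars characters."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()[:max_chars]


if __name__ == "__main__":
    page = (
        "<html><body><nav><a href='/'>Home</a><a href='/about'>About</a></nav>"
        "<p>Hello world.</p><footer>Contact us</footer></body></html>"
    )
    print(clean_page_text(page))
```

The 512-character default mirrors the limit mentioned above; truncating after menu removal means those characters are spent on page content rather than navigation links.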