[x] I searched other issues (including closed issues) and could not find any to be related. If you find related issues post them below or directly add your issue to the most related one.
Describe your question
I am developing an application that aggregates information about healthcare business strategy. When parsing articles with news-please, some websites reject the requests with HTTP 403 (Forbidden) errors. After investigating, I found that sending a browser-like user agent with the requests may avoid these errors. However, I could not find a direct way to set a custom user agent in news-please.
Versions (please complete the following information):
OS: macOS 11.1 (Apple M1)
Python Version: 3.12
news-please Version: 1.5.35
Intent (optional; we'll use this info to prioritize upcoming tasks to work on)
[ ] personal
[ ] academic
[x] business
[ ] other
Some information on your project: My project involves aggregating healthcare business strategy information using news-please.
In newspaper3k, the user agent can be set as follows:
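from newspaper import Article, Config
from newspaper.article import ArticleException

USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
RETRY_ATTEMPTS = 3

config = Config()
config.browser_user_agent = USER_AGENT
config.request_timeout = 10

def parse_article(url):
    # Retry the download a few times before giving up.
    for attempt in range(RETRY_ATTEMPTS):
        try:
            # Pass the config so the custom user agent is actually used.
            article = Article(url, config=config)
            article.download()
            article.parse()
            return article
        except ArticleException as e:
            # newspaper3k reports a failed download as ArticleException when parse() is called.
            print(f"Error retrieving article from URL '{url}': {e} ({attempt + 1}/{RETRY_ATTEMPTS})")
    return None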
I suggest adding a similar option to news-please so that users can set a custom user agent. This would help with websites that reject requests lacking a browser-like user agent and respond with 403 errors.
Additionally, if there is already a way to set a custom user agent in news-please that I am not aware of, could you please add this information to the readme to avoid confusion among users?
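Until such an option is available or documented, one possible workaround is to download the HTML separately with a custom user agent and hand the result to news-please for extraction. The following is only a minimal sketch: it assumes that `NewsPlease.from_html` accepts the raw HTML string plus the source URL, and `build_request`, `fetch_html`, and `parse_with_user_agent` are hypothetical helper names that are not part of news-please.

```python
import urllib.request

USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0"


def build_request(url, user_agent=USER_AGENT):
    """Hypothetical helper: build a GET request that carries a custom User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent=USER_AGENT, timeout=10):
    """Hypothetical helper: download the page as text.

    urllib raises urllib.error.HTTPError if the server still answers 403.
    """
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def parse_with_user_agent(url, user_agent=USER_AGENT):
    """Hypothetical helper: fetch the HTML ourselves, then let news-please extract the article."""
    # Imported here so the download helpers work even without news-please installed.
    from newsplease import NewsPlease

    html = fetch_html(url, user_agent)
    # Assumption: from_html takes the HTML string and, optionally, the source URL.
    return NewsPlease.from_html(html, url=url)
```

This keeps the download step under the caller's control, so other headers or retry logic could be added in `fetch_html` without touching the extraction step. Some sites block on criteria other than the user agent, so a 403 may still occur.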
Thank you for considering this enhancement.