codelucas / newspaper

newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:
https://goo.gl/VX41yK
MIT License
14.05k stars 2.11k forks

I'm getting (' and ') added to top_image and title. #699

Open philmade opened 5 years ago

philmade commented 5 years ago

I'm trying to get top_image, title, etc. so I can save them to my database. Depending on how I implement the save, I get (' added at the start of the title and ') added at the end. The same happens with top_image. Sometimes it doesn't happen at all, though, and I can't pin down what's causing these characters to be added.

Example: top_image = ('https://pv-magazine-usa.com/wp-content/uploads/sites/2/2018/12/green-new-deal-2-e1544713444650-1200x711.jpg',)

hatarist commented 5 years ago

Uh.. Python causes these characters to be added? It looks like a tuple with a single element.

from newspaper import Article

article = Article('https://pv-magazine-usa.com/2019/02/07/the-green-new-deal-is-going-to-happen-at-the-state-not-federal-level/')
article.download()
article.parse()
print(repr(article.top_image))  # expected: a plain string, not a tuple

Does this code display ('https://pv-magazine-usa.com/wp-content/uploads/sites/2/2018/12/green-new-deal-2-e1544713444650-1200x711.jpg',)?

If it doesn't (it shouldn't), look for tuple() calls or, maybe, stray trailing commas in your code. For example, it could be something like this:

top_tuple = article.top_image,    # note the extra comma at the end
print(repr(top_tuple))
# shows a tuple
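
To see where the (' and ') come from without touching the network, here is a minimal sketch. The URL is a placeholder standing in for article.top_image, and unwrap_single is a hypothetical helper, not part of newspaper. It assumes the stray characters appear because a one-element tuple gets converted to a string somewhere before the save.

```python
# Placeholder standing in for article.top_image, which is a plain str.
top_image = "https://example.com/image.jpg"

as_tuple = top_image,   # trailing comma: builds a one-element tuple
as_string = top_image   # no comma: stays a str

# Converting the tuple to text is what produces the (' ... ',) wrapper.
print(str(as_tuple))
print(str(as_string))


def unwrap_single(value):
    """Hypothetical helper: return the only element of a 1-tuple, else the value unchanged."""
    if isinstance(value, tuple) and len(value) == 1:
        return value[0]
    return value


print(unwrap_single(as_tuple) == as_string)
```

Unwrapping like this only hides the symptom. The cleaner fix is to find the assignment that has the trailing comma (or the tuple() call) and remove it, so the value stays a string all the way to the database.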