AndyTheFactory / newspaper4k

📰 Newspaper4k a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
MIT License

I'm getting (' and ') added to top_image and title. #369

Open AndyTheFactory opened 1 year ago

AndyTheFactory commented 1 year ago

Issue by philmade Thu May 9 12:14:55 2019 Originally opened as https://github.com/codelucas/newspaper/issues/699


So I'm trying to get a direct link to top_image, the title, etc., so I can save them to my database. Depending on how I implement the save, I get (' added at the start of the title and ') added at the end. The same happens with top_image. Sometimes it doesn't happen at all, though, and I can't pin down what's causing these characters to be added.

Example: top_image = ('https://pv-magazine-usa.com/wp-content/uploads/sites/2/2018/12/green-new-deal-2-e1544713444650-1200x711.jpg',)

AndyTheFactory commented 1 year ago

Comment by hatarist Tue May 14 06:31:24 2019


Uh.. Python causes these characters to be added? It looks like a tuple with a single element.

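from newspaper import Article

# Download and parse the article, then inspect the raw value of top_image
article = Article('https://pv-magazine-usa.com/2019/02/07/the-green-new-deal-is-going-to-happen-at-the-state-not-federal-level/')
article.download()
article.parse()
print(repr(article.top_image))  # repr() makes a stray tuple visible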

Does this code display ('https://pv-magazine-usa.com/wp-content/uploads/sites/2/2018/12/green-new-deal-2-e1544713444650-1200x711.jpg',)?

If it doesn't (it shouldn't), look for tuple() calls or stray trailing commas in your code. For example, it could be something like this:

top_tuple = article.top_image,    # note the extra comma at the end
print(repr(top_tuple))
# shows a tuple
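
To make the effect concrete without hitting the network, here is a minimal sketch of how a trailing comma produces exactly the (' and ',) characters from the report, plus one possible way to guard against it before saving. The URL is a placeholder standing in for article.top_image, and unwrap is a hypothetical helper of my own, not part of newspaper.

```python
# Placeholder standing in for article.top_image (a plain string); hypothetical URL.
top_image = "https://example.com/image.jpg"

# A trailing comma turns the right-hand side into a one-element tuple.
wrapped = top_image,
print(type(wrapped).__name__)  # tuple
print(str(wrapped))            # ('https://example.com/image.jpg',)


def unwrap(value):
    """Hypothetical helper: return the sole element of a one-element tuple,
    otherwise return the value unchanged."""
    if isinstance(value, tuple) and len(value) == 1:
        return value[0]
    return value


# Converting the tuple to text is what puts (' and ',) into the saved value;
# unwrapping first gives back the plain string.
print(unwrap(wrapped))         # https://example.com/image.jpg
```

The helper only hides the symptom; removing the stray comma (or tuple() call) at the place where the value is assigned is the actual fix.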