codelucas / newspaper

newspaper3k is a news, full-text, and article metadata extraction library for Python 3. Advanced docs:
https://goo.gl/VX41yK
MIT License

Problems obtaining the publication date of news articles with NEWSPAPER #611

Open alejandrohdo opened 6 years ago

alejandrohdo commented 6 years ago

Good morning. I am not sure whether this is the right place to report a problem I found with NEWSPAPER. I was testing article downloads from several sites, and for some important sites the publication date is not captured. For example, the DATE OF PUBLICATION field is not extracted for these pages:

https://rpp.pe/politica/actualidad/ipsos-reggiardo-se-mantiene-al-frente-de-las-preferencias-y-belmont-ocupa-el-segundo-lugar-noticia-1144430

https://rpp.pe/musica/conciertos/the-rolling-stones-noticia-942687

https://rpp.pe/economia/economia/lunes-negro-para-wall-street-bolsas-europeas-asiaticas-y-sudamericanas-noticia-1103437

https://rpp.pe/fiestas-patrias/fiestas-patrias-diez-eventos-culturales-para-celebrar-al-peru-noticia-1139620

https://rpp.pe/lima/actualidad/video-un-apagon-afecto-al-aeropuerto-jorge-chavez-noticia-1100571

https://rpp.pe/gastronomia/actualidad/el-restaurante-peruano-central-fue-elegido-el-cuarto-mejor-del-mundo-noticia-970948

https://www.foxsports.com.mx/news/372616-botafogo-derroto-a-nacional-de-paraguay-y-sigue-en-la-sudamericana

If you can help me identify what the problem is here, I will be grateful for your contributions.

I tried each one like this:

>>> from newspaper import Article
>>> url = "https://rpp.pe/fiestas-patrias/fiestas-patrias-diez-eventos-culturales-para-celebrar-al-peru-noticia-1139620"
>>> a = Article(url)
>>> a.download()
>>> a.parse()
>>> a.publish_date  # nothing is printed, so publish_date is None
>>> 
ivanovishado commented 6 years ago

I think this problem has already been addressed in #151. Please check it; I could be wrong.
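
If `publish_date` still comes back empty after checking that issue, one possible workaround is to look for a date in the page's `<meta>` tags yourself. The following is only a minimal sketch, not how newspaper works internally: the helper name `fallback_publish_date` and the list of meta keys are assumptions for illustration, and I have not verified which tags (if any) the sites listed above actually use.

```python
from datetime import datetime
from html.parser import HTMLParser

# Assumed list of meta keys that sites commonly use for publication dates.
# Adjust it after inspecting the HTML of the site you care about.
DATE_META_KEYS = ("article:published_time", "datepublished", "pubdate", "date")


class _DateMetaParser(HTMLParser):
    """Collects the content of <meta> tags whose key looks like a date field."""

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        key = attr.get("property") or attr.get("name") or attr.get("itemprop") or ""
        if key.lower() in DATE_META_KEYS and attr.get("content"):
            self.candidates.append(attr["content"])


def fallback_publish_date(html):
    """Return the first ISO 8601 date found in the meta tags, or None."""
    parser = _DateMetaParser()
    parser.feed(html)
    for raw in parser.candidates:
        try:
            # Older Python versions do not accept a trailing "Z" in fromisoformat.
            return datetime.fromisoformat(raw.strip().replace("Z", "+00:00"))
        except ValueError:
            continue  # not ISO 8601; try the next candidate
    return None
```

After `a.download()` and `a.parse()`, this could be used as `a.publish_date or fallback_publish_date(a.html)`. Dates that are not in ISO 8601 form are skipped, so a site with a custom format would need its own parsing.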