ageitgey / node-unfluff

Automatically extract body content (and other cool stuff) from an html document
Apache License 2.0

Extracted Date is Wrong #93

Open AgoloAhmedElhady opened 6 years ago

AgoloAhmedElhady commented 6 years ago

I tried to extract the article "Apple Seeds Eleventh Beta of iOS 12 to Developers [Update: Public Beta Available]".

It extracted a date, but it was the date of the first top-rated comment, not the article date. The actual article date ended up in the author field instead. The response is below.

{
  title: 'Apple Seeds Eleventh Beta of iOS 12 to Developers [Update',
  softTitle: 'Apple Seeds Eleventh Beta of iOS 12 to Developers [Update: Public Beta Available]',
  date: '8 hours ago at 10:09 am',
  author: ['Monday August 27, 2018 10:05 am PDT by Juli Clover'],
  publisher: null,
  copyright: '2000-document',
  favicon: '//cdn.macrumors.com/images-new/favicon.ico',
  description: 'Apple today seeded the eleventh beta of an upcoming iOS 12 update to developers for testing purposes, just a few days after seeding the tenth beta...',
  keywords: 'iOS 12',
  lang: 'en',
  canonicalLink: 'https://www.macrumors.com/2018/08/27/apple-seeds-ios-12-beta-11-to-developers/',
  tags: [],
  image: 'https://cdn.macrumors.com/article-new/2018/06/iOS-12-Memoji-800x775.jpg?retina',
  videos: [],
  links: [{
    text: 'Advertise on MacRumors',
    href: '//www.macrumors.com/contact.php'
  }],
  text: 'MacRumors attracts a broad audience         of both consumers and professionals interested in         the latest technologies and products. We also boast an active community focused on purchasing decisions and technical aspects of the iPhone, iPod, iPad, and Mac platforms.\n\nAdvertise on MacRumors'
}
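
As a stopgap on the caller's side, the result above can be post-processed. The following is only a minimal sketch, not part of unfluff: `pickArticleDate` is a hypothetical helper, and it assumes that a relative value such as "8 hours ago" in `date` is wrong and that an absolute "Month D, YYYY" date may be present in one of the `author` strings, as it is in this response.

```javascript
// Hypothetical helper, NOT part of unfluff: choose a usable article date
// from an unfluff-style result object.

// Assumption: a date containing these words is relative (e.g. a comment timestamp).
const RELATIVE_DATE = /\b(ago|yesterday|today|just now)\b/i;

// Assumption: the article date is written as "Month D, YYYY" in English.
const ABSOLUTE_DATE =
  /\b(January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{1,2},\s+\d{4}\b/i;

function pickArticleDate(data) {
  // Keep the extracted date unless it looks relative.
  if (data.date && !RELATIVE_DATE.test(data.date)) {
    return data.date;
  }
  // Otherwise search the author strings for an absolute date.
  for (const entry of data.author || []) {
    const match = ABSOLUTE_DATE.exec(entry);
    if (match) {
      return match[0];
    }
  }
  // Nothing usable was found.
  return null;
}

// Fields copied from the response above.
const sample = {
  date: '8 hours ago at 10:09 am',
  author: ['Monday August 27, 2018 10:05 am PDT by Juli Clover'],
};

console.log(pickArticleDate(sample)); // prints "August 27, 2018"
```

This does not fix the extractor itself. It only covers pages where the byline carries the publication date, and the word list and date pattern would likely need adjusting for other sites.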