Closed pranav7 closed 10 years ago
HTML documents don't have a "published at" or "created at" element, at least not reliably.
Agreed. Is there some way I could access the 'date_published' JSON tag that is exposed by the Parser API?
No, ruby-readability is a port of the older, open source JavaScript Readability library. Their newer features are not available, and I don't know how they determine the date_published metadata. Either they have a list of common tags to look for, or they're referring to the date that they fetched the data. You could try adding something like that to this library, or look into using their API.
https://www.readability.com/developers/api/parser
Iteration Labs, LLC Andrew Cantino Founder / CEO
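To make the "list of common tags to look for" idea above concrete, here is a minimal sketch of what such a helper might look like. It is not part of ruby-readability, and it is not how the Parser API determines `date_published`, which is unknown. The method name `guess_date_published` and the `CANDIDATE_DATE_KEYS` list are hypothetical, and the regex scan is only there to keep the sketch dependency-free; a real patch would more likely walk the parsed document with Nokogiri.

```ruby
require "time"

# Hypothetical list of metadata keys that pages commonly use for a
# publication date. An illustrative guess, not the Parser API's list.
CANDIDATE_DATE_KEYS = %w[
  article:published_time
  og:published_time
  datePublished
  date
  dc.date
  pubdate
].freeze

# Returns a Time for the first candidate <meta> tag whose content parses
# as a date, or nil when nothing usable is found.
def guess_date_published(html)
  # Turn every <meta ...> tag into a hash of its quoted attributes.
  metas = html.scan(/<meta\b[^>]*>/i).map do |tag|
    tag.scan(/([\w:.-]+)\s*=\s*["']([^"']*)["']/).to_h { |k, v| [k.downcase, v] }
  end

  # Keys earlier in the list take priority over later ones.
  CANDIDATE_DATE_KEYS.each do |key|
    metas.each do |attrs|
      name = attrs["property"] || attrs["name"] || attrs["itemprop"]
      next unless name && name.casecmp?(key)

      begin
        return Time.parse(attrs["content"].to_s)
      rescue ArgumentError
        next # content was empty or not a recognizable date
      end
    end
  end

  nil
end
```

Called as `guess_date_published(source)` on the same HTML string that is passed to `Readability::Document.new`, it returns `nil` for pages without any of these tags, which fits the point made earlier in the thread that HTML has no reliable date element.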
Aah! Alright. I'll certainly contribute if I figure something out. Thanks anyway. :smiley:
How can I extract the dates on which the page being retrieved was 'created' and 'updated'? I tried using the method 'date_published', which is the JSON element exposed by the Readability Parser API, but that, of course, did not work.
I am not sure whether there is already a way to do this, but if there isn't, it would be great to have a method that does. If there is, then this is not really an Issue.