rahulbot closed 3 years ago
Hi @rahulbot, since I was mostly interested in day-level granularity, I haven't implemented time zone identification so far. However, the underlying libraries python-dateutil, dateparser, and the optional ciso8601 all deal with it IMHO.
Good to know, thanks. In the longer term, if we do switch to htmldate for use in Media Cloud, we might explore integrating time parsing (at least for the machine-readable timestamps in metadata). In that case we'd probably add timezone parsing as well.
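As a rough illustration of what parsing a machine-readable timestamp with its timezone could look like, here is a minimal sketch using only the Python standard library. It is not htmldate's implementation: the sample HTML, the timestamp value, and the names PublishedTimeParser and published_time_utc are all made up for this example. It reads an article:published_time meta tag and normalizes the value to UTC.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser

# Hypothetical HTML snippet; the timestamp is invented for illustration.
SAMPLE_HTML = (
    '<head><meta property="article:published_time" '
    'content="2021-10-18T15:30:00+01:00"></head>'
)


class PublishedTimeParser(HTMLParser):
    """Collects the content of the article:published_time meta tag."""

    def __init__(self):
        super().__init__()
        self.published = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if attr.get("property") == "article:published_time":
            self.published = attr.get("content")


def published_time_utc(html):
    """Return the published time as an aware UTC datetime, or None."""
    parser = PublishedTimeParser()
    parser.feed(html)
    if parser.published is None:
        return None
    local = datetime.fromisoformat(parser.published)
    if local.tzinfo is None:
        # No offset in the source: refuse to guess a timezone.
        return None
    return local.astimezone(timezone.utc)


print(published_time_utc(SAMPLE_HTML))  # 2021-10-18 14:30:00+00:00
```

Returning None for timestamps without an offset is a design choice in this sketch, so that a naive time is never silently treated as UTC or machine-local time.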
You can use %Y-%m-%dT%H:%M:%S%z as the outputformat argument.
Output:
2021-10-18T15:30:00+0330
With that output and something like the parse method from the python-dateutil package, you can get this pattern:
2021-10-18 15:30:00+03:30
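For reference, a minimal sketch of that second step, starting from the output string quoted in this comment. Note that it swaps in the standard library's datetime.strptime instead of python-dateutil's parse, so it runs without extra dependencies. The variable names are only illustrative, and the UTC conversion at the end is an extra step not mentioned in the comment.

```python
from datetime import datetime, timezone

# The string produced with the %Y-%m-%dT%H:%M:%S%z output format (from this thread).
stamp = "2021-10-18T15:30:00+0330"

# %z accepts a +HHMM offset and yields a timezone-aware datetime.
parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")
print(parsed)  # 2021-10-18 15:30:00+03:30

# Optional: normalize to UTC (GMT) for storage or comparison.
as_utc = parsed.astimezone(timezone.utc)
print(as_utc)  # 2021-10-18 12:00:00+00:00
```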
Some articles include the full publication time, with timezone, in HTML meta tags or JavaScript config. Does this library parse and handle those timezones? Relatedly, how does it internally store dates with regard to timezone: are they all returned in machine-local time, held in GMT, or something else?
For instance, this Guardian article includes the article:published_time meta tag with a timezone included. Does this library recognize that timezone and return the date as it would be in GMT? Same for this article on CNN, which includes the datePublished meta tag.