Open vibhcool opened 7 years ago
@brainstormm is working on this issue.
Analysis so far:
Suppose I query for London: https://www.timeanddate.com/worldclock/results.html?query=London
1. The day and month names come back in Hindi. For example, "Wed" becomes "बुधवार" and "August" becomes "अगस्त".
2. The time format changes as well. For example, "12:21" becomes "12.21" (note the colon replaced by a dot).
3. Inspecting the page source directly in the browser ("view source") shows no language change.
Doubts/Queries:
1. Is the output always converted to Hindi? If a user is in Russia or some other country, will they see Hindi or some other language?
2. Why is the output converted to Hindi when this is just simple scraping? Is there a bug in the request library?
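One thing that might be worth checking for the questions above: many sites choose the display language from the `Accept-Language` request header or from the client's IP address, and a browser sends that header while a bare scraper often does not. Whether timeanddate.com behaves this way is an assumption, not something confirmed in this thread. The sketch below only builds the request with an explicit English preference, using Python's standard `urllib`; the function name `build_request` and the header value are illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://www.timeanddate.com/worldclock/results.html"

def build_request(query, language="en-US,en;q=0.9"):
    """Build (but do not send) a search request that asks for English.

    Assumption: the site honours Accept-Language. If it localizes by IP
    address instead, this header will not change the result.
    """
    url = BASE_URL + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept-Language": language})

req = build_request("London")
print(req.full_url)
```

If the response is still in Hindi with this header set, the localization is probably tied to the client's location rather than to the request library.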
Options tried:
1. Tried different UTF-8 encodings: none worked.
2. Google Translate API: it is not free, so I dropped the idea.
3. Translating the Hindi text to English through a Google Translate URL query (https://translate.google.com/#auto/en/बुधवार) and then scraping the result. The keyword "Wednesday" cannot be found in the scraped result, so I dropped this idea too.
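Since the set of day and month names is small and fixed, a local lookup table could stand in for a translation service. The following is a minimal sketch, not the scraper's actual code: the table `HINDI_TO_ENGLISH` and the function `normalize` are hypothetical, the table is incomplete (only one month is listed), and the exact Hindi spellings the site returns would need to be checked against real output.

```python
import re

# Hypothetical, incomplete lookup table. Only "बुधवार" and "अगस्त" were
# observed in the scraped output; the other entries are standard Hindi day
# names added on the assumption that the site uses the same spellings.
HINDI_TO_ENGLISH = {
    "सोमवार": "Mon",
    "मंगलवार": "Tue",
    "बुधवार": "Wed",
    "गुरुवार": "Thu",
    "शुक्रवार": "Fri",
    "शनिवार": "Sat",
    "रविवार": "Sun",
    "अगस्त": "August",
}

def normalize(text):
    """Replace known Hindi names and restore ':' as the time separator."""
    for hindi, english in HINDI_TO_ENGLISH.items():
        text = text.replace(hindi, english)
    # Turn "12.21" into "12:21". This also rewrites any decimal number with
    # exactly two fractional digits, so apply it only to the time field.
    return re.sub(r"(?<=\d)\.(?=\d{2}\b)", ":", text)

print(normalize("बुधवार, 9 अगस्त 12.21"))
```

This avoids any network call, but it only treats the symptom; it does not explain why the response is localized in the first place.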
There is one issue in the TimeAndDate scraper: it doesn't convert the fetched date to a standard format in which it can be used directly, something like: Thu Apr 06 15:14:32 IST 2017 or 2017-04-06T09:44:32.000Z
This scraper requires processing of the fetched data.
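As a rough illustration of that processing step, the sketch below converts the first example format into the second (ISO 8601 in UTC with milliseconds). It is written in Python for brevity and is not the scraper's code; `to_iso_utc` and `TZ_OFFSETS` are hypothetical names, and the offset table is an assumption, since abbreviations such as "IST" are ambiguous in general (here it is taken to mean India Standard Time, UTC+5:30, which matches the two example values).

```python
from datetime import datetime, timedelta, timezone

# Hypothetical offset table. A real scraper would need a more reliable
# source for the zone, e.g. the UTC offset shown on the page itself.
TZ_OFFSETS = {
    "IST": timedelta(hours=5, minutes=30),  # assumed: India Standard Time
    "UTC": timedelta(0),
}

def to_iso_utc(text):
    """Convert e.g. 'Thu Apr 06 15:14:32 IST 2017' to ISO 8601 in UTC."""
    parts = text.split()
    tz_abbr = parts.pop(4)  # the zone abbreviation is the fifth field
    naive = datetime.strptime(" ".join(parts), "%a %b %d %H:%M:%S %Y")
    aware = naive.replace(tzinfo=timezone(TZ_OFFSETS[tz_abbr]))
    utc = aware.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"

print(to_iso_utc("Thu Apr 06 15:14:32 IST 2017"))  # 2017-04-06T09:44:32.000Z
```

Note that `%a` and `%b` are locale-dependent in `strptime`, so this assumes English day and month abbreviations in the input.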