SuLab / WikidataIntegrator

A Wikidata Python module integrating the MediaWiki API and the Wikidata SPARQL endpoint
MIT License
244 stars, 46 forks

wrong date format for certain entities #74

Closed: trankmichael closed this issue 6 years ago

trankmichael commented 6 years ago

from wikidataintegrator import wdi_core

wdi_core.WDItemEngine(wd_item_id='Q64')
wdi_core.WDItemEngine(wd_item_id='Q34') # fails here
/home/mtran/mtranenv/local/lib/python2.7/site-packages/wikidataintegrator/wdi_core.pyc in set_value(self, value)
   2023                         datetime.datetime.strptime(self.time, '+%Y-%m-%dT%H:%M:%SZ')
   2024             except ValueError as e:
-> 2025                 raise ValueError('Wrong data format, date format must be +%Y-%m-%dT%H:%M:%SZ or -%Y-%m-%dT%H:%M:%SZ')
   2026 
   2027     @classmethod

ValueError: Wrong data format, date format must be +%Y-%m-%dT%H:%M:%SZ or -%Y-%m-%dT%H:%M:%SZ

Most of the entities I tried work, but a few fail with this error. For example, https://www.wikidata.org/wiki/Q34 and https://www.wikidata.org/wiki/Q4628 both fail.

stuppie commented 6 years ago

Looks like the Python datetime library doesn't handle negative dates (i.e. it only handles dates from 1 AD onward), nor does it handle years larger than 9999. It is failing on https://www.wikidata.org/wiki/Q34#P2184 (12th millennium BCE):

ValueError: time data '-12000-00-00T00:00:00Z' does not match format '-%Y-%m-%dT%H:%M:%SZ'
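The failure can be reproduced without the library. The sketch below is only an illustration: the helper name `parses` is made up here, and the format string is the one from the traceback. It shows that `strptime` treats the leading `-` as a literal character, that `%Y` expects a four-digit year, and that a `00` month or day is rejected.

```python
import datetime

# Format string taken from the traceback; the leading '-' is matched literally.
NEGATIVE_FORMAT = '-%Y-%m-%dT%H:%M:%SZ'
POSITIVE_FORMAT = '+%Y-%m-%dT%H:%M:%SZ'


def parses(value, fmt):
    """Hypothetical helper: True if strptime accepts value under fmt."""
    try:
        datetime.datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


# datetime can only represent years in this range.
print(datetime.MINYEAR, datetime.MAXYEAR)

# The value from Q34#P2184: five-digit year, month 00, day 00.
print(parses('-12000-00-00T00:00:00Z', NEGATIVE_FORMAT))

# A four-digit year with a real month and day is accepted; the sign is
# just a literal, so the parsed datetime is in year 2000 AD, not BCE.
print(parses('-2000-01-01T00:00:00Z', NEGATIVE_FORMAT))

# Month and day 00 are rejected even for an otherwise ordinary year.
print(parses('+2000-00-00T00:00:00Z', POSITIVE_FORMAT))
```

So even where the check passes for a negative date, it is not really validating a BCE date; it only confirms the string shape for four-digit years with a valid month and day.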

This check is only being used to validate the format, so I suppose it can just be removed with no ill effects. If a date is misformatted, the Wikimedia API will then throw an error on write, instead of an error being thrown on instantiation.
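If an early check on instantiation were still wanted, one option (not what is proposed above, and not necessarily what the library does) would be a purely textual check that never builds a `datetime`. The following is a rough sketch under that assumption; the pattern and the function name `looks_like_wikidata_time` are hypothetical, and the pattern only checks the string shape, not whether the month, day or time fields are in range.

```python
import re

# Hypothetical pattern: a sign, a year of one or more digits, two-digit
# month and day (00 allowed), a time, and a trailing 'Z'.
TIME_PATTERN = re.compile(r'^[+-]\d+-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z\Z')


def looks_like_wikidata_time(value):
    """Return True if value has the shape +/-YYYY...-MM-DDTHH:MM:SSZ."""
    return TIME_PATTERN.match(value) is not None


print(looks_like_wikidata_time('-12000-00-00T00:00:00Z'))
print(looks_like_wikidata_time('+2001-12-31T00:00:00Z'))
print(looks_like_wikidata_time('2001-12-31'))
```

A check like this accepts the value that currently fails for Q34 and still catches strings with a missing sign or missing time part; anything it lets through incorrectly would still be rejected by the API on write.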