SkyTruth / django_tools


null fracture_date in parsed PDF #2

Closed craigwin-ni closed 10 years ago

craigwin-ni commented 10 years ago

Testing the headless scraper produced 48 errors caused by a null fracture_date. These may result from rigid "American-style" date parsing. The Scrapy implementation uses the dateutil parser with the fuzzy flag set (from nrc/items.py):

    from dateutil.parser import parse as parse_date
    ...
    def convert_fuzzy_date(dt):
        # fuzzy=True lets dateutil skip tokens it cannot interpret as part of a date
        new_dt = parse_date(dt, fuzzy=True)
        return format_datetime(new_dt)

dateutil is available from PyPI as the python-dateutil package.
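
To illustrate the difference, here is a minimal sketch comparing a rigid parser with the fuzzy dateutil approach. The `%m/%d/%Y` format, the helper names `parse_rigid` and `parse_fuzzy`, and the sample strings are assumptions for illustration only; they are not taken from the headless scraper or from the parsed PDFs.

```python
from datetime import datetime

from dateutil.parser import parse as parse_date


def parse_rigid(text):
    """Hypothetical strict MM/DD/YYYY parser; returns None on any other layout."""
    try:
        return datetime.strptime(text.strip(), "%m/%d/%Y")
    except ValueError:
        return None


def parse_fuzzy(text):
    """Hypothetical lenient parser using dateutil; returns None if parsing fails."""
    try:
        # fuzzy=True tells dateutil to ignore tokens it cannot interpret
        return parse_date(text, fuzzy=True)
    except (ValueError, OverflowError):
        return None


if __name__ == "__main__":
    # Made-up sample inputs, not real scraper data
    samples = ["03/15/2013", "2013-03-15", "Fracture Date: March 15, 2013"]
    for sample in samples:
        print(repr(sample), parse_rigid(sample), parse_fuzzy(sample))
```

With a rigid format, any string in a different layout comes back as None, which would surface as a null fracture_date. The fuzzy parser accepts more layouts, at the cost of occasionally guessing wrong on ambiguous input such as day/month order.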

redhog commented 10 years ago

Fixed