adilsoncarvalho / barateza-nfcrawler

Crawler to get data from the NF-e and NFC-e

Convert strings into date time values #21

Closed adilsoncarvalho closed 8 years ago

adilsoncarvalho commented 8 years ago

Convert malformed strings into date and time values.

Some examples

{
  "data_emissao":"10/11/2016 18:19:21-02:00",
  "data_autorizacao":"10/11/2016 \n        \xe0s\n      18:19:22-02:00"
}

How to find it in the code

# TODO: create a DATE TIME ZONE processor

Postponed in PR #5

adilsoncarvalho commented 8 years ago

Removing extra characters with a regex

Well-formed date and time

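import re

# Replace every character that is not a digit, ':', '/' or '-' with a space.
p = re.compile(r'[^0-9:/-]')
p.sub(' ', u'10/11/2016 18:19:21-02:00')
# => u'10/11/2016 18:19:21-02:00'

# Collapse runs of whitespace into a single space.
pp = re.compile(r'\s+')
pp.sub(' ', u'10/11/2016 18:19:21-02:00')
# => u'10/11/2016 18:19:21-02:00'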

Malformed date and time

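import re

# Each disallowed character (including the newlines and "\xe0s") becomes one space.
p = re.compile(r'[^0-9:/-]')
p.sub(' ', u'10/11/2016 \n        \xe0s\n      18:19:22-02:00')
# => u'10/11/2016                   18:19:22-02:00'

# Collapse the resulting run of spaces into a single space.
pp = re.compile(r'\s+')
pp.sub(' ', u'10/11/2016                   18:19:22-02:00')
# => u'10/11/2016 18:19:22-02:00'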
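
The two substitutions above could be folded into one helper. This is only a minimal sketch, assuming Python 3; the name `clean_datetime_string` and the module-level pattern names are hypothetical and not part of the crawler. One deliberate deviation from the snippets above: `+` is added to the allowed characters, because `[^0-9:/-]` would otherwise replace the sign of a positive offset such as `+01:00` with a space.

```python
import re

# Hypothetical helper, not part of the crawler's code base.
# Allowed characters: digits, ':', '/', '+' and '-'.
# Assumption: '+' is kept so that positive UTC offsets survive the cleanup.
_NOISE = re.compile(r'[^0-9:/+-]')
_SPACES = re.compile(r'\s+')


def clean_datetime_string(raw):
    """Replace noise characters with spaces, then collapse and trim whitespace."""
    return _SPACES.sub(' ', _NOISE.sub(' ', raw)).strip()


print(clean_datetime_string('10/11/2016 18:19:21-02:00'))
print(clean_datetime_string('10/11/2016 \n        \xe0s\n      18:19:22-02:00'))
```

Both calls should print a string of the form `DD/MM/YYYY hh:mm:ss-02:00` with a single space between date and time.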

adilsoncarvalho commented 8 years ago

Converting to the Internet standard date and time format

You can see the whole documentation here.

YYYY-MM-DDThh:mm:ss.sTZD (e.g. 1997-07-16T19:20:30.45+01:00)

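import re

# Capture day, month, year, and the time with its UTC offset.
r = re.compile(r'(\d{2})/(\d{2})/(\d{4})\s+(\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2})')
dt = r.search(u'10/11/2016 18:19:22-02:00').groups()
# Reorder DD/MM/YYYY into YYYY-MM-DD and join date and time with 'T'.
date = dt[2] + '-' + dt[1] + '-' + dt[0] + 'T' + dt[3]
# => u'2016-11-10T18:19:22-02:00'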
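
The conversion above could also be wrapped in a function that tolerates non-matching input and then yields a real `datetime`. This is a rough sketch under assumptions: Python 3.7 or later (for `datetime.fromisoformat`), input already cleaned to a single space between date and time, and a hypothetical function name `to_iso8601` that does not exist in the crawler.

```python
import re
from datetime import datetime

# Day, month, year, whitespace, then the time with a numeric UTC offset.
_BR_DATETIME = re.compile(
    r'(\d{2})/(\d{2})/(\d{4})\s+(\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2})'
)


def to_iso8601(cleaned):
    """Return 'YYYY-MM-DDThh:mm:ssTZD', or None when the input does not match.

    Hypothetical helper for illustration only.
    """
    match = _BR_DATETIME.search(cleaned)
    if match is None:
        return None
    day, month, year, time_and_offset = match.groups()
    return '{}-{}-{}T{}'.format(year, month, day, time_and_offset)


iso = to_iso8601('10/11/2016 18:19:22-02:00')
print(iso)

# fromisoformat turns the string into a timezone-aware datetime.
parsed = datetime.fromisoformat(iso)
print(parsed.year, parsed.month, parsed.day, parsed.utcoffset())
```

Returning `None` instead of raising keeps one unparseable field from aborting a whole crawl; whether that is the right behaviour for this project is a design choice left open here.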