denshoproject / ddr-cmdln

Command-line tools for automating the Densho Digital Repository's various processes.

ddrexport does not handle 'record_created' data correctly in entities if missing TZ name string #119

Closed GeoffFroh closed 5 years ago

GeoffFroh commented 5 years ago

When attempting to export entities with ddrexport where the 'record_created' attribute does not contain the timezone identifier string (e.g., PDT), the command fails with a datetime format conversion exception.

  File "/opt/ddr-local/ddr-defs/repo_models/entity.py", line 1447, in csvdump_record_created
    def csvdump_record_created(data): return converters.datetime_to_text(data)
  File "/opt/ddr-local/venv/ddrlocal/local/lib/python2.7/site-packages/ddr_cmdln-0.9.4b0-py2.7.egg/DDR/converters.py", line 129, in datetime_to_text
    raise Exception('Cannot strformat "%s": not a datetime.' % data)
Exception: Cannot strformat "<built-in method now of type object at 0x9637c0>": not a datetime.

This does not work:

"record_created": "2014-02-08T12:04:18-0700" (ddr-densho-201-472)

But this does work:

"record_created": "2018-10-15T14:39:44PDT-0700" (ddr-densho-372-1)

Exception from this function:

https://github.com/densho/ddr-cmdln/blob/master/ddr/DDR/converters.py#L118

gjost commented 5 years ago

The error Exception: Cannot strformat "<built-in method now of type object at 0x55da4fe4ade0>": not a datetime. indicates that the formatter expected a datetime object but was instead given the datetime.now method itself, uncalled.

This likely comes from repo_models.entity.FIELDS:

    {
        'model':      'entity',
        'name':       'record_created',
        'model_type': datetime,
        'default':    datetime.now,
        'csv': {
            ...

This default is normally instantiated properly using some metaprogramming: the datetime.now method is assigned to a variable (for example, date_method) and then called as date_method(). This works for most objects.
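
A minimal sketch of that pattern, to show how an uncalled default produces the exact error above. The trimmed FIELDS list, resolve_default, and this simplified datetime_to_text are hypothetical stand-ins written for illustration, not the actual ddr-defs or DDR.converters code:

```python
from datetime import datetime

# Hypothetical, trimmed-down stand-in for repo_models.entity.FIELDS.
FIELDS = [
    {'name': 'record_created', 'model_type': datetime, 'default': datetime.now},
    {'name': 'title', 'model_type': str, 'default': ''},
]

def resolve_default(field):
    """Return the field's default, calling it first if it is callable."""
    default = field['default']
    return default() if callable(default) else default

def datetime_to_text(data, fmt='%Y-%m-%dT%H:%M:%S'):
    """Format a datetime; reject anything else, as the converter in this issue does."""
    if not isinstance(data, datetime):
        raise Exception('Cannot strformat "%s": not a datetime.' % data)
    return data.strftime(fmt)

record_created_field = FIELDS[0]

# Correct path: the default is called, so the formatter receives a datetime.
good = datetime_to_text(resolve_default(record_created_field))

# Broken path: the uncalled method itself reaches the formatter.
try:
    datetime_to_text(record_created_field['default'])
except Exception as err:
    message = str(err)
```

In the broken path, message contains the same "built-in method now ... not a datetime" text as the traceback, which suggests the default was stored on the object without ever being called.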

The problem object is ddr-densho-201-1011, with a record_created raw value of 2018-10-17T08:05:42-0700. The easy answer is that this date is missing its timezone code, but the date immediately before it is almost identical, yet it works:

$ cat /var/www/media/ddr/ddr-densho-201/files/ddr-densho-201-1010/entity.json | grep record_created
        "record_created": "2018-10-17T08:02:07-0700"
$ cat /var/www/media/ddr/ddr-densho-201/files/ddr-densho-201-1011/entity.json | grep record_created
        "record_created": "2018-10-17T08:05:42-0700"

Something else is going on here.

gjost commented 5 years ago

Turns out the file for the item in question, /var/www/media/ddr/ddr-densho-201/files/ddr-densho-201-1011/entity.json, has a merge conflict that was never resolved:

    {
        "topics": [
            {
                "id": "89",
<<<<<<< HEAD
                "term": "World War II: Military service: 442nd Regimental Combat Team"
=======
                "term": "World War II -- Military service -- 442nd Regimental Combat Team"
>>>>>>> refs/remotes/origin/master
            }
        ]
    },
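
Unresolved conflicts like this one can be found before an export is attempted. The following is a rough sketch, not part of ddr-cmdln; find_conflict_markers is a hypothetical helper that reports lines starting with git's standard conflict markers:

```python
import re

# A line that begins with one of git's conflict markers:
# "<<<<<<< ", a bare "=======", or ">>>>>>> ".
CONFLICT_MARKER = re.compile(r'^(<{7} |={7}$|>{7} )')

def find_conflict_markers(text):
    """Return the 1-based numbers of lines that look like git conflict markers."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if CONFLICT_MARKER.match(line)
    ]

# Shortened sample modelled on the entity.json excerpt above.
sample = '''{
    "id": "89",
<<<<<<< HEAD
    "term": "a"
=======
    "term": "b"
>>>>>>> refs/remotes/origin/master
}'''

conflict_lines = find_conflict_markers(sample)
```

Running the helper over the contents of each entity.json in a collection (for example, while walking the collection directory with os.walk) would flag files such as this one before ddrexport reaches them.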

The real problem is that DDR.models.common.load_json tries to be "helpful" by not allowing errors from json_data = json.loads(text) to be raised. It catches the error and substitutes a placeholder data structure containing only informational text for json_data. The function then proceeds, and because the placeholder has no record_created value, the field falls back to its default: the uncalled datetime.now method. That method is not a datetime, so converters.datetime_to_text cannot format it.
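
The difference between the two behaviours can be sketched as follows. Both functions are hypothetical simplifications rather than the real load_json, and the placeholder structure is invented for illustration, since the actual substitute data is not shown in this issue:

```python
import json

def load_json_lenient(text):
    """Sketch of the problematic pattern: hide parse errors behind placeholder data."""
    try:
        return json.loads(text)
    except ValueError:
        # Hypothetical placeholder. The caller cannot tell this apart from
        # real data and carries on with none of the expected fields set.
        return [{'title': 'ERROR: could not parse JSON'}]

def load_json_strict(text):
    """Sketch of the fix: let the decode error propagate to the caller."""
    return json.loads(text)

# JSON text with an unresolved conflict marker, as in the file above.
bad_text = '{\n<<<<<<< HEAD\n"term": "a"\n}'

placeholder = load_json_lenient(bad_text)

try:
    load_json_strict(bad_text)
    strict_raised = False
except ValueError:
    # json.JSONDecodeError is a subclass of ValueError.
    strict_raised = True
```

With the lenient version the failure shows up much later and far from its cause; with the strict version the export stops at the file that is actually broken.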

The solution is to remove the error-catching and allow the error to surface properly. With that change, ddrexport stops at the file that cannot be parsed:

2019-02-08 12:20:55,797 INFO     1008/1008 - ddr-densho-201-1011
Traceback (most recent call last):
  File "/opt/ddr-local-develop/venv/ddrlocal/bin/ddrexport", line 11, in <module>
    load_entry_point('ddr-cmdln==0.9.4b0', 'console_scripts', 'ddrexport')()
  ...
  File "/opt/ddr-local-develop/venv/ddrlocal/local/lib/python2.7/site-packages/ddr_cmdln-0.9.4b0-py2.7.egg/DDR/models/common.py", line 663, in load_json
    json_data = json.loads(json_text)
  ...
simplejson.errors.JSONDecodeError: Expecting property name enclosed in double quotes: line 85 column 1 (char 2317)

gjost commented 5 years ago

Fixed in commit 9dec8f5.