IUCN-ELC / ecolex

ECOLEX website
https://www.ecolex.org

Management command to get and unzip xml and import parsed data #113

Closed raressiteavu closed 2 years ago

melish commented 2 years ago

Tested on https://ecolex.edw.ro using ./manage.py import_xml --input_url https://ecolex.edw.ro/static/faolex_202107.zip

It ran for about 1h and ended abruptly with an unhandled exception:

  ...
  File "/home/web/ecolex/ecolex/legislation.py", line 278, in add_legislations
    doc.save()
  ...
  django.db.utils.OperationalError: (2006, 'MySQL server has gone away')

(see full stack trace below)

The overall number of legislation records hasn't changed (https://ecolex.edw.ro/result/?type=legislation), so I can't tell if anything has been updated or not.

The legislation_import.log file wasn't really useful; I can't tell which records were processed. Strangely, there are no "INFO" lines in legislation_import.log (only DEBUG messages), but there are a lot of messages in django_errors.log (it looks like django.db.backends outputs everything):

-rw-r--r-- 1 root root 546K Dec  5 16:12 legislation_import.log
-rw-r--r-- 1 root root 2.8G Dec  5 17:04 django_errors.log
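
One possible way to keep django_errors.log from growing like this is to raise the threshold of the django.db.backends logger, which emits one DEBUG message per SQL query when query debugging is active (the CursorDebugWrapper frame in the stack trace suggests it is). The fragment below is only a sketch: the logger name legislation_import, the handler name and the format string are assumptions, not taken from the project's settings.

```python
import logging
import logging.config

# Hypothetical LOGGING fragment; names and paths are illustrative only.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "verbose": {
            "format": "[%(asctime)s] %(levelname)s [%(name)s:%(lineno)s] %(message)s",
        },
    },
    "handlers": {
        "import_file": {
            "class": "logging.FileHandler",
            "filename": "legislation_import.log",
            "formatter": "verbose",
            "level": "INFO",
            "delay": True,  # do not create the file until something is logged
        },
    },
    "loggers": {
        # Per-query DEBUG output goes through this logger; keep warnings only.
        "django.db.backends": {"level": "WARNING", "propagate": False},
        # Assumed name for the import command's own logger.
        "legislation_import": {
            "handlers": ["import_file"],
            "level": "INFO",
            "propagate": False,
        },
    },
}

# Django applies settings.LOGGING by itself; this explicit call is only
# needed to try the fragment standalone.
logging.config.dictConfig(LOGGING)
```

With this in place the import log would start at INFO, and SQL statements would no longer be written out unless they produce a warning or error.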

Last rows in django_errors.log were:

```
[05/Dec/2021 17:04:30] DEBUG [django.db.backends:89] (0.001) None; args=('LEX-FAOC039005', 'legislation', 'http://extwprlegs1.fao.org/docs/pdf/eur39005.pdf', '<?xml version="1.0" encoding="UTF-8"?>\n<
[05/Dec/2021 17:04:30] DEBUG [django.db.backends:89] (0.006) None; args=('LEX-FAOC042993', '2021-12-05 15:12:35.169401')
[05/Dec/2021 17:04:30] DEBUG [django.db.backends:89] (0.001) None; args=('LEX-FAOC042993', 'legislation', 'http://extwprlegs1.fao.org/docs/texts/par42993.doc', '<?xml version="1.0" encoding="UTF-8"?>\
[05/Dec/2021 17:04:31] DEBUG [django.db.backends:89] (0.845) None; args=('LEX-FAOC050711', '2021-12-05 15:12:35.169401')
[05/Dec/2021 17:04:31] DEBUG [django.db.backends:89] (0.648) None; args=('LEX-FAOC050711', 'legislation', 'http://extwprlegs1.fao.org/docs/pdf/eur50711.pdf', '<?xml version="1.0" encoding="UTF-8"?>\n<
[05/Dec/2021 17:04:32] DEBUG [django.db.backends:89] (0.011) None; args=None
```


LEX-FAOC050711 is indeed a very large PDF file (3146 pages).
But even when such errors occur, they must be caught and the import command needs to continue.
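
A minimal sketch of what "catch and continue" could look like, assuming each record is an object with a save() method. The names save_all, doc_id and on_error are hypothetical and not taken from legislation.py; the logger name is also an assumption.

```python
import logging

logger = logging.getLogger("legislation_import")  # hypothetical logger name


def save_all(docs, on_error=None):
    """Save each document; log failures and keep going instead of aborting.

    `docs` is any iterable of objects exposing `save()` and a `doc_id`
    attribute (hypothetical names). `on_error` is an optional callback
    run after each failure, e.g. to drop a stale database connection.
    Returns a (saved_ids, failed_ids) tuple.
    """
    saved, failed = [], []
    for doc in docs:
        try:
            doc.save()
        except Exception:
            # logger.exception records the traceback next to the record id.
            logger.exception("Failed to save %s; continuing", doc.doc_id)
            failed.append(doc.doc_id)
            if on_error is not None:
                on_error()
        else:
            saved.append(doc.doc_id)
    return saved, failed
```

In a Django command, passing `django.db.connection.close` as `on_error` may be enough for the next query to open a fresh connection after error 2006, provided the loop is not running inside a transaction; that part is untested here.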

See also a few other comments in the PR.
And please improve the logging (add more context to those messages, so they are meaningful even after the import has finished).
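
To illustrate the kind of context meant here, a small sketch using the standard library's logging.LoggerAdapter: every message carries the record id and its position in the batch, and one summary line closes the run. The names RecordAdapter, log_for and summary, and the logger name, are hypothetical.

```python
import logging

logger = logging.getLogger("legislation_import")  # hypothetical logger name


class RecordAdapter(logging.LoggerAdapter):
    """Prefix every message with the record id and its position in the batch."""

    def process(self, msg, kwargs):
        prefix = "[%s %d/%d]" % (
            self.extra["record_id"],
            self.extra["index"],
            self.extra["total"],
        )
        return "%s %s" % (prefix, msg), kwargs


def log_for(record_id, index, total):
    """Return a logger bound to one record of the import."""
    return RecordAdapter(
        logger, {"record_id": record_id, "index": index, "total": total}
    )


def summary(created, updated, failed):
    """Build one closing line so the log shows what changed after the run."""
    return "Import finished: %d created, %d updated, %d failed (%s)" % (
        created,
        updated,
        len(failed),
        ", ".join(failed) or "none",
    )
```

For example, `log_for("LEX-FAOC050711", 3, 10).info("saved")` would log `[LEX-FAOC050711 3/10] saved`, and `logger.info(summary(...))` at the end would answer whether anything was updated.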

Full stack trace:

Traceback (most recent call last):
  File "./manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/local/lib/python3.6/site-packages/django/core/management/__init__.py", line 353, in execute_from_command_line
    utility.execute()
  File "/usr/local/lib/python3.6/site-packages/django/core/management/__init__.py", line 345, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python3.6/site-packages/django/core/management/base.py", line 348, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/local/lib/python3.6/site-packages/django/core/management/base.py", line 399, in execute
    output = self.handle(*args, **options)
  File "/home/web/ecolex/ecolex/management/commands/import_xml.py", line 24, in handle
    response = harvest_file(legislation_file.read())
  File "/home/web/ecolex/ecolex/legislation.py", line 228, in harvest_file
    add_legislations(legislations, count_ignored)
  File "/home/web/ecolex/ecolex/legislation.py", line 278, in add_legislations
    doc.save()
  File "/usr/local/lib/python3.6/site-packages/django/db/models/base.py", line 708, in save
    force_update=force_update, update_fields=update_fields)
  File "/usr/local/lib/python3.6/site-packages/django/db/models/base.py", line 736, in save_base
    updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
  File "/usr/local/lib/python3.6/site-packages/django/db/models/base.py", line 801, in _save_table
    forced_update)
  File "/usr/local/lib/python3.6/site-packages/django/db/models/base.py", line 851, in _do_update
    return filtered._update(values) > 0
  File "/usr/local/lib/python3.6/site-packages/django/db/models/query.py", line 645, in _update
    return query.get_compiler(self.db).execute_sql(CURSOR)
  File "/usr/local/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 1149, in execute_sql
    cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
  File "/usr/local/lib/python3.6/site-packages/django/db/models/sql/compiler.py", line 848, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py", line 79, in execute
    return super(CursorDebugWrapper, self).execute(sql, params)
  File "/usr/local/lib/python3.6/site-packages/sentry_sdk/integrations/django/__init__.py", line 500, in execute
    return real_execute(self, sql, params)
  File "/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.6/site-packages/django/db/utils.py", line 95, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/usr/local/lib/python3.6/site-packages/django/utils/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
    return self.cursor.execute(sql, params)
  File "/usr/local/lib/python3.6/site-packages/django/db/backends/mysql/base.py", line 112, in execute
    return self.cursor.execute(query, args)
  File "/usr/local/lib/python3.6/site-packages/MySQLdb/cursors.py", line 255, in execute
    self.errorhandler(self, exc, value)
  File "/usr/local/lib/python3.6/site-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
    raise errorvalue
  File "/usr/local/lib/python3.6/site-packages/MySQLdb/cursors.py", line 252, in execute
    res = self._query(query)
  File "/usr/local/lib/python3.6/site-packages/MySQLdb/cursors.py", line 378, in _query
    db.query(q)
  File "/usr/local/lib/python3.6/site-packages/MySQLdb/connections.py", line 280, in query
    _mysql.connection.query(self, query)
django.db.utils.OperationalError: (2006, 'MySQL server has gone away')