ASKBOT / askbot-devel

Askbot is a Django/Python Q&A forum. **Contributors README**: https://github.com/ASKBOT/askbot-devel#how-to-contribute. Commercial hosting of Askbot and support are available at https://askbot.com

Unable to import osqa content by using the command `askbot_add_osqa_content` #913

Closed samip5 closed 1 year ago

samip5 commented 1 year ago

It seems that in the current master, git commit 57afe9a2fd05d31094b005c87bc0cafbb207677c the osqa command doesn't work?

What was the last version in which this was tested and known to work?


```
$ python manage.py askbot_add_osqa_content
b"WARNING!!! You are using a 'locmem' (local memory) caching backend,\nwhich is OK for a low volume site running on a single-process server.\nFor a multi-process configuration it is neccessary to have a production\ncache system, such as redis or memcached.\n\nWith local memory caching and multi-process setup you might intermittently\nsee outdated content on your site.\n"
Traceback (most recent call last):
  File "/var/www/servicesites/ukk.<snip>.fi/askbot/askbot_site/manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/var/www/servicesites/ukk.<snip>.fi/askbot/env/lib/python3.9/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
    utility.execute()
  File "/var/www/servicesites/ukk.<snip>.fi/askbot/env/lib/python3.9/site-packages/django/core/management/__init__.py", line 413, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/www/servicesites/ukk.<snip>.fi/askbot/env/lib/python3.9/site-packages/django/core/management/base.py", line 354, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/www/servicesites/ukk.<snip>.fi/askbot/env/lib/python3.9/site-packages/django/core/management/base.py", line 398, in execute
    output = self.handle(*args, **options)
  File "/var/www/servicesites/ukk.<snip>.fi/askbot/env/lib/python3.9/site-packages/askbot/management/commands/askbot_add_osqa_content.py", line 107, in handle
    self.redirect_format = self.get_redirect_format(options['redirect_format'])
KeyError: 'redirect_format'
```

evgenyfadeev commented 1 year ago

Would you send me your osqa dump to support@askbot.com? I'll fix the import command. How big is the file?

samip5 commented 1 year ago

> Would you send me your osqa dump to support@askbot.com? I'll fix the import command. How big is the file?

Due to the lack of docs around that, I did not know it requires a file, as it doesn't complain when one is missing. It probably should if that's the case.

It would be even more useful if it were able to gather the data directly from the OSQA database.

evgenyfadeev commented 1 year ago

It reads an XML file. Check the output of `python manage.py askbot_add_osqa_content --help`. That said, this command has not been run recently and might have issues (there is a chance, though, that it's working).

samip5 commented 1 year ago

> It reads an XML file. Check the output of `python manage.py askbot_add_osqa_content --help`. That said, this command has not been run recently and might have issues (there is a chance, though, that it's working).

I did not get an XML file from OSQA but a JSON file, and it's 1.6 GB.

evgenyfadeev commented 1 year ago

Is that what's produced by OSQA's `python manage.py dumpdata`? I'm sure it used to be an XML file. Does it have an option to output in XML format? How did you obtain this JSON file?

samip5 commented 1 year ago

> Is that what's produced by OSQA's `python manage.py dumpdata`? I'm sure it used to be an XML file. Does it have an option to output in XML format? How did you obtain this JSON file?

I simply ran `python manage.py dumpdata`, which produces JSON, not XML.

evgenyfadeev commented 1 year ago

Could you try `python manage.py dumpdata --format=xml > output.xml`?

samip5 commented 1 year ago

> Could you try `python manage.py dumpdata --format=xml > output.xml`?

That seems to produce XML, but the file is 1.9 GB :)

samip5 commented 1 year ago

My next question: what argument do I need to give the command so that it picks up my file?

```
$ python manage.py askbot_add_osqa_content data.xml
usage: manage.py askbot_add_osqa_content [-h] [--version] [-v {0,1,2,3}] [--settings SETTINGS] [--pythonpath PYTHONPATH] [--traceback] [--no-color] [--force-color]
manage.py askbot_add_osqa_content: error: unrecognized arguments: data.xml
```

samip5 commented 1 year ago

Hi, @evgenyfadeev.

Unfortunately, it seems that the command is not parsing arguments properly. When did this last work?

evgenyfadeev commented 1 year ago

It's true, the command was not updated when the Django versions were updated. I'll look into this now.

evgenyfadeev commented 1 year ago

@samip5 I've made a commit to the master branch that fixes the argument issue. The command might still break, and if you open its code you'll see that some entities from OSQA are not imported; the most important ones are, though.
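
For readers following along: the two failures reported above (the `KeyError: 'redirect_format'` and the `unrecognized arguments: data.xml` error) are what one would expect when a command reads options that were never declared to the argument parser. Django management commands declare their arguments on an `argparse` parser in `add_arguments()`. The stand-alone sketch below uses plain `argparse` to show the idea; it is not the actual fix committed to Askbot, and the names `xml_file`, `--redirect-format` and the default `"none"` are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Mimic what a Django command would declare in add_arguments(parser)."""
    parser = argparse.ArgumentParser(prog="askbot_add_osqa_content")
    # Required positional argument: argparse prints a usage error and exits
    # when it is missing, so the command "yells" if no file is given.
    parser.add_argument("xml_file", help="path to the OSQA dump in XML format")
    # Optional flag with a default, so the key always exists in the options.
    # The flag name and the default value are hypothetical.
    parser.add_argument("--redirect-format", dest="redirect_format", default="none")
    return parser


def parse_options(argv):
    """Return the parsed options as a dict, like the options passed to handle()."""
    return vars(build_parser().parse_args(argv))


options = parse_options(["data.xml"])
print(options["xml_file"], options["redirect_format"])
```

With the positional argument declared, `data.xml` is accepted, and because the flag has a default, looking up `options["redirect_format"]` no longer raises `KeyError`.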

samip5 commented 1 year ago

> @samip5 I've made a commit to the master branch that fixes the argument issue. The command might still break, and if you open its code you'll see that some entities from OSQA are not imported; the most important ones are, though.

I have now tried that, but it seems that the datetime parsing is failing. Also, if that fails, does it prevent the whole import? It was processing the file for some time before giving that error, consuming 20-30 GB of RAM.

Max RAM consumed was: [image]

And after it went back to 24.3G.

```
Traceback (most recent call last):
  File "/var/www/servicesites/ukk.<snip>.fi/askbot_site/env/lib/python3.9/site-packages/askbot/management/commands/askbot_add_osqa_content.py", line 28, in decode_datetime
    return datetime.strptime(data, '%Y-%m-%d %H:%M:%S')
  File "/usr/lib/python3.9/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/lib/python3.9/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data '2011-10-08T15:18:42' does not match format '%Y-%m-%d %H:%M:%S'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/www/servicesites/ukk.<snip>.fi/askbot_site/manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/var/www/servicesites/ukk.<snip>.fi/askbot_site/env/lib/python3.9/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
    utility.execute()
  File "/var/www/servicesites/ukk.<snip>.fi/askbot_site/env/lib/python3.9/site-packages/django/core/management/__init__.py", line 413, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/var/www/servicesites/ukk.<snip>.fi/askbot_site/env/lib/python3.9/site-packages/django/core/management/base.py", line 354, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/var/www/servicesites/ukk.<snip>.fi/askbot_site/env/lib/python3.9/site-packages/django/core/management/base.py", line 398, in execute
    output = self.handle(*args, **options)
  File "/var/www/servicesites/ukk.<snip>.fi/askbot_site/env/lib/python3.9/site-packages/askbot/management/commands/askbot_add_osqa_content.py", line 106, in handle
    self.import_users()
  File "/var/www/servicesites/ukk.<snip>.fi/askbot_site/env/lib/python3.9/site-packages/askbot/management/commands/askbot_add_osqa_content.py", line 170, in import_users
    self.copy_numeric_parameter(from_user, to_user, 'last_login', operator='max')
  File "/var/www/servicesites/ukk.<snip>.fi/askbot_site/env/lib/python3.9/site-packages/askbot/management/commands/base.py", line 226, in copy_numeric_parameter
    from_par = getattr(from_obj, from_param_name)
  File "/var/www/servicesites/ukk.<snip>.fi/askbot_site/env/lib/python3.9/site-packages/askbot/management/commands/askbot_add_osqa_content.py", line 86, in __getattr__
    value = self.decode_value(key)
  File "/var/www/servicesites/ukk.<snip>.fi/askbot_site/env/lib/python3.9/site-packages/askbot/management/commands/askbot_add_osqa_content.py", line 76, in decode_value
    return self.decode_typed_value(field)
  File "/var/www/servicesites/ukk.<snip>.fi/askbot_site/env/lib/python3.9/site-packages/askbot/management/commands/askbot_add_osqa_content.py", line 50, in decode_typed_value
    return decode_datetime(value)
  File "/var/www/servicesites/ukk.<snip>.fi/askbot_site/env/lib/python3.9/site-packages/askbot/management/commands/askbot_add_osqa_content.py", line 30, in decode_datetime
    return datetime.strptime(data, '%Y-%m-%d')
  File "/usr/lib/python3.9/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/lib/python3.9/_strptime.py", line 352, in _strptime
    raise ValueError("unconverted data remains: %s" %
ValueError: unconverted data remains: T15:18:42
```

evgenyfadeev commented 1 year ago

I've fixed the datetime format. It's possible that the process will be memory heavy.
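
On the memory point: one common way to keep memory flat with a multi-gigabyte XML dump is to stream it with `xml.etree.ElementTree.iterparse` and discard each element once it has been handled. The sketch below is only an illustration of that technique, not how `askbot_add_osqa_content` reads the file. It assumes the usual `dumpdata --format=xml` layout (`<django-objects>` containing `<object pk=... model=...>` elements with `<field name=...>` children); `iter_objects` and `SAMPLE` are hypothetical names.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a real dump; the layout is assumed, see the note above.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<django-objects version="1.0">
  <object pk="1" model="auth.user">
    <field type="CharField" name="username">alice</field>
    <field type="DateTimeField" name="last_login">2011-10-08T15:18:42</field>
  </object>
  <object pk="2" model="auth.user">
    <field type="CharField" name="username">bob</field>
  </object>
</django-objects>"""


def iter_objects(source):
    """Yield (model, pk, fields) per top-level <object>, one at a time."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        # Only objects carrying a "model" attribute are records; nested
        # <object> elements inside relation fields are skipped.
        if event == "end" and elem.tag == "object" and elem.get("model"):
            fields = {f.get("name"): f.text for f in elem.findall("field")}
            yield elem.get("model"), elem.get("pk"), fields
            root.clear()  # drop already-processed children from the tree


for model, pk, fields in iter_objects(io.BytesIO(SAMPLE)):
    print(model, pk, fields)
```

Because the root is cleared after every record, the parser never holds more than one `<object>` subtree at a time, whatever the size of the file.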

samip5 commented 1 year ago

Hmm, my file seems to have odd timestamps, as I now get `unconverted data remains: T22:01:08.561227`. Should that prevent the whole import from succeeding?

It doesn't say that it succeeded, nor does it say anything about questions or content being created, so I think it failed completely?

evgenyfadeev commented 1 year ago

Give it a shot now. By the way, if you can work with Python, perhaps you could try debugging it yourself?

If it succeeds with this error, some dates will be incorrect.

samip5 commented 1 year ago

> Give it a shot now. By the way, if you can work with Python, perhaps you could try debugging it yourself?
>
> If it succeeds with this error, some dates will be incorrect.

I know my way around Python, yes. It seems the try statement for datetime decoding needs to handle all three cases, which it currently doesn't: the first ValueError falls back to `%Y-%m-%dT%H:%M:%S`, but if that also fails it never reaches the last format, which would be the correct one for dates of birth. Any good ideas on how that try/except should be structured?

ValueError: time data '1983-05-20' does not match format '%Y-%m-%dT%H:%M:%S'

evgenyfadeev commented 1 year ago

Right, there was a bug in the try/except block. Now it handles three datetime formats.
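
One way to structure this without nested try/except blocks is to loop over the candidate formats and return on the first match. The sketch below is not the code committed to Askbot; it reuses the name `decode_datetime` from the traceback for readability, and the format list is an assumption assembled from the values quoted in this thread (including the fractional-seconds timestamp).

```python
from datetime import datetime

# Formats seen in this thread, most specific first (assumed list).
DATETIME_FORMATS = (
    "%Y-%m-%dT%H:%M:%S.%f",  # e.g. ...T22:01:08.561227
    "%Y-%m-%dT%H:%M:%S",     # e.g. 2011-10-08T15:18:42
    "%Y-%m-%d %H:%M:%S",     # the format the command tried originally
    "%Y-%m-%d",              # plain dates such as 1983-05-20
)


def decode_datetime(data):
    """Try each known format in turn; fail only if none of them matches."""
    for fmt in DATETIME_FORMATS:
        try:
            return datetime.strptime(data, fmt)
        except ValueError:
            continue  # fall through to the next candidate format
    raise ValueError("unsupported datetime value: %r" % (data,))


print(decode_datetime("2011-10-08T15:18:42"))
print(decode_datetime("1983-05-20"))
```

Adding a further format later is then a one-line change to the tuple rather than another level of nesting.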

samip5 commented 1 year ago

I don't know if this is a bug, but should removing users via the Django admin result in a `'UserProfile' object has no attribute 'id'` error?

Or is this just because I'm using the master branch at the moment?

samip5 commented 1 year ago

@evgenyfadeev Could you please fix the issue of not being able to delete users from the Django admin UI? I don't understand the relationships between the tables well enough to do that myself.