Closed garrettr closed 7 years ago
Going to rebase this on latest master and take it for a spin on the staging machine.
Rebased on top of latest master.
Errored out trying to pull in the new dependency:
TASK [freedomofpress.django-stack : Setup virtualenv from requirements file] ***
task path: /home/conor/freedomofpress/infrastructure/roles-external/freedomofpress.django-stack/tasks/django_service.yml:30
Friday 18 November 2016 15:24:08 -0800 (0:00:17.116) 0:01:31.764 *******
fatal: [securethe.news]: FAILED! => {"changed": false, "cmd": "/home/gcorn/securethenews/bin/pip install -r /var/www/django/securethenews/requirements.txt", "failed": true, "msg": "stdout: Requirement already satisfied (use --upgrade to upgrade): Django>=1.10,<1.11 in /home/gcorn/securethenews/lib/python3.4/site-packages (from -r /var/www/django/securethenews/requirements.txt (line 1))\nRequirement already satisfied (use --upgrade to upgrade): gunicorn==19.6.0 in /home/gcorn/securethenews/lib/python3.4/site-packages (from -r /var/www/django/securethenews/requirements.txt (line 2))\nRequirement already satisfied (use --upgrade to upgrade): Pillow==3.4.2 in /home/gcorn/securethenews/lib/python3.4/site-packages (from -r /var/www/django/securethenews/requirements.txt (line 3))\nRequirement already satisfied (use --upgrade to upgrade): psycopg2==2.6.2 in /home/gcorn/securethenews/lib/python3.4/site-packages (from -r /var/www/django/securethenews/requirements.txt (line 4))\nDownloading/unpacking unicode-slugify==0.1.3 (from -r /var/www/django/securethenews/requirements.txt (line 5))\n Downloading unicode-slugify-0.1.3.tar.gz\n Running setup.py (path:/tmp/pip-build-8jv3iwpg/unicode-slugify/setup.py) egg_info for package unicode-slugify\n Traceback (most recent call last):\n File \"<string>\", line 17, in <module>\n File \"/tmp/pip-build-8jv3iwpg/unicode-slugify/setup.py\", line 7, in <module>\n long_description=open('README.md').read(),\n File \"/home/gcorn/securethenews/lib/python3.4/encodings/ascii.py\", line 26, in decode\n return codecs.ascii_decode(input, self.errors)[0]\n UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 361: ordinal not in range(128)\n Complete output from command python setup.py egg_info:\n Traceback (most recent call last):\n\n File \"<string>\", line 17, in <module>\n\n File \"/tmp/pip-build-8jv3iwpg/unicode-slugify/setup.py\", line 7, in <module>\n\n long_description=open('README.md').read(),\n\n File \"/home/gcorn/securethenews/lib/python3.4/encodings/ascii.py\", line 26, in decode\n\n return codecs.ascii_decode(input, self.errors)[0]\n\nUnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 361: ordinal not in range(128)\n\n----------------------------------------\nCleaning up...\nCommand python setup.py egg_info failed with error code 1 in /tmp/pip-build-8jv3iwpg/unicode-slugify\nStoring debug log for failure in /home/gcorn/.pip/pip.log\n"}
Looks like the setup.py
for the module tries to parse the README for a docstring and that's throwing run-of-mill Python string handling errors. Looks like this reverted commit is relevant (it's the latest in the upstream repo): https://github.com/mozilla/unicode-slugify/commit/02c20066e4af8fc5c929a19301f0d8e4bbba629b
@conorsch Yo dawg I heard you like UnicodeDecodeErrors so I... breaks down in tears
Interestingly, I did not encounter this error when I ran pip3 install unicode-slugify
in my dev VM. There is an open PR for this on the upstream repo (https://github.com/mozilla/unicode-slugify/pull/22), I pinged the maintainer so we'll see if they respond.
In the meantime, maybe there's a potential workaround based on https://github.com/mozilla/unicode-slugify/issues/16#issuecomment-125408979?
Hmm, I was able to run pip install -r requirements.txt
as gcorn
inside /var/www/django/securethenews/
interactively on the server, after sourcing the virtualenv, so looks like this may an issue with the role. Will poke around more.
Re: the "use unicode vs. transliterate unicode" question, it occurs to me that a benefit of transliterating the URLs to ASCII-only is that it makes them easier to type on standard keyboards. I'm not sure if this is actually a benefit or a misapprehension due to my American/native-English bias.
The fact that URLs support a richer character set than ASCII is reason enough to try to support it. After all, in the context of STN, absolutely no one is going to type out an entire URL to browse to the site—they're going to click on it, possibly share it via email. Let's see what we can do to get it Working Right™, before we fall back into the safety of ASCII-only, with our hands over our ears.
Ok, Harris pointed out that as of recent versions of Django, django.utils.text.slugify
takes an allow_unicode
argument, which passes through Unicode correctly. This achieves the same outcome that we had originally achieved with unicode-slugify, but avoids adding an additional dependency (which is causing deployment problems due to https://github.com/mozilla/unicode-slugify/pull/22, and our read of the activity on that repo is that we should not block on them fixing it).
I have tested this locally and confirmed that it is ready to merge.
Two things to remember about this change:
reverse
, use {% url %}
, and use get_absolute_url
. Don't construct URLs by concatenating strings - eventually it will lead to sorrow.Site.name
. In addition, all future loads of sites (using ./manage.py loadsites
, introduced in aeffb81), will generate unicode-aware slugs.@garrettr Great summary. You should post your resolution on the upstream unicode-slugify repo (https://github.com/mozilla/unicode-slugify/pull/22), to aid future searchers in the same vein.
Fixes an issue that we encountered with
tass.ru,ТАСС
fromnews_sites.csv
. Since ТАСС is all non-ASCII characters, Django's built-inslugify
generated an empty slug for it. This issue came to light due to #47, manifesting as a 500 error, becauseSite.get_absolute_url
for ТАСС tried to callreverse
with an empty slug, which obviously doesn't work.This PR:
slug != ''
before saving Sites to prevent this bug from popping up again in the future.What do the Unicode URLs look like?
At the moment, the slugs will contain Unicode characters. Note that URLs are restricted to ASCII characters according to the spec; however, it is common practice to percent-encode Unicode characters for use in URLs, and all modern browsers handle this correctly. Here's what it looks like:
If you copy-paste the URL, you will see it is actually percent-encoded "behind the scenes":
http://localhost:8000/sites/%D1%82%D0%B0%D1%81%D1%81
.Potential alternative
The other alternative is to transliterate non-ASCII characters to ASCII in the manner of unidecode. This can be easily achieved with
django-unicode
by passingonly_ascii=True
toslugify
.I am not sure which solution is best, but I do think we should make a final decision about this before the site is launched - changing this after launch will cause URLs to break, and I'd prefer to avoid dealing with setting up redirects for these pages if I can.