GhostManager / Ghostwriter

The SpecterOps project management and reporting engine
https://ghostwriter.wiki
BSD 3-Clause "New" or "Revised" License
1.34k stars 183 forks source link

Error: Encountered an error generating the document: time data `fév. 4, 2022` does not match format `%b. %d, %Y` (french language) #193

Closed cygsecc closed 2 years ago

cygsecc commented 2 years ago

Hello,

I have finished creating my templates two weeks ago and they were working perfectly. But ever since we hit February, I'm getting the error Encountered an error generating the document: time data 'fév. 7, 2022' does not match format '%b. %d, %Y'

I think I understand the bug but I don't know how I can resolve it. The language of the server is french and I think Jinja is expecting 'feb' and not 'fév' in {{ report_date | format_datetime("%b. %d, %Y", "%d/%m/%y") }}. This would explain why it worked during the making of the template in January (January is Janvier in french so %b was equal jan anyway). I already searched the web but I can't find anything related to it.

What do you think ?

Thomas

chrismaddalena commented 2 years ago

Hey @cygsecc, thanks for letting me know about this. I thought setting the locale for the server affected all child Python processes. If you have updated Ghostwriter's server settings to use fr_FR, then Python should have the proper locale. Apparently, that's wrong.

Python defaults to English, so you can reproduce the error like this:

>>> from datetime import datetime
>>> datetime.strptime("fév. 7, 2022", "%b. %d, %Y")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python@3.9/3.9.8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/local/Cellar/python@3.9/3.9.8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data 'fév. 7, 2022' does not match format '%b. %d, %Y'

It works if you set the locale for the process:

>>> import locale
>>> locale.setlocale(locale.LC_ALL, "fr_FR")
'fr_FR'
>>> datetime.strptime("fév. 7, 2022", "%b. %d, %Y")
datetime.datetime(2022, 2, 7, 0, 0)

I'll look into this. The reporting engine should be able to pull in the locale set for Django and use it, but I'll need to experiment to make sure that's reliable.

chrismaddalena commented 2 years ago

Small update because I had a little time to play with this:

It looks like this is slightly more difficult to resolve with Django. With LANGUAGE_CODE set to fr-FR and Python locale set to fr_FR, datetime.strptime() still cannot recognize the French abbreviation "fév." when run by Django. From my brief research, it seems to be related to django.middleware.locale.LocaleMiddleware. The internet suggests integrating Babel with Django to more easily translate between locales and languages.

I wish it was a simple code update, but it might take a little more work than that. I keep seeing Babel is easy to use, so it might not take me too long to investigate it and see about using it.

chrismaddalena commented 2 years ago

@cygsecc I have a potential workaround for you. I ran into this today when date formatting started outputting the month as March instead of Mar for en-us locales. I have looked into this more and believe I know why the date formatting isn't matching what we might expect. First, the workaround:

  1. Open config/base.py
  2. Find line 45: USE_L10N = True
  3. Change the value to: USE_L10N = False
  4. Add this on a new line: DATE_FORMAT = "b d, Y"

You can change the DATE_FORMAT value to whatever you desire, but "b d, Y" was in your example above. You can find the valid format characters here: https://docs.djangoproject.com/en/4.0/ref/settings/#date-format

And here: https://docs.djangoproject.com/en/4.0/ref/templates/builtins/#std:templatefilter-date

Save the file and restart Ghostwriter.

I think this will work because L1ON enforces a default date format based on your locale. It takes priority over other settings. This can be an issue because it looks like the format string N is used for the month. This isn't a standard Python strftime value. It's only in Django and formats the month to Associated Press style.

Associated Press style is very similar to %b, so I never considered it wasn't %b. The N formatting adds periods after the month abbreviation and does not abbreviate shorter months, like March. It will often match %b., but will spontaneously break with certain months. I'm still not sure why it disliked fév., but once I manually set DATE_FORMATit worked.

I have not done expensive testing but was able to confirm this config workaround with a test case.

Theoretically, we want L1ON enabled for easy localization, but it's only used within HTML templates. In most cases where Ghostwriter displays a date, we have to enforce a specific date format (d M Y) that is easily understood by the table sorter JavaScript. There are 16 places Ghostwriter displays a localized date in the interface and all of them will update with whatever is set for DATE_FORMAT up above.

cygsecc commented 2 years ago

Hello, thank you so much for the extensive response, it's really appreciated. Will definitely try as soon as possible.

I saw on the Slack channel that you were planning to update the roadmap, can't wait to read it. I hope that localization will be high on the roadmap but I would understand if it's not top priority.

chrismaddalena commented 2 years ago

@cygsecc No problem. I merged the changes described above and updated the documentation with notes:

https://www.ghostwriter.wiki/getting-started/installation#notes-on-date_format

Improving localization is a priority. I'll be looking into ways to more easily localize things.

Disabling L10N shouldn't affect localization all that much because there are only a handful of places where it localizes a date right now. With it enabled Django automatically produces dates that match your locale. With this change, you have to set DATE_FORMAT to your preferred date format. It adds a step for a new installation but offers greater control.

To be fair, I don't think I've been using L10N to make use of its full potential for localization. Looking at something like Babel may still be a good idea to avoid confusion with date format strings. There may also be a better way to format dates that can account for Django's proprietary strftime values.

I'm going to close this for now, but please re-open it if disabling L10N and using DATE_FORMAT doesn't work.

er4z0r commented 2 years ago

Thank you for the fix @chrismaddalena. Unfortunately, I am unable to get it working. My settings:

$ grep -i l10n -A 3 config/settings/base.py 
# https://docs.djangoproject.com/en/dev/ref/settings/#use-l10n
USE_L10N = False
DATE_FORMAT = "M. d, Y"
# https://docs.djangoproject.com/en/dev/ref/settings/#use-tz
USE_TZ = True

I have restarted the container, just to be sure, but when I try to re-generate my report I still get the following error:

Encountered an error generating the document: time data März 25, 2022 does not match format %b. %d, %Y

What am I doing wrong?

cygsecc commented 2 years ago

As @er4z0r I also had the time to try it and it does not work either. My settings :

$ grep -i l10n -A 1 config/settings/base.py
# https://docs.djangoproject.com/en/dev/ref/settings/#use-l10n
USE_L10N = False
DATE_FORMAT = "b d, Y"

Error :

Encountered an error generating the document: time data mars 26, 2022 does not match format %b. %d, %Y

(I also restarted the containers docker-compose -f production.yml down && docker-compose -f production.yml up)

chrismaddalena commented 2 years ago

@er4z0r @cygsecc Thanks for checking back in with your results. I'll detail it below, but I think there are two things you need to do:

  1. Run the build command to update production containers
  2. Edit DATE_FORMAT to use M instead of b (mostly for @cygsecc, but read below about Django translations)

If this is a production environment, you will need to re-run the build command to pull the updated configuration files into the containers. The production containers copy the code so changes outside of the container don't affect a production environment until you're ready to incorporate them. If you have not run build that would explain why your added . after the abbreviated month doesn't appear in the actual date output.

There is also Django's translation of date format strings to consider. I documented them on the wiki to help with this, but it's still troublesome. Django uses its own format strings, some of which repurpose Python's. This can make translation difficult.

https://www.ghostwriter.wiki/getting-started/installation#notes-on-date_format

For example, M. d, Y in Django should produce März. 25, 2022 for @er4z0r. Django's M translates to Python's b for an abbreviated textual month with the first letter uppercase–e.g., März.

I assume Django has M because the project repurposed b to output an abbreviated textual month in all lowercase (a strftime value that Python does not have). This conflicts with Python, so Django's output of b d, Y for @cygsecc will not be recognized by Python as matching %b %d, %Y. Basically, Python expects an abbreviated month with an uppercase first character for %b so it sees no match with the lowercase version generated by Django. That's frustrating.

In your report template, if you want an abbreviated month that is lowercase use the lower filter. For example, with DATE_FORMAT set to M. d, Y this would output märz. 25, 2022:

report_date | format_datetime("%b. %d, %Y) | lower

I'm going to experiment with django.utils.dateformat to see if I can leverage that to format dates within a report. I initially thought it would not work outside of a template, but it looks like I was wrong. Switching offers two benefits: the formatter already knows your server's DATE_FORMAT and it accepts Django's format values.

chrismaddalena commented 2 years ago

I looked into it and found a better way to support handling dates within the report templates. Here is the CHANGELOG you'll see soon:

I'll be updating the documentation once it's merged, but the gist is you won't have to provide a Python strftime string anymore. You also want need to translate Django proprietary values to Python values.

Old filter: report_date | format_datetime("%d %b %Y", "%b. %d, %Y") | lower » märz. 25, 2022

New filter: report_date | format_datetime("b d, Y") » märz. 25, 2022

The filter now uses dateutil.parser.parse to convert the string to a datetime object (no need to tell the filter what the current format is) and django.utils.dateformat.format to convert to your new date format (uses Django's values instead of Python's strftime).

The add_days filter has been updated as well, so you can write {{ report_date | add_days(10) }} and get back your date with 10 business days added.

It will also be easier to chain those filters, like so: report_date | add_days(10) | format_datetime("b d, Y")

Unfortunately, this means report templates will have to be updated. The old filters are effectively deprecated. It's very much for the best, but I don't want to just invalidate templates. Not everyone will pay attention to the CHANGELOG file. I need to figure out a good way to handle this in the template linter to warn people.

@er4z0r @cygsecc I appreciate your patience and help with this. The above solution should work for you, but the change should make it much easier. It also supports L10N. I haven't tested that yet, but it simplifies a lot of the original issue.