arrow-py / arrow

🏹 Better dates & times for Python
https://arrow.readthedocs.io
Apache License 2.0

Support for capitalisation in date formatting #818

Open virresh opened 4 years ago

virresh commented 4 years ago

Feature Request

In glibc strftime, we have the ^ flag, which provides this functionality. However, it's not cross-platform, and surprisingly there is no clean way to uppercase letters in strftime output on Windows. I have faced this issue and am looking for an alternative that will allow me to do this in a platform-agnostic way. It is supported out of the box by Java's String.format().

PS: I can't just call .upper() on the whole string

A detailed example for such use case:

I need to generate GET requests to a server, one day after another. The URL paths on that server are case-sensitive, and I have multiple websites on which I need to do this testing. So essentially I need something like:

requests.get('www.example.com/JUN/06/2020/file.csv')
requests.get('www.example.com/JUL/06/2020/file.csv')
requests.get('www.example.com/AUG/06/2020/file.csv')

On glibc, I can simply do:

date.strftime('www.example.com/%^b/%d/%Y/file.csv')

and get the desired file because it supports capitalisation. However, there is no clean alternative to do the same on Windows. I'm hoping Arrow would bridge the gap.

In fact, if feasible, it would be a great addition if Arrow could support all glibc extensions.

Pardon me if arrow already supports this, but I couldn't figure it out from the documentation.

jadchaar commented 4 years ago

Thanks for the feature request @virresh. Until support for this is added, would the following solution work for your use case?

env ❯ python3
Python 3.7.7 (default, Mar 10 2020, 15:43:33)
[Clang 11.0.0 (clang-1100.0.33.17)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import arrow
>>> formatted_date = arrow.now().format("MMM/DD/YYYY").upper()
>>> formatted_date
'JUL/05/2020'
>>> final_str = "www.example.com/{}/file.csv".format(formatted_date)
>>> final_str
'www.example.com/JUL/05/2020/file.csv'

virresh commented 4 years ago

Thanks for the prompt response!
Unfortunately, that wouldn't be sufficient. The server URL is a configurable entity, so I cannot be sure that the date needs to be inserted between "www.example.com/" and "/file.csv".

There are cases such as:

www1.example.com/JUN/files/2020/downloadable.csv

My current workaround is to write a custom regex-based parser with if-else for the desired format strings (year, month and day family)

I'm not sure if arrow would need the same, but if it does, I'll be happy to send that in as a PR.
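
For illustration, a regex-based workaround of the kind described above might look roughly like the following. This is only a sketch using the standard library, not Arrow code and not the author's actual parser; the helper name strftime_upper is hypothetical. It expands glibc-style "%^X" directives itself and leaves everything else to strftime, and it assumes an English (C) locale for the month names.

```python
import datetime
import re

# Matches either an escaped percent ("%%") or a glibc-style "%^X" directive.
# Matching "%%" first keeps a literal "%%^b" from being treated as "%^b".
_DIRECTIVE = re.compile(r"%%|%\^([A-Za-z])")


def strftime_upper(template, date):
    """Hypothetical helper: emulate the "%^X" flag on any platform."""

    def expand(match):
        letter = match.group(1)
        if letter is None:
            # An escaped percent sign: leave it for strftime to handle.
            return "%%"
        # Format the single directive without the flag, then uppercase it.
        # Escape "%" so the result survives the final strftime pass.
        return date.strftime("%" + letter).upper().replace("%", "%%")

    return date.strftime(_DIRECTIVE.sub(expand, template))


d = datetime.datetime(2020, 6, 6)
print(strftime_upper("www.example.com/%^b/%d/%Y/file.csv", d))
print(strftime_upper("www2.example.com/%b/overview/%Y/anything.csv", d))
```

Because the uppercasing is attached to individual directives, the rest of the URL template keeps its original case.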

jadchaar commented 4 years ago

If you would like to craft a PR, that would be much appreciated. I think an implementation that adds a flag to arrow.format() may be the simplest approach. A solution that lets users specify which tokens they want capitalized would be ideal, but we can maybe start with a flag that capitalizes the entire formatted string.

systemcatch commented 4 years ago

So this would affect the MMM, MMMM, ddd and dddd tokens (also A/a?).

Without seeing your code it's hard to know how to solve the second problem. You could try breaking the date components apart for flexibility but maybe that doesn't help much.

import arrow

dt = arrow.utcnow()
# MMM is the abbreviated month name (e.g. "Jul"); MM would give the zero-padded number.
day, month, year = dt.format("DD"), dt.format("MMM").upper(), dt.format("YYYY")

virresh commented 4 years ago

So this would affect the MMM, MMMM, ddd and dddd tokens (also A/a?).

Yes, these are the ones I can think of.

Without seeing your code it's hard to know how to solve the second problem

I don't think the regex solution would be of any help, since most likely a custom Python format string specifier would be required (not sure, only guessing). But using the glibc extension on Linux systems, I can give an idea of how that should look (the pre-defined templates will need to be changed when using Arrow, but the general workflow would be similar):

import datetime

def parse_url(template, date):
    # this is the function that needs to be cross-platform, and hopefully, arrow would work here
    return date.strftime(template)

d = datetime.datetime.now()

template1 = 'www.example.com/%^b/%d/%Y/file.csv'
template2 = 'www1.example.com/%^b/files/%Y/downloadable.csv'
notalwayscapital = 'www2.example.com/%b/overview/%Y/anything.csv'

print(parse_url(template1, d))
print(parse_url(template2, d))
print(parse_url(notalwayscapital, d))

This gives the following output on my Ubuntu machine:

>>> print(parse_url(template1, d))
www.example.com/JUL/08/2020/file.csv
>>> print(parse_url(template2, d))
www1.example.com/JUL/files/2020/downloadable.csv
>>> print(parse_url(notalwayscapital, d))
www2.example.com/Jul/overview/2020/anything.csv

Separating into components and uppercasing is what I'm using right now, but to do that I need a lot of boilerplate code that parses the string and determines where to uppercase and where not to. IMO a solution based on a custom format specifier would be clearer and more to the point, but I haven't explored it yet.

virresh commented 4 years ago

Update: I fiddled around with custom format strings, and it turns out a wrapper is much more convenient for my purposes.

Here is a snippet of my wrapper:

import datetime

class dWrapper:
    def __init__(self, date):
        self.date = date
    def __format__(self, spec):
        caps = False
        if '^' in spec:
            caps = True
            spec = spec.replace('^', '')
        out = self.date.strftime(spec)
        if caps:
            out = out.upper()
        return out
    def __getattr__(self, key):
        return getattr(self.date, key)

def parse_url(s, d):
    return s.format(dWrapper(d))

template1 = 'www.example.com/{0:%^b}/{0:%d}/{0:%Y}/file.csv'
template2 = 'www1.example.com/{0:%^b}/files/{0:%Y}/downloadable.csv'
notalwayscapital = 'www2.example.com/{0:%b}/overview/{0:%Y}/anything.csv'

d = datetime.datetime.now()

print(parse_url(template1, d))
print(parse_url(template2, d))
print(parse_url(notalwayscapital, d))

(Although it's Python 3+ only)

systemcatch commented 4 years ago

@virresh nice that you've found a solution.

As a bare minimum we should update the docs to mention that things like AUGUST/AUG will parse using MMMM/MMM. At the moment that's not clear.

jadchaar commented 4 years ago

@virresh nice that you've found a solution.

As a bare minimum we should update the docs to mention that things like AUGUST/AUG will parse using MMMM/MMM. At the moment that's not clear.

Yeah, I think we use re.IGNORECASE in most regex statements involved with parsing.
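
As a rough illustration of what case-insensitive matching with re.IGNORECASE buys during parsing, here is a minimal standard-library sketch. It is not Arrow's parser; the names MONTH_PATTERN and month_number are hypothetical, and the month abbreviations are assumed to come from an English (C) locale.

```python
import calendar
import re

# Month abbreviations from the stdlib (index 0 is an empty string, so skip it).
MONTH_ABBRS = [name for name in calendar.month_abbr if name]

# One alternation over all abbreviations, matched case-insensitively.
MONTH_PATTERN = re.compile("|".join(MONTH_ABBRS), flags=re.IGNORECASE)


def month_number(text):
    """Hypothetical helper: map "Aug", "AUG" or "aug" to 8."""
    match = MONTH_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError("not a month abbreviation: {!r}".format(text))
    lowered = [name.lower() for name in MONTH_ABBRS]
    return lowered.index(match.group(0).lower()) + 1


print(month_number("AUG"), month_number("Aug"), month_number("aug"))
```

With matching done this way, uppercase input such as AUG is accepted when parsing, but formatting output still follows the locale's own capitalisation.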

bphermansson commented 2 years ago

@virresh nice that you've found a solution.

As a bare minimum we should update the docs to mention that things like AUGUST/AUG will parse using MMMM/MMM. At the moment that's not clear.

Is this still a problem? If so, can you explain your answer a bit more? To have the date in uppercase you have to use .upper()?