Open mredaelli opened 3 years ago
One possibility could be to introduce an internal method that gets the indices of the matches, and this could be called in these special use cases (but otherwise would largely be internal). We could also introduce a new flag or method that does a multi-match regex and returns a list of arrow objects. This could require a decently sized refactor though.
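To make the "indices of the matches" idea concrete, here is a minimal sketch using only the standard library. It is not arrow's parser: `DATE_RE` and `find_date_spans` are hypothetical names, and the single hard-coded pattern stands in for whatever regex arrow would build from a format string such as YYYY-MM-DD.

```python
import re
from datetime import datetime

# Hypothetical stand-in for the regex generated from a "YYYY-MM-DD" format.
# The lookarounds keep it from matching inside a longer run of digits.
DATE_RE = re.compile(r"(?<!\d)(\d{4})-(\d{2})-(\d{2})(?!\d)")


def find_date_spans(text):
    """Return (start, end, datetime) for every well-formed date in text."""
    results = []
    for m in DATE_RE.finditer(text):
        year, month, day = (int(g) for g in m.groups())
        try:
            parsed = datetime(year, month, day)
        except ValueError:
            # Matches the pattern but is not a real date, e.g. 2021-13-45.
            continue
        results.append((m.start(), m.end(), parsed))
    return results


spans = find_date_spans("Posted 2021-03-04, updated 2021-04-01 (ref 2021-13-45)")
```

Returning the spans alongside the parsed values covers both the internal "where did it match" helper and the multi-match case, since a caller can slice the original string with each `(start, end)` pair.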
I do know that the dateparser package is relatively popular for web scraping. Check out the search_dates method: https://dateparser.readthedocs.io/en/latest/#dateparser.search.search_dates.
This seems like a highly specialized feature request, so if dateparser does the job, let us know!
Dateparser is what we used before, but ran away from, so I'd much rather have the functionality in arrow :)
Not sure how "nice" it would be, but I'd be more than happy with just an optional parameter of get, say return_matched_string, which if True returns a tuple (date, match object) or simply (date, matched_string), instead of just the date.
But also just the dedicated low-level function would be great (assuming it's still going to be relatively stable :) )
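Roughly what I have in mind for the optional parameter, as a sketch only. `get_date` is a hypothetical wrapper, not arrow's real `get` signature, and it uses `re` plus `datetime.strptime` with one fixed format so that it runs without any dependencies.

```python
import re
from datetime import datetime

# Assumption: a single fixed format (YYYY-MM-DD) for illustration.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def get_date(text, return_matched_string=False):
    """Parse the first date in text; optionally also return what matched."""
    m = ISO_DATE.search(text)
    if m is None:
        raise ValueError(f"no date found in {text!r}")
    matched = m.group(0)
    parsed = datetime.strptime(matched, "%Y-%m-%d")
    if return_matched_string:
        return parsed, matched
    return parsed


only_date = get_date("released on 2020-01-15 in Milan")
date_and_match = get_date("released on 2020-01-15 in Milan", return_matched_string=True)
```

The default stays backwards compatible (just the date), and the tuple form only appears when the caller asks for it. Returning the match object instead of the string would additionally expose the position in the input.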
Oh, and I can try my hand at a solution along one of these lines, if you want.
Feature Request
Desiderata: have a way to retrieve the part of the string that arrow matched to the format in a successful parse. I see at least two use cases.
Get the "rest"
It is not an uncommon need, at least in the domain of web scraping, to extract a date from a string and to store the remaining information from that string somewhere else.
Getting the date with arrow is awesomely easy, but once I have that I don't know of a good way to "remove the date I extracted and get the rest of the string", other than formatting the date with all the formats and replacing it in the string. Even that is not bulletproof, though, because
Multiple matches
Suppose I want all the dates that match a certain format in a string? As it is now, I only get the result from the first match.
If I had the information of where the result was matched, I could at least call get again on the substring right after the match.
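Both use cases reduce to knowing where the match sits in the input. A rough sketch of the loop I mean, assuming a hypothetical low-level call (`first_date` here, built on `re` and `datetime` rather than on arrow) that reports the start and end of the first match:

```python
import re
from datetime import datetime

# Assumption: a single fixed format (YYYY-MM-DD) for illustration.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def first_date(text):
    """Hypothetical low-level call: (datetime, start, end) of the first match, or None."""
    m = ISO_DATE.search(text)
    if m is None:
        return None
    return datetime.strptime(m.group(0), "%Y-%m-%d"), m.start(), m.end()


def split_dates(text):
    """Return (dates, rest): every date found, and the text with the dates removed."""
    dates, rest, pos = [], [], 0
    while True:
        found = first_date(text[pos:])
        if found is None:
            break
        parsed, start, end = found
        dates.append(parsed)
        rest.append(text[pos:pos + start])  # text before this match
        pos += end  # resume right after the match
    rest.append(text[pos:])  # whatever follows the last match
    return dates, "".join(rest)


dates, rest = split_dates("2021-03-04 Concert in Turin; repeat on 2021-03-11")
```

With the match position available there is no need to re-format the date and search for it in the string, which is the fragile part of the current workaround.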