Forever-Young / mrab-regex-hg

Automatically exported from code.google.com/p/mrab-regex-hg

Enhance API of captures() to enable retrieval of ALL groups at once, as a dictionary #86

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,
For non-repeated groups, one can use match.groupdict() to retrieve a dictionary 
of ALL groups and their values, including unmatched groups. But there is no 
equivalent for repeated groups: match.captures() only returns values for the 
groups given explicitly as arguments, while groupdict() returns only a single 
value per group.

I suggest either:

1. Change the API of captures() so that captures() (no args) returns a 
dictionary of ALL groups, not just group 0. This would be the most convenient 
and intuitive, but would break existing code that relies on the current behavior.

2. Add a boolean argument to captures(), say "all", False by default, to let 
the client indicate that a full dictionary is expected.

3. Add a new method, say capturesdict(), to return a dict of all groups.

Thanks
Marcin

What version of the product are you using? On what operating system?

0.1.20130120
Linux, Python 2.7.2

Original issue reported on code.google.com by mwojn...@gmail.com on 23 Jan 2013 at 6:16

GoogleCodeExporter commented 9 years ago
Should the dict behave like this?

capturesdict = {}
for name in m.groupdict().keys():
    capturesdict[name] = m.captures(name)

What's your use case? Could you provide some examples of the suggested feature?

Original comment by re...@mrabarnett.plus.com on 23 Jan 2013 at 6:57

GoogleCodeExporter commented 9 years ago
Yes, it should behave in this way.
Use case: web scraping, i.e. extracting many different values from a complex 
HTML page in one go (for example, the profile page of a product, with different 
properties listed in a fixed layout). After applying a regex, the next step is 
to take *all* extracted data as a dict, not one by one.

Original comment by mwojn...@gmail.com on 23 Jan 2013 at 11:40

GoogleCodeExporter commented 9 years ago
Could you provide some simple test cases?

I think it'll be called 'capturesdict'.

Original comment by re...@mrabarnett.plus.com on 24 Jan 2013 at 2:01

GoogleCodeExporter commented 9 years ago
I've added a 'capturesdict' method to match objects in regex 0.1.20130124.

Original comment by re...@mrabarnett.plus.com on 24 Jan 2013 at 8:30
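
For readers landing on this thread, here is a minimal sketch of how the result compares with groupdict(). It assumes the third-party regex module is installed; the pattern, the input string, and the names captures_dict and demo are made up for illustration. The helper simply wraps the loop from the earlier comment so the two approaches can be checked against each other.

```python
def captures_dict(match):
    """Map each named group to the list of all its captures.

    Illustrative helper: the same loop as in the earlier comment,
    written as a dict comprehension.
    """
    return {name: match.captures(name) for name in match.groupdict()}


def demo():
    import regex  # third-party module from this project, not the stdlib re

    # Hypothetical input: a run of key=value pairs matched by repeated groups.
    m = regex.match(r"(?:(?P<key>\w+)=(?P<val>\d+);)+", "a=1;b=2;c=3;")

    # groupdict() keeps only the last capture of each repeated group.
    last_only = m.groupdict()

    # capturesdict() keeps every capture, and agrees with the manual loop.
    everything = m.capturesdict()
    assert everything == captures_dict(m)
    return last_only, everything
```

Calling demo() should return {'key': 'c', 'val': '3'} for groupdict() and {'key': ['a', 'b', 'c'], 'val': ['1', '2', '3']} for capturesdict().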

GoogleCodeExporter commented 9 years ago
Great, thanks for all the changes and for a very useful library.

Original comment by mwojn...@gmail.com on 25 Jan 2013 at 10:59