atcollab / at

Accelerator Toolbox
Apache License 2.0
48 stars 31 forks source link

regular expression instead of fnmatch? #598

Closed ZeusMarti closed 1 year ago

ZeusMarti commented 1 year ago

Hi all,

'fnmatch' is used in at.lattice.utils.checkname and at.get_refpts and maybe others. To describe in one pattern elements with diferent patterns the regular expresions 're' is more convinient (or 'regex', I'm no expert). For example, orbit steerers in our case are both sharing sextupole yoke ('SV?') and as stand alone correctors ('COR'). Using 're' this can all be targeted at one: 'SV?|COR', that is not the case with 'fnmatch'. I'm not sure how, but maybe one could add 're' as an additional interpreter so that no back compatibility is lost?

Best,

simoneliuzzo commented 1 year ago

@ZeusMarti

I fully support this request. I also had this problem several times, for several lattices. Moreover, matlab AT uses regular expression.

I am however unable to provide myself a fix to this issue.

swhite2401 commented 1 year ago

This is possible, there exist functions that accept regex in python. However this would not be backward compatible since we would change the syntax.

This could nevertheless be set as an optional feature, a general switch could even be provided as a single entry point to change all functions. Would that be acceptable for you?

simoneliuzzo commented 1 year ago

A general switch would be good for me.

What about allowing both regex and fn at the same time? Do they conflict? We could match the string with fn, then with regex and return the union of the two results. I think fn gives always a subset of regex.

For the example by @ZeusMarti ('SV?|COR') fn would return nothing, while regex would give the correct indexes.

ZeusMarti commented 1 year ago

Sounds good to me too.

Thanks Simon*

lfarv commented 1 year ago

Good suggestion. But we must take care of 2 points:

I can prepare something like that.

swhite2401 commented 1 year ago

I already started a branch in #601 , open for discussion this is just a minimalist proposal. The local switch is very inconvenient in my opinion, the only function in AT assuming fnmatch is fastring but with an exact match so it is compatible with both re.match() and fnmatch()

The local switch is very inconvenient in my opinion, it requires to modify many functions and forces the users to provide the flag every time a string matching function is used, one the other hand the global switch is similar to an AT configuration: you set it and then forget about it.

@simoneliuzzo , @ZeusMarti any opinion on this?

lfarv commented 1 year ago

@swhite2401 : It's really a problem: for me, I will continue to use fnmatch when possible: it's faster and the syntax is simpler! But switching globally make my code unusable !

AN optional switch in checkname (defaulting to False <=> fnmatch) leaves all existing code unchanged: 100% backward compatible. The flag only needs to be specified once when RE is needed !

swhite2401 commented 1 year ago

Then we should provide both to make everyone happy. The local version is not complicated but is a bit cumbersome since there are many functions involved, I can integrate it in #601

lfarv commented 1 year ago

Let me put it differently:

Many people have been using checkname (get_cells) up to now (ex. @ZeusMarti , @simoneliuzzo…) since it's a simple way of selecting elements. With a global flag,

  1. You prevent them to use the new regular expression feature for ever (unless they modify their existing code),
  2. You prevent them for inserting any piece of code coming from another source.

This is a strong restriction !

More generally, the simple presence of the global option prevents from merging code from different sources.

In my opinion this is not acceptable, I don't understand how people so anxious about backward compatibility would accept that. In addition, troubleshooting the consequences of simply switching the global flag is rather difficult, you don't know were to look!

Using regular expression is a welcome new possibility, adding ..., re=True, ... in an argument list when needed is not a constraint. And it's a matter of taste: coming back to the example of @ZeusMarti, writing:

selected = ring.get_cells("SV?") | ring.get_cells('COR')

is faster (and easier for non-experts in REs) than:

selected = ring.get_cells("'SV.|COR", re=True)

@swhite2401:

the local version is not complicated but is a bit cumbersome since there are many functions involved

True. Only checkname must process the keyword. Most other functions calling it and needing to forward the keyword are in the same utils.py module: get_uint32_refpts, get_bool_refpts, refpts_iterator, _refcount, get_refpts, get_elements, get_value_refpts, set_value_refpts, get_s_pos.

There are some indirect calls outside, but less critical. I found for instance fit_tune and fit_chrom in globalfit.py which might be interesting.

swhite2401 commented 1 year ago

@lfarv honestly I am not convinced by your arguments. Having both local and global flags covers all cases without restrictions, your arguments don't hold in this case. We can even think of adding a warning when the global flag is activated by setting its value to True.

Anyway, I personnally don't care as I don't use RE, so let's see what @simoneliuzzo and @ZeusMarti think, if they think a global flag is useless or inadequate it is easy to remove. Meanwhile I will start the implementation of the local flag as it is needed whatever is decided for the global flag.

A careful review will be needed to make sure all functions are correctly modified.

lfarv commented 1 year ago

@swhite2401

Let's assume someone uses the global flag: he uses regular expressions without putting the local flag "re=True". Then:

In other words: don't do that. So why proposing it ?

If the global flag exists, how can a user know how that he has to turn the flag in one way tu run program A, and then turn it the other way to run program B?

If the global flag is not in the expected position, the program may run normally, but give crazy results, just because selecting the wrong elements… This is far too dangerous.

swhite2401 commented 1 year ago

Well noted, and I agree. But this all depends on the use you intent to make, there are safe usage to it. It is not our role to make suppositions on how one would use a feature he requested but rather to advise and warn of potential issues (which you have done very clearly) and make sure it does not arm other users or AT default behavior (which is the case).

As I already said, in this specific case I am really neutral because I will not use it, but if there is a need (potentially for very good reasons that you haven't though of) this development will follow a democratic process as it should in a collaborative project.

If it can makes you feel better, I will remove it from this proposal waiting for @simoneliuzzo and @ZeusMarti opinion because they initiated this request, I think they are now well aware of the limits.

simoneliuzzo commented 1 year ago

Dear @swhite2401 and @lfarv ,

for me the local flag would work. I add the argument ...,re=True,... and I can use the same search strings as in matlab. For me get_cells and get_refpts should definitely have this flag as they are the user-friendly way to get indexes and quickly interact with the lattice.

ZeusMarti commented 1 year ago

Thanks a lot @swhite2401, @lfarv and @simoneliuzzo, the local solution sounds good to me too.

swhite2401 commented 1 year ago

Covered in #601