Closed stefaneidelloth closed 2 years ago
Following code works, using regular expression:
df.filter(variable=".*(Total).*", regexp= True)\
.timeseries()
I first did not understand what the doc says:
'regexp=True' disables pseudo-regexp syntax in pattern_match()
I would write instead:
'regexp=True' Enables regular expression syntax.
Thanks for raising this issue, glad that you managed to sort it out yourself.
In addition to the option regexp=True
, which you found, you can also use simply
df.filter(variable="*(Total)*").timeseries()
i.e., *
is a simple wildcard similar to .*
in regexp.
PRs to improve the docs are always welcome.
By the way, we also have a Slack workspace with a helpdesk channel for these kinds of questions.
If one wants to ensure that an expression occurs at a distinct level, following regular expression might be helpful:
df.filter(variable="([^(|)](|)){1}Total.", regexp=True)\ .timeseries()
=>Ensures that "Total" occurs at level 1
Lets assume I have many variables and that some contain the key word "Total" at several positions:
=>How can I filter for all variables that contain a specific substring?
I tried to
a) use a regular expression ".(Total)." (can be checked with https://pythex.org/): that.
df.filter(variable=".*(Total).*")
b) use a filter function.
df.filter(variable = lambda variable: 'Total' in variable)
c) use pandas datframe filtering
df.loc[df['variable'].str.contains('Total')]
However, pyam does not seem to support those options.
Edit: As a workaroun,d I could get the data as pandas dataframe, filter it and wrap it again in a pyam data frame:
If there is a better solution, please let me know.