gecos-lab / PZero

GNU Affero General Public License v3.0

pandas df.query() #72

Open andrea-bistacchi opened 3 months ago

andrea-bistacchi commented 3 months ago

Dataframe queries can be implemented as:

query_string = 'something'
df.query(query_string)

This is very effective since it is possible to compare two fields:

query_string = 'field_1 == field_2'
df.query(query_string)

a field and a string:

query_string = 'field_1 == "some string"'
df.query(query_string)

#or

string = "some string"
query_string = f'field_1 == "{string}"'
df.query(query_string)

a field and a value:

query_string = 'field_1 >= 10.3'
df.query(query_string)

#or

value = 10.3
query_string = f'field_1 >= {value}'
df.query(query_string)

and to get the whole dataframe it is possible to define a string that will be true for all rows:

query_string = 'index == index'
df.query(query_string)

(some sources report that 'ilevel_0 in ilevel_0' is more robust).
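As a quick check of the patterns above, here is a minimal sketch on a made-up dataframe. The column names field_1, field_2 and field_3 and all the values are placeholders for illustration, not PZero collection fields.

```python
import pandas as pd

# Hypothetical data: names and values are placeholders, not PZero fields.
df = pd.DataFrame(
    {
        "field_1": ["a", "b", "c"],
        "field_2": ["a", "x", "c"],
        "field_3": [5.0, 10.3, 12.0],
    }
)

# Compare two fields row by row.
same_fields = df.query("field_1 == field_2")

# Compare a field with a string built into the query via an f-string.
string = "b"
by_string = df.query(f'field_1 == "{string}"')

# Compare a field with a numeric value.
value = 10.3
by_value = df.query(f"field_3 >= {value}")

# A condition that is true for every row returns the whole dataframe.
everything = df.query("index == index")
```

Note that building the query with an f-string pastes the value into the expression as text, so a string containing a double quote would break the query.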

andrea-bistacchi commented 3 months ago

However this method is prone to key errors.

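(A) The "@" prefix shown in many examples only substitutes the value of a Python variable inside the expression; it cannot stand in for the whole query string. So the following does not work, and the syntax above should be used instead: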

query_string = 'field_1 == "some string"'
df.query('@query_string')
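For comparison, this is a sketch of how "@" is normally meant to be used in pandas: it refers to a local Python variable that appears as a value inside the expression. The dataframe and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame(
    {
        "field_1": ["some string", "other"],
        "field_2": [1.0, 20.0],
    }
)

string = "some string"
value = 10.3

# "@name" is replaced by the value of the local variable "name",
# so no manual quoting of the string is needed.
by_string = df.query("field_1 == @string")
by_value = df.query("field_2 >= @value")
```

Compared with f-strings, this avoids quoting problems, but the variable must be visible in the scope from which df.query() is called.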

(B) Unfortunately, if a field used in the query is not present in the dataframe, pandas raises an error instead of returning an empty dataframe. To deal with this we have three options:

1) all columns used for common queries must be present in all collection dataframes, but (i) this can cause some redundancy, e.g. when adding the x_section column to the x section collection, and (ii) this might mean adding a lot of these redundant fields in the future, with problems of backwards compatibility of project files;

2) handle errors where a field is not present in a dataframe with try: except: return; this is not nice but should work;

3) find a way to obtain an empty list from a query where a field is not present in a dataframe, instead of a fatal error
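Options 2 and 3 could be combined in a small wrapper. The sketch below is only a suggestion: safe_query is a hypothetical helper name, not something that exists in PZero or pandas. It relies on the fact that pandas reports an unknown column in a query as UndefinedVariableError, which is a subclass of NameError.

```python
import pandas as pd


def safe_query(df: pd.DataFrame, query_string: str) -> pd.DataFrame:
    """Hypothetical helper: run df.query(), but return an empty
    dataframe if the query refers to a name that is not defined."""
    try:
        return df.query(query_string)
    except NameError:
        # An unknown column raises UndefinedVariableError, a NameError
        # subclass. Return zero rows while keeping the original columns.
        return df.iloc[0:0]


# Hypothetical collection without an "x_section" column.
df = pd.DataFrame({"field_1": ["a", "b"]})

missing = safe_query(df, 'x_section == "s1"')  # column absent
present = safe_query(df, 'field_1 == "a"')     # column present
```

One caveat: the same exception is raised for a mistyped "@" variable or column name, so this wrapper would also hide that kind of bug by silently returning no rows.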

Opinions?

andrea-bistacchi commented 3 months ago

It seems it works for cross sections. See last commit in windows_refactoring branch.

Must be tested on different objects that are not included in standard test projects.

gbene commented 3 months ago

It looks like it works fine for wells too