Sage-Bionetworks / synapsePythonClient

Programmatic interface to Synapse services for Python
https://www.synapse.org
Apache License 2.0

[SYNPY-1447] Update `fillna` method to work directly off original df #1113

Closed jaymedina closed 1 week ago

jaymedina commented 2 weeks ago

problem

A user received a FutureWarning from pandas v2.2.1 about the way DataFrame columns are updated in the Python client's table.py script. The code calls an inplace method on a column selected through chained assignment, which will stop working in pandas 3.0.

The message:

FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.

solution

Update the logic in table.py that replaces NaN or NA values in a given column col with empty lists ("[]") so that fillna operates on the original DataFrame instead of on a column copy. This removes the FutureWarning and prevents errors in this logic when we eventually upgrade our pandas dependency to version 3.0.
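As a rough illustration of the change (not the exact code in table.py), the sketch below assigns the result of fillna back to the original DataFrame, which is one of the two forms the warning message suggests. The helper name `fill_missing_lists` and the sample data are hypothetical and exist only for this example.

```python
import pandas as pd


def fill_missing_lists(df, list_columns):
    """Replace missing values in the given columns with the string "[]".

    Hypothetical helper for illustration; the real logic lives in table.py.
    """
    for col in list_columns:
        # Old pattern that triggers the FutureWarning in pandas 2.2:
        #     df[col].fillna("[]", inplace=True)
        # df[col] is treated as a temporary copy there, so assign the
        # result back to the original DataFrame instead.
        df[col] = df[col].fillna("[]")
    return df


# Made-up sample data with one missing list value.
df = pd.DataFrame(
    {
        "firstName": ["Awesome", "Awesome"],
        "challengeRole": ['["sponsor"]', None],
    }
)
fill_missing_lists(df, ["challengeRole"])
```

The other form suggested by the warning, `df.fillna({col: "[]"}, inplace=True)`, also operates on the original DataFrame; which of the two the PR uses is not stated here.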

testing & preview

Before the syntax update:

In [1]: import synapseclient

In [2]: syn = synapseclient.login()
Welcome, Jenny Medina!

In [3]: table = syn.tableQuery("SELECT * FROM syn52955244").asDataFrame()
 [####################]100.00%   1/1   Done...

Downloading  [####################]100.00%   283.0bytes/283.0bytes (5.4MB/s) SYNAPSE_TABLE_QUERY_143601110.csv Done...

[WARNING] [//REDACTED\\] FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.

  df[col].fillna("[]", inplace=True)

In [4]: table

Out[4]: 
    firstName lastName institution  username             challengeRole
1_4   Awesome    User1        Sage   3401292  [organizer, contributor]
2_4   Awesome    User2        Sage   3401292                 [sponsor]
6_8   Awesome    User3        Sage   3401292                 [support]


After the syntax update:

In [1]: import synapseclient

In [2]: syn = synapseclient.login()
Welcome, Jenny Medina!

In [3]: table = syn.tableQuery("SELECT * FROM syn52955244").asDataFrame()

In [4]: table

Out[4]: 
    firstName lastName institution  username             challengeRole
1_4   Awesome    User1        Sage   3401292  [organizer, contributor]
2_4   Awesome    User2        Sage   3401292                 [sponsor]
6_8   Awesome    User3        Sage   3401292                 [support]

The new syntax does not affect the expected behavior:

image