NCAR / GPEP

GNU General Public License v3.0
9 stars 6 forks source link

Minor correction data_processing #5

Closed DaveCasson closed 8 months ago

DaveCasson commented 1 year ago

This is a minor correction to the code, but can avoid errors.

minRange updated to maxRange a specific if statement

guoqiang-tang commented 10 months ago

Hi Dave, thank you for making those changes. I will look into it in more detail after AGU. Can you share the test case you are using, so I can run it before mering the new CDF transformation? And we also need to add some explanations in documents such as How_to_create_config_files.md regarding the new capabilities. Besides, I notice the changes in requirements.txt. It seems those libraries such as matplotlib are not directly related to GPEP computation. Removing them can help minimize te packages required by GPEP.

andywood commented 10 months ago

I also had a question about changes like:

data[data<=0] = 0 ... Is that significantly different than: data[data<0] = 0 Perhaps ... but if not the original is fine.

As well as a question about how wordy we want the documentation to be -- for instance the following: comments 10 lines + code 5 lines. I tend to prefer a more concise if less formulaic code documentation style, as it's easier to read through (ie, the actual code doesn't get lost in expansive code comments).

Proposed is:

def boxcox_transform(data, texp=4): """ Apply the Box-Cox transformation to the input data.

Parameters:
- data (array-like): Input data to be transformed.
- texp (float): The exponent for the Box-Cox transformation. Default is

4.

Returns:
- ndarray: Transformed data.
"""
if not isinstance(data, np.ndarray):
    data = np.array(data)
data[data<=0] = 0
datat = (data ** (1 / texp) - 1) / (1 / texp)

return data

But a more compact version would be:

""" Apply the Box-Cox transformation to the input data. Arguments:

Also, given that the input arguments for these should be defined in the calling subroutine, do they need to be re-documented here? To avoid code bloat/redundancy, I'd probably leave it out. In which case, make the total comments just the following:

Apply the Box-Cox transformation to the input data

The separate case is where the calling routine just passes a slice of a documented data structure, in which case it's helpful to define somewhere what that passed subset is. But that could also be in the calling subroutine. As long as the documentation appears clearly once, somewhere. Anyway, this is all a coding preference, and doesn't make a difference to functionality.

Andy

On Wed, Dec 6, 2023 at 3:30 PM Guoqiang Tang @.***> wrote:

Hi Dave, thank you for making those changes. I will look into it in more detail after AGU. Can you share the test case you are using, so I can run it before mering the new CDF transformation? And we also need to add some explanations in documents such as How_to_create_config_files.md regarding the new capabilities. Besides, I notice the changes in requirements.txt. It seems those libraries such as matplotlib are not directly related to GPEP computation. Removing them can help minimize te packages required by GPEP.

— Reply to this email directly, view it on GitHub https://github.com/NCAR/GPEP/pull/5#issuecomment-1843643005, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABIKARLNCEQVDKSUNJSU2VLYIDIXVAVCNFSM6AAAAAA6S75TIOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNBTGY2DGMBQGU . You are receiving this because you are subscribed to this thread.Message ID: @.***>

DaveCasson commented 10 months ago

I had actually intended to commit the empirical CDF changes to my own fork, but have accidently added them to a smaller PR (minor correction data_processing). I think @guoqiang-tang we can discuss further if this inclusion of emprical cdf is warranted. I will provide the test case. Agree with Andy's comments and can shorten this documentation.

Please leave this PR open for the time being, or close without merging if you prefer, we can decide on merge later.

DaveCasson commented 10 months ago

The additional packages are needed to run the Jupyter notebook examples for the test cases, but as you say not needed for the actual data processing. Will leave to you whether to include.

DaveCasson commented 8 months ago

I am going to close this PR, and submit a cleaner one once the empirical CDF method is better implemented and tested.