NOAA-EMC / WW3-tools

19 stars 14 forks source link

Remove downsampling of data for scatterplot #62

Closed Ghazal-Mohammadpour closed 4 months ago

Ghazal-Mohammadpour commented 4 months ago

Description

This PR removes the downsampling of data in the scatterplot function within the ModelObsPlot class. Previously, the code subsampled the data if the number of points in self.obs exceeded a certain threshold. However, this downsampling has been removed to retain all data points for plotting.

Changes Made

Removed the conditional statement that used downsampling based on the number of points in self.obs.

Impact

Keeping all data points ensures that the scatterplot accurately represents the entire dataset without subsampling.

Test

Tests should also be run to confirm that the functionality of the scatterplot method is not affected by the removal of downsampling.

Notes

Users should be informed of the potential increase in plotting time when working with large datasets.

JessicaMeixner-NOAA commented 4 months ago

@ricampos I'm not sure the original intent of the downsampling was. If you would like to maintain this feature, an option can be added instead to down-sample, or not.

ricampos commented 4 months ago

The downsampling is important only when there is a huge amount of matchups and the time to generate the plot becomes unacceptable. I noticed this happened for model/sat collocations where the amount of data is enormous. It is worth checking how metpy or emcpy is handling this issue. Perhaps, instead of removing it, it could be set as optional (with default = 'no'). So in case one user is taking too long to have the plot done, then he/she could reduce the data size and generate the plot quickly.

JessicaMeixner-NOAA commented 4 months ago

@ricampos thanks for helping us understand the original intent behind the down sampling. Having that as an option instead of removing sounds like a good solution to me.

ricampos commented 4 months ago

Hi @Ghazal-Mohammadpour thanks for working on this and also for updating to PEP 8 Style Guide (this is something we will have to do for the entire repo). In terms of the downscaling (using sk) in scatterplot, could you please set it as optional (with default = 'no') ? for example if self.obs[0,:].shape[0]>50000 and dwscl != 'no': and the function would have the default dwscl = 'no'