ESA-VirES / VirES-Server

VirES for Swarm Server Packages

Unexpected sampling behaviour #242

Open smithara opened 3 days ago

smithara commented 3 days ago

When setting a custom sampling step, the sampling "resets" at the beginning of the next day (file?).

Demonstrated here with MAGx_LR, but first observed by Alexander Grayver using the AUX_OBS products.

from viresclient import SwarmRequest

request = SwarmRequest()
request.set_collection("SW_OPER_MAGA_LR_1B", verbose=False)
request.set_products(
    sampling_step="PT10M"
)
data = request.get_between("2024-04-15T23:41:00", "2024-04-16T00:15:00", asynchronous=False, show_progress=False)
df = data.as_dataframe()
print(df)

                         Radius   Longitude   Latitude Spacecraft
Timestamp                                                        
2024-04-15 23:41:00  6852707.30  -34.592644   4.318464          A
2024-04-15 23:51:00  6857419.52  -35.118510 -33.871531          A
2024-04-16 00:00:00  6858899.52  -32.532711 -68.117402          A
2024-04-16 00:10:00  6858813.01  129.400782 -73.372255          A

The expected behaviour is that the sampling rate continues uniformly, i.e. 23:51, 00:01, 00:11, ...

pacesm commented 3 days ago

The sampling is currently implemented so that each product is sampled independently. That is, sampling of the new daily product starts at 00:00 rather than 00:01, which is the discontinuity you observe.
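To illustrate the effect described above, here is a minimal sketch that reproduces the timestamps from the report. It is not the server code: `sample_product` and `second_range` are hypothetical helpers, and the sampling rule (keep the first sample, then every sample at least one step after the last kept one) is an assumption that happens to match the observed output.

```python
from datetime import datetime, timedelta


def sample_product(times, step):
    """Assumed rule: keep the first sample, then each sample that is
    at least `step` later than the previously kept one."""
    kept = []
    for t in times:
        if not kept or t - kept[-1] >= step:
            kept.append(t)
    return kept


def second_range(start, end):
    """1 Hz timestamps in the half-open interval [start, end)."""
    n = int((end - start).total_seconds())
    return [start + timedelta(seconds=i) for i in range(n)]


step = timedelta(minutes=10)
midnight = datetime(2024, 4, 16)

# Two daily products, each clipped to the requested time interval.
day1 = second_range(datetime(2024, 4, 15, 23, 41), midnight)
day2 = second_range(midnight, datetime(2024, 4, 16, 0, 15))

# Each product is sampled on its own, so the second one restarts at 00:00.
per_product = sample_product(day1, step) + sample_product(day2, step)
print([t.strftime("%H:%M") for t in per_product])
# ['23:41', '23:51', '00:00', '00:10']
```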

This is not technically a bug, but I understand that it is not what users expect. If it is an issue, I could find a way to preserve the sampling across product boundaries (i.e., to carry a time offset from one product to the next).
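One way the carried offset could look is sketched below. This is an illustration under assumptions, not the actual server implementation: `sample_with_carry` and `second_range` are hypothetical names, and the "next sample due" time is the only state passed from one product to the next.

```python
from datetime import datetime, timedelta


def second_range(start, end):
    """1 Hz timestamps in the half-open interval [start, end)."""
    n = int((end - start).total_seconds())
    return [start + timedelta(seconds=i) for i in range(n)]


def sample_with_carry(products, step):
    """Sample a sequence of products while carrying the time at which
    the next sample is due across product boundaries."""
    kept = []
    next_due = None  # state carried from one product to the next
    for times in products:
        for t in times:
            if next_due is None or t >= next_due:
                kept.append(t)
                next_due = t + step
    return kept


step = timedelta(minutes=10)
midnight = datetime(2024, 4, 16)
day1 = second_range(datetime(2024, 4, 15, 23, 41), midnight)
day2 = second_range(midnight, datetime(2024, 4, 16, 0, 15))

carried = sample_with_carry([day1, day2], step)
print([t.strftime("%H:%M") for t in carried])
# ['23:41', '23:51', '00:01', '00:11']
```

Because the next due time is measured from the last kept sample, this sketch would drift on non-uniformly sampled data; anchoring to a fixed grid is a different design choice.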

smithara commented 3 days ago

I agree, it is not exactly a bug, and I think I understand the reasoning here. It could instead be addressed in the documentation, by explaining the behaviour and giving users recipes to get what they want.

If the process were changed, I'm not sure how it could affect non-uniformly sampled datasets.

I wonder if it could be approached with a new, alternative process that instead picks data samples at specific times set by the chosen cadence. The number of sample points returned would then be determined by the requested interval and cadence, with gaps filled with NaN, maybe with user-configurable matching behaviour (exact-only/nearest/pick-last/pick-next).
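A rough sketch of what such a grid-based picker might do, to make the matching modes concrete. Everything here is hypothetical: the function name `pick_on_grid`, the `mode` strings, and the optional `tolerance` parameter are assumptions for illustration, not an existing VirES or viresclient API.

```python
import bisect
import math
from datetime import datetime, timedelta


def pick_on_grid(times, values, start, end, step, mode="exact", tolerance=None):
    """Return one (grid_time, value) pair per grid point in [start, end).
    `times` must be sorted. Grid points without a match get NaN."""
    picked = []
    grid_time = start
    while grid_time < end:
        i = bisect.bisect_left(times, grid_time)  # first sample at or after grid_time
        after = i if i < len(times) else None
        before = i - 1 if i > 0 else None
        if after is not None and times[after] == grid_time:
            before = after  # an exact hit satisfies every mode
        if mode == "pick-next":
            choice = after
        elif mode == "pick-last":
            choice = before
        elif mode == "nearest":
            options = [j for j in (before, after) if j is not None]
            choice = min(options, key=lambda j: abs(times[j] - grid_time), default=None)
        else:  # "exact"
            exact_hit = after is not None and times[after] == grid_time
            choice = after if exact_hit else None
        # Optionally reject matches that are too far from the grid point.
        if choice is not None and tolerance is not None:
            if abs(times[choice] - grid_time) > tolerance:
                choice = None
        picked.append((grid_time, values[choice] if choice is not None else math.nan))
        grid_time += step
    return picked


# Non-uniformly sampled example data (made up for illustration).
times = [
    datetime(2024, 4, 15, 23, 41, 0),
    datetime(2024, 4, 15, 23, 50, 30),
    datetime(2024, 4, 16, 0, 0, 0),
    datetime(2024, 4, 16, 0, 12, 0),
]
values = [1.0, 2.0, 3.0, 4.0]
start, end = datetime(2024, 4, 15, 23, 41), datetime(2024, 4, 16, 0, 15)
step = timedelta(minutes=10)

exact = pick_on_grid(times, values, start, end, step, mode="exact")
nearest = pick_on_grid(times, values, start, end, step, mode="nearest",
                       tolerance=timedelta(minutes=1))
print([v for _, v in exact])    # [1.0, nan, nan, nan]
print([v for _, v in nearest])  # [1.0, 2.0, 3.0, 4.0]
```

The output always has one row per grid point (23:41, 23:51, 00:01, 00:11 here), regardless of where product boundaries fall or how the source data is sampled.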