SciQLop / speasy

Space Physics made EASY! A simple Python package to deal with main Space Physics WebServices (CDA,SSC,AMDA,..)

Loading E burst on MMS #131

Open Nfargette opened 2 months ago

Nfargette commented 2 months ago

Description

I am working with MMS data and I need to load the burst electric field for some dedicated periods of time (typically one burst period of a few minutes). I use the CDA repository to get the data with speasy.

I have two problems:

1 - Loading time. On this repository (https://cdaweb.gsfc.nasa.gov/pub/data/mms/mms1/edp/brst/l2/dce/2016/11/), an E_brst file is typically 40 Mb and takes me 5 seconds to download locally.

However, using speasy, it takes at least 10 to 40 minutes (depending on which time period I am looking at) to download E burst for the 4 MMS spacecraft. Is there any way to reduce this download time?

2 - Problem loading an event. The loading of E brst sometimes does not work for a specific event. I include the loading code and the associated error below.

What I Did

import speasy
from datetime import datetime
from dotmap import DotMap
from speasy.products import SpeasyVariable
from pandas import Timestamp, to_datetime, to_timedelta


def date_to_central_time(date, tc):
    return (date - Timestamp(tc).to_datetime64()).astype(float) * 1e-9


def format_data(folder_name, tb, tc, te, N):
    """
    Read the data located in folder_name using Speasy and format them.

    Parameters
    ----------
    folder_name : str
        Name of the folder containing the data. The satellite numbers should be
        replaced by '?'.
    tb, tc, te : float
        Start, central and end time for reading data.
    N : int
        Number of spacecrafts.

    Returns
    -------
    yn : list of array_like, shape (D, In)
        Data values.
    tn : list of array_like, shape (In, )
        Time values.
    """

    # Load the data from Speasy
    data: SpeasyVariable = speasy.get_data(
        [folder_name.replace('?', str(n)) for n in range(1, N+1)], tb, te
    )

    yn = [d.values.T[:3] for d in data]  # Return the data values
    tn = [date_to_central_time(d.time, tc) for d in data]  # and time lists for all spacecrafts

    return yn, tn


N = 4  # number of spacecraft
tc = datetime(2016, 11, 28, 7, 36, 55, 450000)  # central time
tb = datetime(2016, 11, 28, 7, 36, 52, 950000)  # Beginning and
te = datetime(2016, 11, 28, 7, 36, 57, 950000)  # Ending time of the plot

"""
there is no problem for those dates :

tc = datetime(2015, 10, 16, 13, 7, 2, 200000)  # central time
tb = datetime(2015, 10, 16, 13, 6, 59, 700000)  # Beginning and
te = datetime(2015, 10, 16, 13, 7, 4, 700000)  # Ending time of the plot
"""

tbce = (tb, tc, te)

var = DotMap()  # Dictionary to store all variables

print("Loading E burst...")
var.E.yn, var.E.tn = format_data('cda/MMS?_EDP_BRST_L2_DCE/mms?_edp_dce_gse_brst_l2', *tbce, N)  # E (mV/m)
print("Data Loaded.")


It ran for 10 minutes before displaying :


Loading E burst...
Can't get data from proxy server http://sciqlop.lpp.polytechnique.fr/cache

TypeError                                 Traceback (most recent call last)
c:\Users\nfargett.vscode\ipython\code.py in <module>
      1 print("Loading E burst...")
----> 2 var.E.yn, var.E.tn = format_data('cda/MMS?_EDP_BRST_L2_DCE/mms?_edp_dce_gse_brst_l2', *tbce, N)  # E (mV/m)
      3 print("Data Loaded.")

c:\Users\nfargett.vscode\ipython\code.py in format_data(folder_name, tb, tc, te, N)

~\Anaconda3\lib\site-packages\speasy\core\requests_scheduling\request_dispatch.py in get_data(*args, **kwargs)
    330     product = args[0]
    331     if is_collection(product) and not isinstance(product, SpeasyIndex):
--> 332         return list(map(lambda p: get_data(p, *args[1:], **kwargs), progress_bar(leave=True, **kwargs)(product)))
    333
    334     if len(args) == 1:

~\Anaconda3\lib\site-packages\speasy\core\requests_scheduling\request_dispatch.py in <lambda>(p)
    330     product = args[0]
    331     if is_collection(product) and not isinstance(product, SpeasyIndex):
--> 332         return list(map(lambda p: get_data(p, *args[1:], **kwargs), progress_bar(leave=True, **kwargs)(product)))
    333
    334     if len(args) == 1:

~\Anaconda3\lib\site-packages\speasy\core\requests_scheduling\request_dispatch.py in get_data(*args, **kwargs)
    344         return get_data(product, get_data(t_range), *args[2:], **kwargs)
    345     if len(args) == 3:
--> 346         return _get_timeserie2(*args, **kwargs)

~\Anaconda3\lib\site-packages\speasy\core\requests_scheduling\request_dispatch.py in _get_timeserie2(index, start, stop, **kwargs)
    184
    185 def _get_timeserie2(index, start, stop, **kwargs):
--> 186     return _scalar_get_data(index, start, stop, **kwargs)
    187
    188

~\Anaconda3\lib\site-packages\speasy\core\requests_scheduling\request_dispatch.py in _scalar_get_data(index, *args, **kwargs)
    171     provider_uid, product_uid = provider_and_product(index)
    172     if provider_uid in PROVIDERS:
--> 173         return PROVIDERS[provider_uid].get_data(product_uid, *args, **kwargs)
    174     raise ValueError(f"Can't find a provider for {index}")
    175

~\Anaconda3\lib\site-packages\speasy\core\__init__.py in wrapped(*args, **kwargs)
    221                 filter(lambda arg_name: arg_name not in self.allowed_list, kwargs.keys()))
    222             if not unexpected_args:
--> 223                 return func(*args, **kwargs)
    224             raise TypeError(
    225                 f"Unexpected keyword argument {unexpected_args}, allowed keyword arguments are {self.allowed_list}")

~\Anaconda3\lib\site-packages\speasy\core\dataprovider.py in wrapped(wrapped_self, product, start_time, stop_time, **kwargs)
     28                 log.warning(f"You are requesting {product} outside of its definition range {p_range}")
     29                 return None
---> 30         return get_data(wrapped_self, product=product, start_time=start_time, stop_time=stop_time, **kwargs)
     31
     32     return wrapped

~\Anaconda3\lib\site-packages\speasy\core\cache\_providers_caches.py in wrapped(wrapped_self, product, start_time, stop_time, **kwargs)
    274         if len(data_chunks):
    275             if len(data_chunks) == 1:
--> 276                 return data_chunks[0][dt_range.start_time:dt_range.stop_time].copy()
    277             data_chunks[0] = data_chunks[0][dt_range.start_time:]
    278             data_chunks[-1] = data_chunks[-1][:dt_range.stop_time]

TypeError: unhashable type: 'slice'
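
To narrow down which spacecraft is slow or failing, one option is to request the four products one at a time instead of as a list, timing each call and catching its exception. The sketch below is only an illustration: `load_each` is a hypothetical helper, and `fake_fetch` is a stand-in loader (with an arbitrary failure on MMS3) so that the example runs offline.

```python
import time
from datetime import datetime


def load_each(fetch, products, start, stop):
    """Call fetch(product, start, stop) per product; record duration and any error."""
    report = {}
    for product in products:
        t0 = time.perf_counter()
        try:
            data, error = fetch(product, start, stop), None
        except Exception as exc:  # keep going so the other spacecraft still load
            data, error = None, exc
        report[product] = {
            "data": data,
            "seconds": time.perf_counter() - t0,
            "error": error,
        }
    return report


def fake_fetch(product, start, stop):
    # Stand-in for the real loader, only so this sketch runs without network access.
    # The failure on MMS3 is arbitrary and purely illustrative.
    if product.startswith("cda/MMS3"):
        raise TypeError("unhashable type: 'slice'")
    return [product, start, stop]


template = "cda/MMS?_EDP_BRST_L2_DCE/mms?_edp_dce_gse_brst_l2"
products = [template.replace("?", str(n)) for n in range(1, 5)]
tb = datetime(2016, 11, 28, 7, 36, 52, 950000)
te = datetime(2016, 11, 28, 7, 36, 57, 950000)

report = load_each(fake_fetch, products, tb, te)
for product, entry in report.items():
    status = "ok" if entry["error"] is None else f"failed: {entry['error']!r}"
    print(f"{product}: {entry['seconds']:.3f} s, {status}")
```

With speasy installed, passing `speasy.get_data` in place of `fake_fetch` should work under the assumption that it accepts `(product, start, stop)` for a single product, as the script above suggests.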

jeandet commented 2 months ago

@Nfargette thanks for your detailed feedback. I won't be able to look into this until next week. In the meantime, you can try the archive module with this conf:

mms1_edp_brst_l2_hmfe:
  fname_regex: mms1_edp_brst_l2_hmfe_(?P<start>\d+)_v(?P<version>[\d\.]+)\.cdf
  inventory_path: cda/MMS/MMS1/EDP/BURST
  master_cdf: https://cdaweb.gsfc.nasa.gov/pub/software/cdawlib/0MASTERS/mms1_edp_brst_l2_hmfe_00000000_v01.cdf
  split_frequency: monthly
  split_rule: random
  url_pattern: https://cdaweb.gsfc.nasa.gov/pub/data/mms/mms1/edp/brst/l2/hmfe/{Y}/{M:02d}/mms1_edp_brst_l2_hmfe_{Y}{M:02d}\d+_v\d+.\d+.\d+.cdf
  use_file_list: true
mms1_edp_brst_l2_dce:
  fname_regex: mms1_edp_brst_l2_dce_(?P<start>\d+)_v(?P<version>[\d\.]+)\.cdf
  inventory_path: cda/MMS/MMS1/EDP/BURST
  master_cdf: https://cdaweb.gsfc.nasa.gov/pub/software/cdawlib/0MASTERS/mms1_edp_brst_l2_dce_00000000_v01.cdf
  split_frequency: monthly
  split_rule: random
  url_pattern: https://cdaweb.gsfc.nasa.gov/pub/data/mms/mms1/edp/brst/l2/dce/{Y}/{M:02d}/mms1_edp_brst_l2_dce_{Y}{M:02d}\d+_v\d+.\d+.\d+.cdf
  use_file_list: true

I used it yesterday for labeling data. You can basically copy and paste this several times in the same file, changing the satellite number, to get all MMS spacecraft. I don't recall the path on Windows, but you can get it from spz.webservices.generic_archive.user_inventory_dir(). Once saved, restart Python and you should find those new products in the speasy inventory (spz.tree.)
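
The copy-and-paste step can also be scripted. This is a minimal sketch that takes the dce entry above as a template and substitutes the satellite number; `conf_for_spacecraft` is a hypothetical helper, and it assumes the MMS2 to MMS4 files and master CDFs follow exactly the same naming pattern as MMS1.

```python
# Template copied from the mms1 dce entry above (raw string keeps the regex backslashes).
TEMPLATE = r"""mms1_edp_brst_l2_dce:
  fname_regex: mms1_edp_brst_l2_dce_(?P<start>\d+)_v(?P<version>[\d\.]+)\.cdf
  inventory_path: cda/MMS/MMS1/EDP/BURST
  master_cdf: https://cdaweb.gsfc.nasa.gov/pub/software/cdawlib/0MASTERS/mms1_edp_brst_l2_dce_00000000_v01.cdf
  split_frequency: monthly
  split_rule: random
  url_pattern: https://cdaweb.gsfc.nasa.gov/pub/data/mms/mms1/edp/brst/l2/dce/{Y}/{M:02d}/mms1_edp_brst_l2_dce_{Y}{M:02d}\d+_v\d+.\d+.\d+.cdf
  use_file_list: true
"""


def conf_for_spacecraft(template, n):
    """Return the template with every 'mms1'/'MMS1' replaced by spacecraft n.

    Plain str.replace is used instead of str.format because the template
    contains literal {Y} and {M:02d} placeholders that must be left untouched.
    """
    return template.replace("mms1", f"mms{n}").replace("MMS1", f"MMS{n}")


# One entry per spacecraft, concatenated into a single conf text.
conf = "".join(conf_for_spacecraft(TEMPLATE, n) for n in range(1, 5))
print(conf)
```

The resulting text would then be saved as a file in the directory returned by `spz.webservices.generic_archive.user_inventory_dir()`; the file name to use is not stated here, so it is left to the reader.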

An example of burst data (hmfe and dce) with Speasy/SciQLop using this conf ;): (image)