Open daguiam opened 6 years ago
Having a `/data/turbulence` directory only makes sense if we are going to store and discuss turbulent spectra and post-processing tools for it, like peak finding, spectral widths, Doppler peak analysis, and so forth.
I personally hate dictionaries for entry-level users; it is easier for them to think in numpy arrays. However, dictionaries are easier to maintain. Not only should we have load and save functions, but we also have to provide functions that convert numpy arrays from CSVs into our dictionary structure. Maybe it's a stretch, but to simplify I would make it:
```python
data['sI']  # in-phase signal
data['sQ']  # quadrature signal
data['s']   # raw swept reflectometry interference signal;
            # if there is an IQ signal, this should be complex
data['t']   # sweep time vector
data['f']   # sweep frequency vector
data['r']   # radial positions in machine coordinates
data['z']   # vertical positions in machine coordinates
data['rp']  # normalized flux coordinates rho_pol
data['n']   # density in 10^19 m^-3
```
Note that everything is in SI units except for density.
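A minimal sketch of the CSV-to-dictionary conversion mentioned above, assuming a hypothetical column layout of `(t, f, sI, sQ)` (the function name and column order are illustrative, not a fixed API):

```python
import numpy as np

def csv_to_data_dict(path):
    """Load a CSV with assumed columns (t, f, sI, sQ) into the proposed dict."""
    t, f, sI, sQ = np.loadtxt(path, delimiter=",", unpack=True)
    return {
        "sI": sI,           # in-phase signal
        "sQ": sQ,           # quadrature signal
        "s": sI + 1j * sQ,  # complex IQ interference signal
        "t": t,             # sweep time vector [s]
        "f": f,             # sweep frequency vector [Hz]
    }
```

The complex `s` entry is built on the fly from `sI` and `sQ`, so plain real-valued CSVs stay the storage format while users still get the complex signal directly.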
The turbulence data might be included in the library. I think it should be, eventually, since it is a reflectometry measurement. For the beginning of this project, I will personally focus on density profiles, since I am more comfortable with them.
Dictionaries are a part of Python and, even though they require more typing, I think the benefits are worth it.
Also, I prefer to keep the variable naming as literal as possible, meaning `signal_I` or `signal_quadrature` instead of `sI`, `sQ`, etc. It makes reading the code easier, imho.
We may convert the dictionaries into classes as well, which have the units embedded into the variables, for example. But that is secondary.
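A rough sketch of that class-based alternative, with units embedded alongside the variables. The class name, field names, and `units` mapping are all illustrative assumptions, not a proposed API:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class SweptReflectometrySignal:
    """Hypothetical class wrapping the dict entries with literal names."""
    signal_in_phase: np.ndarray    # was data['sI']
    signal_quadrature: np.ndarray  # was data['sQ']
    sweep_time: np.ndarray         # was data['t']
    sweep_frequency: np.ndarray    # was data['f']
    units: dict = field(default_factory=lambda: {
        "sweep_time": "s",
        "sweep_frequency": "Hz",
    })

    @property
    def signal_complex(self):
        # raw interference signal as a complex IQ trace (was data['s'])
        return self.signal_in_phase + 1j * self.signal_quadrature
```

This keeps the literal naming convention discussed above while attaching units as plain metadata; a fuller version could lean on a units library instead.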
We need to have a structure for the different raw datasets and where they are stored in the repository.
In this project, there are several kinds of datasets, such as raw measurements, example density profiles, real density profiles, turbulence measurements, and other reflectometry measurements. What do you think should be the correct structures for each of these?
I suggest first separating the raw datasets by reflectometry technique. For now, we have swept reflectometry measurements and fixed-frequency measurements.
### Directories

All datasets are stored in `scikit-reflectometry/data/`. The subdirectories may be:

### Data structure
What is the size of each dataset? Is it a single sweep? How many points?
Should we store data in binary format? `.csv`? `.mat` files? `.json`? `json` is a nice format, similar to Python `dict`s, and we may add meta information such as frequency range, sweep times, etc.

### Functions
We should have the typical `load_data`, `save_data` functions. The loading functions understand the underlying dataset structure and should return Python `dict`s with the loaded data and meta information! The `data` or `example` module should handle loading the raw datasets such as
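A minimal sketch of the `load_data`/`save_data` pair discussed above, using JSON with a `"meta"` block for the extra information (frequency range, sweep times, etc.). The payload layout is an assumption for illustration; note that complex arrays would need special handling before JSON serialization:

```python
import json
import numpy as np

def save_data(path, data, meta=None):
    """Save a dict of (real-valued) numpy arrays plus meta information to JSON."""
    payload = {
        "meta": meta or {},
        "data": {k: np.asarray(v).tolist() for k, v in data.items()},
    }
    with open(path, "w") as fh:
        json.dump(payload, fh)

def load_data(path):
    """Load a JSON dataset back into numpy arrays and its meta dict."""
    with open(path) as fh:
        payload = json.load(fh)
    data = {k: np.array(v) for k, v in payload["data"].items()}
    return data, payload["meta"]
```

Because the meta information rides along in the same file, the loading function can return both the arrays and the dataset description in one call, as suggested above.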