Closed mardet987 closed 4 years ago
Hi Marci,
Can you give an example of where you propose the new numeric field going?
For reference, at LTU we've been using the following filename style for stereo operations at Buckland Park (which has flow on effects to BAS operations at FIR):
20200324.0000.01.bpk.a.iqdat.bz2
(channel 'A')
20200324.0000.01.bpk.b.iqdat.bz2
(channel 'B')
Both channels can be operating on different frequencies, beams and so on. Using the terms "Channel A" and "Channel B" seems pretty entrenched at this stage, so going to a forced numeric field could be jarring.
I'm not sure if I understood your post above correctly, but were you suggesting several digits together forming your new numeric field? If there's need for more than 10 channels, I don't have any issue with that, provided that they're clearly delimited with dots. Given the state of modern programming languages, any number of alphanumeric character(s) between two dots used as delimiters should not be a burden.
However, how would you propose allocating those IDs? Channel names 'a' and 'b' are somewhat self-describing, but channel id "438" is not.
I propose:
1) that any reasonable number of alphanumeric characters could be used as channel-unique identifier (limit to four characters?)
2) the new channel identifier is placed in the string as shown in my example above (20200324.0000.01.bpk.a.iqdat.bz2)
3) downstream processing from the IQDAT files should keep the same filename prefix, as is done today:
20200324.0000.01.bpk.a.iqdat.bz2
-> 20200324.0000.01.bpk.a.rawacf.bz2
-> 20200324.0000.01.bpk.a.fitacf.bz2
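As a minimal sketch of points 1) to 3), assuming a channel identifier of one to four alphanumeric characters sitting between the radar id and the file type, the prefix can be kept and only the type swapped. The pattern and the `downstream_name` helper are hypothetical illustrations, not an agreed standard:

```python
import re

# Hypothetical pattern: date, time, seconds, radar id, channel identifier
# (1-4 alphanumeric characters, per point 1), file type, compression suffix.
PATTERN = re.compile(
    r"^(?P<date>\d{8})\.(?P<time>\d{4})\.(?P<sec>\d{2})\."
    r"(?P<rid>[a-z]{3})\.(?P<chan>[A-Za-z0-9]{1,4})\."
    r"(?P<ftype>[a-z]+)\.(?P<comp>bz2)$"
)


def downstream_name(filename, new_type):
    """Return the filename with the same prefix but a different file type."""
    match = PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unrecognised filename: {filename}")
    prefix = ".".join(match.group("date", "time", "sec", "rid", "chan"))
    return f"{prefix}.{new_type}.{match.group('comp')}"


print(downstream_name("20200324.0000.01.bpk.a.iqdat.bz2", "rawacf"))
```

A filename without a channel field does not match this pattern, so single-channel names would need separate handling.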
Hi @brianbienvenu, thanks for the extra information and background info. I agree this solution could also work.
In theory, I still prefer the numbers and think it is most straightforward to use numbers for any kind of sequential field (channel/slice 31 vs 'af' or 'F' if using caps). However, there is some precedent with alphanumeric that I know would be difficult to resolve so your solution might be the easiest route from the data producer perspective. It might also be possible to allow both alphanumeric and numeric but this could be at the expense of data users' understanding and ease of use.
Another thought - if a standard for rawacf distribution is produced perhaps it will specify that only a certain number of datasets should be distributed from any radar at a time anyways (more could be an overwhelming amount of data), which would remove the need for higher numbering anyways. Specifications might be made for which type of experiments are distributed as well (only approved experiments that users can find information on and understand, for example). I guess what I'm suggesting is that more guidance on what should be provided for distribution would be nice from both data producer and data user standpoint.
Sorry if I haven't added too much to this and I know this issue is off towards some other work. I thought I would capture one thought here, that if we moved to the number notation for channel/slice ID, this would mean modifying the filenames for many current files in distribution as well as higher level data products (fitacf, etc.) that will need filenames changed. This could be done, but might be a good bit of work on checking scripts as well as changes in current mirrors. This change would also impact IO/analysis software.
Thinking about IO/analysis software, is there a consideration here for what we would do for non-multichannel/slice radars? With the current filenaming convention my solution has been something like:
if radar is [ade, adw, kod, ksr, mcm, sps] then
read in files with either rad.a or rad.b or rad.*
else
read in files with rad
I agree that there are common/easy programming solutions if the channel is delimited with a period, but in that sense, the code would have to figure out the number of fields and then read things appropriately. Otherwise, if you hardcoded looking for the channel field as the fifth field in 20200302.2201.00.inv.rawacf.bz2 you would end up with 'rawacf'.
The rabbit hole I'm going down is (and I don't like suggesting it given the amount of work just on aligning the channel fields mentioned before), do we need to move all filenames to include this field even if they are non-multichannel? For instance:
20200302.2201.00.inv.rawacf.bz2 -> 20200302.2201.00.inv.1.rawacf.bz2
I understand that continuing to use letters limits the number of channels to 26 or 52. I don't want to limit innovation, but I haven't seen a case yet where this limit would be exceeded. It's certainly possible, but I just want to make sure we're not solving a problem we don't have yet.
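For what it's worth, the field-counting approach described above might look something like the following. This is only a sketch: it assumes every name has either six dot-separated fields (no channel) or seven (channel in the fifth position), and `parse_name` is a made-up helper rather than anything in existing IO software.

```python
def parse_name(filename):
    """Split a filename on dots and treat the channel field as optional."""
    fields = filename.split(".")
    if len(fields) == 6:
        # e.g. 20200302.2201.00.inv.rawacf.bz2 (no channel field)
        date, time, sec, rid, ftype, comp = fields
        channel = None
    elif len(fields) == 7:
        # e.g. 20200302.2201.00.inv.1.rawacf.bz2 (channel is the fifth field)
        date, time, sec, rid, channel, ftype, comp = fields
    else:
        raise ValueError(f"unexpected number of fields in {filename}")
    return {"date": date, "time": time, "sec": sec,
            "rid": rid, "channel": channel, "type": ftype}


print(parse_name("20200302.2201.00.inv.rawacf.bz2"))
print(parse_name("20200302.2201.00.inv.1.rawacf.bz2"))
```

Counting fields avoids the hardcoded-position problem, at the cost of every reader needing to know both layouts.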
From a relatively basic user point of view, I have always been annoyed by this extra character corresponding to the channel. I don't know which one to use, so I have to open a file to find out which frequency is used and only then decide which one to keep. That complicates the process when analysing data sets from several radars at the same time (for maps, statistical studies, ...). I wonder whether the team in charge of a radar capable of recording data on several channels (or something even more complicated, as in the case of Borealis) should at least provide a specific file to be used for common data analysis, such as running convection maps, which would follow the common naming standard used for the majority of the radars.
I tend to agree with @SDFrance about the confusion these channels / slice numbers in the filename can cause for users. Why can't the data from these complex experiments first be processed or merged into a single rawacf file which is compatible with the standard SuperDARN data distribution for map potential processing? Presumably the other digital imaging-style radars (eg the UAF USRP design) are already doing so at the PI institution before sharing the data. And if the radar operators are unable to create such a "compliant" rawacf product, then how is an end user (who is almost certainly unfamiliar with the intricacies of a given radar system) expected to do so?
@SDFrance @egthomas to remove the channel/slice identifier, would you suggest combining multiple channels into a single file? I have in general heard complaints about this idea due to how plotting methods have handled multi-channel data from a single file in the past. If we go this route, could we put limitations on what all could be combined in a single file? I'm thinking this would be of great help to those writing processing/plotting software. Alternatively, could a single-channel dataset be served independently for all radar-time? Additional channels could also be placed in the distribution but separately (when recorded). I think this would require better data curation techniques but could be the ideal solution?
I was thinking more of generating a specific file corresponding either to a specific channel or to a specific slice of the data (if radar operations are more complicated), which would mimic the format of a usual rawacf file from other radars. This file would have the normal structure (the usual data recorded scan by scan, with a frequency that is at least fixed scan by scan, and the regular 1 or 2 min timestep) and no extra character in the file name. It could be fed directly into RST to compute fitacf, dmap, like the other files. (I personally have a Python script which does all these steps directly from rawacf to map files, and having files with strange names, or multiple files for one radar for the same time period, annoys me because it means I first have to decipher which channel to use for that radar.) This does not at all preclude the radar's other files from being kept, stored, and archived as SuperDARN files with more explicit file names, for people who have more experience with those files or want to explore new types of data.
So in fact, I share a large part of your suggestions/comments @mardet987 and @egthomas. I think that the team in charge of the radar should be the one responsible for generating this specific file for "normal" use, choosing the channel with the largest amount of data, or the one that keeps the frequency as stable as possible throughout the 2 hr duration of the file, or some other criterion (this can be discussed as part of the standardization procedure).
I don't know if it would satisfy Aurelie's intent, but one possibility would be to keep the naming convention with some channel identifier and, for all radars that produce only a single data stream, use some default value. The UAF radars have always tried to keep the same letter for the main program (such as normal scan or whatever special mode was running) and then used a different character for any separate program like the beam staring over Poker Flat. For most of our radars the main channel is channel "a", for Kodiak it's "d"...don't ask...let me just say Jef Spaleta....
If we keep the channel identifier and set a standard for a default mode, we would be able to keep the other useful data in the database while still allowing people to have scripts that select only the default channel. If we get rid of the identifier we have to maintain two separate databases if we want people to have access to anything other than the default.
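A rough sketch of a script that selects only the default channel is below. The table is an assumption for illustration: it takes "a" as the default everywhere except Kodiak ("d"), as described above, and `select_default` is a hypothetical helper, not part of the RST.

```python
# Assumed defaults for illustration: "a" unless a radar is listed here.
DEFAULT_CHANNEL = {"kod": "d"}
FALLBACK_CHANNEL = "a"


def select_default(filenames):
    """Keep files with no channel field or with the radar's default channel."""
    keep = []
    for name in filenames:
        fields = name.split(".")
        rid = fields[3]
        if len(fields) == 6:
            # Single data stream, no channel field in the name.
            keep.append(name)
        elif len(fields) == 7 and fields[4] == DEFAULT_CHANNEL.get(rid, FALLBACK_CHANNEL):
            keep.append(name)
    return keep


files = [
    "20200324.0000.01.kod.d.rawacf.bz2",
    "20200324.0000.01.kod.c.rawacf.bz2",
    "20200324.0000.01.ade.a.rawacf.bz2",
    "20200324.0000.01.ade.b.rawacf.bz2",
    "20200302.2201.00.inv.rawacf.bz2",
]
print(select_default(files))
```

The other channels stay in the same database; a script like this simply ignores them.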
--
Bill Bristow Professor of Electrical Engineering
Geophysical Institute 2156 Koyukuk Dr Fairbanks, AK 99775
Phone: 907-474-7357
@wabristow: in principle, I am not against this, but this means that the info about which channel is the default one for each radar should be indicated somewhere (meaning also where?) and if channel changes happen in the future, we need to keep track of this. When we see how we already struggle to keep track of hdw.dat files changes, I am quite worried :).
As it is, we have a list of radar identifiers in our scripts or in the "radar.dat" file in the RST. My scripts for producing potential maps have "kod.d" for Kodiak. If we set a standard for the default channel name, all the radars in our scripts would have a default name. Or, the channel could be inserted into radar.dat, which is read by the RST C code, so you wouldn't have to modify scripts if the default changed later.
I'm pretty sure the radar.dat file supports an unlimited number of string identifiers for each radar, so we could easily append "kod.d" to the list of codes (eg, "kod" "a" "kod.d").
That information may not actually be used anywhere in the standard RST processing flow of rawacf > fitacf > grid > map though (numeric stid is generally used to identify radars instead). But at least it would be recorded somewhere for users to refer back to.
It is passed as part of the RadarNetwork structure returned by RadarLoad, so potentially it could be used in the RST codes. Doing so would push the burden of handling changes to the maintainers of the RST so users wouldn't have to change scripts.
The problem I see is that for things like map_potential the special mode data can be useful. If we write the codes so they only take the default channel, we lose data.
Let's say ok to @wabristow for handling channel, but for more complicated data (such as Borealis), that would probably not work.
Following the DSWG meeting today, and as someone who has a STEREO radar (FIR) that produces 'a' and 'b' files (as Brian stated above), I thought I ought to butt in with some comments.
Personally I think that the convention is already YYYYMMDD.HHMM.SS.RID.X.rawacf.bz2 where X is a letter that represents a channel or is absent if the radar only has one channel. I think that scripts are easily changed to deal with these variations (as Kevin says above, and as we have done at BAS), but back processing the whole SuperDARN archive is not trivial, especially with the limited resources that most groups have. I think that anything that requires retrospective naming of files should be ruled out completely.
I can totally understand your point Marci that a numeric field would be better, but do you really envisage releasing more than 26 data files from a single radar? This would probably quickly overwhelm the storage on both the mirrors and the data stores of some SuperDARN users who like to store the data locally. It would also produce confusion for lots of users (at least in the first instance). A reduced number of data sets is probably better for SuperDARN data distribution.
I also don't see why files with only one channel need this part of the filename. When parsing filenames in a script it is trivial to count the characters or dots before the 'rawacf' to identify what format the filename is in. I don't see this as a problem.
As for Aurelie's comment - I think the convention should always be that the 'a' file is the standard default file that should be used for convection maps etc. This makes total sense. And additional files are used for real STEREO (and other) studies. But maybe Jef Spaleta has already spoiled this using 'd' with KOD (as I found out myself earlier this year!!). Following on to Evan's comment - we could all then simply comply by stating that the 'a' file was that to be used for convection maps, and users can ignore the other files if they wish. But it is nice for users to have access to that extra data from other channels if they wish. I think combining channels into one file is a nightmare and introduces even more complications and headaches for providers and users alike.
Anyway, thought I'd give my view.
Marci - congratulations on controlling the rabble in the meeting today!
Gareth.
Just wanted to follow up here, but this is basically closed and moved onto #8, right @mardet987? Just making sure it's here for when we come back to this after we've forgotten about it.
Correct @ksterne this has been moved to issue #8 as the scope has changed due to PI decision.
Submitter Name and Institution
Name: Marci Detwiller
Institution: SuperDARN Canada, University of Saskatchewan
Preferred contact method (e.g., email): marci.detwiller@usask.ca
Issue Type
Interoperability of SuperDARN data
This is mainly an interoperability issue from both sides: producing this data product and processing this data product.
Standard Infringement
Only a convention (not a standard) exists for naming, but it does not serve the needs of new operations systems, so the convention is being broken by multiple institutions.
Description of Issue
There is currently no documented standard in a governing document on how to name the 'level 1' format, RAWACF, for distribution.
There is a convention: YYYYMMDD.HHMM.SS.RID.rawacf.bz2 (RID=three-character radar id)
Where the maximum file length is 2 hours and the files are broken up on the 2 hour mark.
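To make the convention concrete, here is a small sketch that formats a name from a file start time and finds the 2-hour window a timestamp falls in. It assumes the 2-hour marks fall on even hours (00, 02, 04, ...), which the convention above does not state explicitly; both helper names are hypothetical.

```python
from datetime import datetime, timedelta


def conventional_name(start, rid, ftype="rawacf"):
    """Format YYYYMMDD.HHMM.SS.RID.<type>.bz2 from a file start time."""
    return f"{start:%Y%m%d.%H%M.%S}.{rid}.{ftype}.bz2"


def two_hour_window(t):
    """Return (start, end) of the 2-hour block containing t.

    Assumes blocks begin on even hours; this is an assumption, not part
    of the documented convention.
    """
    start = t.replace(hour=t.hour - t.hour % 2, minute=0, second=0, microsecond=0)
    return start, start + timedelta(hours=2)


print(conventional_name(datetime(2020, 3, 2, 22, 1, 0), "inv"))
print(two_hour_window(datetime(2020, 3, 2, 23, 30)))
```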
However, this convention is being broken by multiple radars because it doesn't serve the needs of institutions where the operations systems have been updated to run multi-channel modes or multiple experiments in the same time frame. Since the new radars can produce multiple datasets in the same time frame, specifying only by time and radar is not enough in order to allow them to operate optimally.
An additional field would be necessary in order to accommodate the multiple datasets being produced at one time and to allow these radars to operate to the extent of their capabilities. However, additional fields may pose a problem with scripts for existing users.
I am an engineer with SuperDARN Canada so have been involved in the development of our new system which faces this issue. I am aware of the University of Alaska Fairbanks group that has also faced this issue and has adapted the file naming to fit their needs. There may also be other groups that have faced or will face this issue.
Suggested Solution
Our new system at SuperDARN Canada, Borealis, can produce many datasets at one time and even adjust its transmissions (add new datasets) based on incoming data. Each dataset has a 'slice ID' and is produced by a unique transmission (transmissions can be unique via frequency, pulse sequence, etc - see the docs for more info on Borealis slices here).
My suggested solution would be that DSWG prepare a document outlining the standard requirements of a distributed rawacf file, including a naming standard which takes into account capabilities of the new radars. I would suggest that the naming standard allow both:
An additional numeric field would allow all radars to 'ID' the datasets being produced at any time. I would suggest that the field be numeric to allow many different IDs to be created by the same experiment, which would reduce limitations on radar development.