enram / vp-processing

Vertical profiles of birds (vp) data processing for analyses and visualizations
http://enram.github.io/vp-processing
MIT License

Selection/filtering settings #13

Closed peterdesmet closed 7 years ago

peterdesmet commented 7 years ago

For the data processing, I understood that not only certain radars should be included/excluded, but for some also specific time periods and maybe heights? I'm trying to find the best way to implement this. I think those selection/filtering settings:

  1. Should be defined in one place
  2. Should be defined outside the code (but read by it) to serve as intuitive documentation, which could potentially be referenced in the methodology section of the paper
  3. Should allow for comments to indicate why certain filtering options were chosen

That could be implemented in YAML:

# YAML is basically human and machine readable documentation.

# For the filtering options, I'm not sure what is more intuitive:

# Working inclusive:
searl:
  include_datetimes:
    - ["20161004", "20161005 200000"] # No good data beyond these dates
  include_heights:
    - [200, 3000]
# Note: you would have to define this for each radar

# Or exclusive:
seang:
  exclude_datetimes: 
    - ["20161003", "20161004"] # Rain messes up too much between 3-4 October
    - ["20161005 200000", "20161005 210000"] # Bad data between 8-9pm
  exclude_heights:
    - [0, 200]
    - [3000, 4000]
# Note: you would only have to define this for radars with issues,
# but it's probably good to then also declare the general inclusive settings:
selection_settings:
  start_date: 20161001
  end_date: 20161030
  include_heights: [200, 4000]

# It's easy to exclude a radar by just not adding it to this list,
# but maybe it's good to exclude more explicitly:
sease:  
  exclude_radar: true # This one is completely rubbish

# And we could include the metadata in the same file too
sehud:
  location: Hudiksvall
  longitude: 16.7144
  latitude: 61.5771

@CeciliaNilsson709 @plieper questions:

  1. Is there a need to filter on specific heights for an individual radar or will it always be applied for all radars?
  2. Is there a need to filter (inclusive or exclusive) on multiple time periods?
  3. Is there a need to exclude specific heights for specific times? (that does become rather complex)
  4. Would you include/manage the metadata (coordinates and location) in the same file? (that could keep everything nicely together)
  5. Is the YAML solution something that could work for you or is it too complex for the filtering needs?
  6. And beyond how to implement filtering: excluding full radars from the visualization is easy to understand, but excluding specific heights and times will be less clear: you'd kinda have to remember why for some time periods or heights no data shows up in the viz.
  7. Do you consider it a good idea to use the same filtering settings for the visualization as the analysis of the data? (I think it might)
CeciliaNilsson709 commented 7 years ago

@plieper: Here are my thoughts, correct/add on please!

In general, I think the second option, having a general inclusive setting and then excluding some specific times of specific radars makes the most sense.

  1. No, I can't think of a case where we would exclude heights (other than the below 200 that applies to all).
  2. Yes, some radars will need to have multiple time periods excluded (eg rain on two separate occasions).
  3. No.
  4. I guess that makes sense?
  5. Maybe @plieper can better judge this?
  6. Yes, it will mainly be to filter out remaining rain contamination, but we will have to state that clearly of course.
  7. Yes, I agree!
peterdesmet commented 7 years ago

Awesome! Thought a bit more about 4 and I think it's better to separate the metadata from the settings, like I've done here: https://github.com/enram/timamp-etl/tree/master/settings

plieper commented 7 years ago
  1. I'm not sure that we do not need a height filter. We haven't yet decided up to which altitude we want to analyse the data (everything up to 4 km AGL, or just up to the maximum extent of the radar?). I personally would like to have it in there...
  2. yes, agreed
  3. no, even if we decide to use specific heights for specific radars, it will be for the full time period (I'd think).
  4. mmm... can't tell atm, but i think both would work.
  5. i'd work with exclusion (datetimes and heights), as then you know what you throw out and you only do this for 'bad' radars. in combination with general inclusive settings, i think that's a good option.
  6. I see. in that light it'd be nice to have a list with removed radars (with reasons why) during the visualisation somehow, so that when a hole appears, you can quickly check why that is. what do you think, peter?
  7. definitely, so what we see is also what we have analysed! :-)
CeciliaNilsson709 commented 7 years ago

About 1: Just to be clear, we currently have data up to 4000 m ASL, right? I don't think there would be a reason to cut off altitudes upwards, but if we need to exclude 200 m AGL, then that's something different of course... then every radar would need its own altitude range, right? Do you think that's the case, @plieper?

peterdesmet commented 7 years ago

Thanks for the answers! Here's what I take away:

  1. Next to generally filtering out 0-200m, it would be good to filter heights for specific radars.
  2. There can be multiple time periods to exclude
  3. Exclude heights for whole time range 😅
  4. As mentioned in https://github.com/enram/timamp-etl/issues/13#issuecomment-303081844 will keep settings and metadata separate
  5. General inclusive settings, specific exclusive settings
  6. Will have to see if that can be mentioned with viz. General idea is that the settings are intuitive enough (for you at least) to understand
  7. Nice

Here's a new proposal for the settings, with additional questions:

  8. All ranges are [min, max], with min inclusive and max exclusive. To me that's intuitive, especially for dates, but maybe not for everyone. :-)
  9. If you never plan to exclude heights in the middle, but only include a continuous range, you could also use include_heights: [200, 3000] as a specific radar setting, rather than the exclude_heights. Which do you consider more intuitive?
  10. If you want to list a radar, but not include it, I would just comment it out, rather than using something like exclude_radar: true
general: # General inclusive settings
  include_datetimes: ["2016-09-19", "2016-10-09"] # Is a date range up until 2016-10-08 23:59:59
  include_heights: [200, 4000] # Is a height range from 200 up until 3999

radars: # Include the radars listed. Some have specific settings
  seang:
    exclude_datetimes:
      - ["2016-09-19 00:00", "2016-09-21 12:00"] # Exclude first day and a half
      - ["2016-09-22 21:00", "2016-09-22 22:00"] # Exclude another hour
    exclude_heights:
      - [0, 200] # This one is already excluded by the general settings, so no need to add it really
      - [3000, 4000]
  sease: # Just include this radar, no specific settings
  sehud: # Just include this radar, no specific settings
#  sekir: # We know this radar exists, but we don't have data for it
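
As a rough illustration of how code could read this proposal, here is a minimal Python sketch. It assumes the YAML has already been parsed into a dictionary (hard-coded below), and the names `parse_dt`, `in_range` and `keep` are hypothetical, not taken from the repository. Ranges follow the min-inclusive, max-exclusive convention proposed above.

```python
from datetime import datetime

# Assumed in-memory form of the settings above, as a YAML parser might return them.
SETTINGS = {
    "general": {
        "include_datetimes": ["2016-09-19", "2016-10-09"],
        "include_heights": [200, 4000],
    },
    "radars": {
        "seang": {
            "exclude_datetimes": [
                ["2016-09-19 00:00", "2016-09-21 12:00"],
                ["2016-09-22 21:00", "2016-09-22 22:00"],
            ],
            "exclude_heights": [[0, 200], [3000, 4000]],
        },
        "sease": None,  # listed without specific settings
        "sehud": None,  # listed without specific settings
    },
}


def parse_dt(text):
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DD HH:MM' into a datetime."""
    fmt = "%Y-%m-%d %H:%M" if " " in text else "%Y-%m-%d"
    return datetime.strptime(text, fmt)


def in_range(value, bounds):
    """Half-open check: min inclusive, max exclusive."""
    low, high = bounds
    return low <= value < high


def keep(radar, when, height, settings=SETTINGS):
    """Return True if a record passes the general and radar-specific filters."""
    radars = settings["radars"]
    if radar not in radars:
        return False  # radar not listed (or commented out)
    general = settings["general"]
    if not in_range(when, [parse_dt(t) for t in general["include_datetimes"]]):
        return False
    if not in_range(height, general["include_heights"]):
        return False
    specific = radars[radar] or {}  # radars without settings have no exclusions
    for period in specific.get("exclude_datetimes", []):
        if in_range(when, [parse_dt(t) for t in period]):
            return False
    for heights in specific.get("exclude_heights", []):
        if in_range(height, heights):
            return False
    return True
```

With these settings, `keep("seang", datetime(2016, 9, 21, 12, 0), 400)` passes, because the end of the first excluded period is treated as exclusive.
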
peterdesmet commented 7 years ago

PS: What's ASL and AGL? 😄

jshamoun commented 7 years ago

Usually ASL = above sea level (most likely above mean sea level) and AGL = above ground level.

However sometimes ASL is used as Above surface level which can be confusing, so should be written out at first use.

peterdesmet commented 7 years ago

Regarding 8. Let me know if you prefer both min and max to be inclusive. I noticed that the bioRad download function also considers the end date inclusive (so I'd have to write some extra code around it to make it exclusive) and that it is also what you meant in https://github.com/enram/timamp-etl/issues/11#issuecomment-303216019

CeciliaNilsson709 commented 7 years ago
  8. To make things more confusing, I would say that it's more intuitive to have max dates as inclusive but max ranges as exclusive :) Sorry :) Choose whichever you want and then let's just make sure to be clear!

  9. (and 1) About the heights: I didn't think about this before, but since we need to exclude the first 200 m AGL for each radar, we need to calculate a specific ASL height range to include for each radar (the height of each station is in the data, so that should be rather straightforward). We won't need to exclude heights in the middle.

  10. Ok!
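
The per-radar range described here could be derived along these lines. This is only a sketch: it assumes the station height in the data can stand in for ground level, uses the 4000 m ASL ceiling mentioned earlier in the thread, and the function name and the example station heights are made up for illustration.

```python
def asl_height_range(station_height_asl, min_agl=200, max_asl=4000):
    """Return [min, max] in m ASL, skipping the lowest `min_agl` m above the station."""
    low = station_height_asl + min_agl
    if low >= max_asl:
        raise ValueError("station is too high for the requested ceiling")
    return [low, max_asl]


# Hypothetical station heights in m ASL, for illustration only.
STATION_HEIGHTS = {"radar_a": 30, "radar_b": 520}

# One include_heights range per radar, expressed in m ASL.
INCLUDE_HEIGHTS = {
    radar: asl_height_range(height) for radar, height in STATION_HEIGHTS.items()
}
```

A radar at 30 m ASL would then keep data from 230 m ASL upwards, while one at 520 m ASL would start at 720 m ASL.
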

plieper commented 7 years ago
  8. for me it's more intuitive to have both min and max range inclusive :p (my programmer-friend seems to find min=inclusive max=exclusive normal though ;-)). but i agree with cecilia, choose what you want and just be clear about it :-).
peterdesmet commented 7 years ago

Haha! @stijnvanhoey and I agree with your programmer-friend! 😄 😄 I'll see what I'll choose and be consistent. 👍

peterdesmet commented 7 years ago

Settled on min/max inclusive, because that was easier to implement. 😊 All of it is documented in the example settings. Closing issue.
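
For reference, an inclusive check can be as small as the sketch below. The function names are hypothetical, and treating a date-only maximum as covering that whole day is an assumption; the example settings mentioned above remain the authoritative description.

```python
from datetime import date, datetime


def in_closed_range(value, bounds):
    """Closed check: both min and max are inclusive."""
    low, high = bounds
    return low <= value <= high


def in_date_range(when, bounds):
    """Compare on calendar dates, so every time on the end date is included."""
    return in_closed_range(when.date(), bounds)
```

Under this reading, a height of 4000 falls inside `[200, 4000]`, and a record from 23:30 on the end date is still kept.
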