geanders / noaastormevents_paper

See how number of reported events has changed over time for each event type #2

Open geanders opened 4 years ago

geanders commented 4 years ago

For temporal trends, we talk about long-term changes. There are some nice studies we could cite that have looked at this for tornadoes (where they can look all the way back to the 1950s), so for tornadoes this is a well-known caveat for this dataset. However, I don't think the same analysis has been done for other types of events, and we could do that to go along with the discussion of temporal bias in our paper.

We can use our package to do that. Here's how I'd suggest starting:

  1. Pull all the events of all types for one year. If you already have other code for a certain year, you could start there.
  2. From dplyr, use group_by to group the data by event type and then use count to count up the number of events for that year.

This should give you a (much smaller) dataset with the number of events for each event type for each year.

You can then expand to do the same thing for other years. The only tricky thing is that, for each year, you'll be downloading all the data, and so that will get to be a lot of data on your computer. When I've processed lots of years of data from NOAA Storm Events in the past, I've written a for loop that does one year at a time and then deletes the full dataset from NOAA Storm Events after it's done, to make room for the next year's data. You can see an example here. I think this is probably what you'd want to do.

You could use the same approach I'm using here to put the summary dataframe for each year in a list object. Then, when you're done, you should be able to convert this list of dataframes into one long dataframe with bind_rows (see here for a discussion of this). You'll probably want to use the year as the id variable when you do this (or somehow include the year as a column in the dataframe).
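
The loop described above could be sketched roughly as follows. This is only an illustration: `get_events_for_year()` is a hypothetical stand-in that returns a tiny fake data frame, so the example runs without downloading anything, and the column name `n_events` is likewise just a choice made for this sketch.

```r
library(dplyr)

# Hypothetical stand-in for the real download step (an assumption for this
# sketch): returns a small fake data frame of events for one year.
get_events_for_year <- function(year) {
  data.frame(
    EVENT_TYPE = c("Tornado", "Tornado", "Hail"),
    YEAR = year,
    stringsAsFactors = FALSE
  )
}

years <- 1950:1952

# One list slot per year, named by year so bind_rows() can use the names
yearly_counts <- vector("list", length(years))
names(yearly_counts) <- years

for (i in seq_along(years)) {
  events <- get_events_for_year(years[i])

  # Keep only the small per-type summary for this year
  yearly_counts[[i]] <- events %>%
    count(EVENT_TYPE, name = "n_events")

  # Drop the full dataset before moving on to the next year
  rm(events)
}

# Stack the yearly summaries; the list names become the `year` column
year_events_df <- bind_rows(yearly_counts, .id = "year") %>%
  mutate(year = as.integer(year))
```

In the real script, the body of `get_events_for_year()` would be replaced by whatever code pulls a full year of Storm Events data, and the clean-up step would also need to remove any downloaded file from disk.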

Then, with this dataframe, we'll be able to plot the change in the number of reported events over time, and then maybe even test whether there's a statistically significant trend in the number reported over time for different event types. This could also help us show how different event types have been added at certain points in time.
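
As a rough sketch of what that plotting and testing step might look like: the counts below are simulated, not real Storm Events numbers, and a Poisson regression of count on year is just one possible trend test chosen for illustration (it ignores overdispersion and other issues we would need to think about for the paper).

```r
library(dplyr)
library(ggplot2)

# Simulated yearly counts standing in for the real summary (assumption):
# "Hail" is given an upward trend, "Tornado" is flat.
set.seed(1)
counts <- expand.grid(
  year = 2000:2019,
  EVENT_TYPE = c("Hail", "Tornado"),
  stringsAsFactors = FALSE
) %>%
  mutate(
    n_events = rpois(
      n(),
      lambda = ifelse(EVENT_TYPE == "Hail", 50 + 5 * (year - 2000), 100)
    )
  )

# One panel per event type, each with its own y-axis scale
trend_plot <- ggplot(counts, aes(x = year, y = n_events)) +
  geom_line() +
  facet_wrap(~ EVENT_TYPE, scales = "free_y") +
  labs(x = "Year", y = "Number of reported events")

# Fit a Poisson regression of count on year for one event type
fit_trend <- function(df) {
  fit <- glm(n_events ~ year, data = df, family = poisson)
  coefs <- summary(fit)$coefficients
  data.frame(
    slope = coefs["year", "Estimate"],
    p_value = coefs["year", "Pr(>|z|)"]
  )
}

# Apply the fit separately to each event type and stack the results
trend_tests <- split(counts, counts$EVENT_TYPE) %>%
  lapply(fit_trend) %>%
  bind_rows(.id = "EVENT_TYPE")
```

Printing `trend_plot` draws the figure, and `trend_tests` has one row per event type with the estimated slope (on the log scale) and its p-value.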

If you can get started on the code for a single year (and feel free to try moving to the loop over years if you get some code for the first year), I can help when we meet next week.

theresekon commented 4 years ago

I found this code that lists event type and number of events for 2019:

events_2019 %>%
  group_by(EVENT_TYPE) %>%
  summarize(N = n()) %>%
  arrange(desc(N)) %>%
  mutate(N = prettyNum(N, big.mark = ",")) %>%
  knitr::kable(col.names = c("Event type", "Number of events in 2019"))

It also works with this simplified version:

events_2019 %>%
  group_by(EVENT_TYPE) %>%
  count() %>%
  knitr::kable(col.names = c("Event type", "Number of events in 2019"))

I am struggling with how to adapt the code you have linked here with the for loop for the hurricane exposure data to our data.

theresekon commented 4 years ago

I just pushed up the latest version of the for loop script. I changed the years to 1950-2020 which is what the website with the files had on it. I also saved the dataframe as a csv file.

When I open the csv file, the first column lists the number of the row. Should I try to get rid of this column? Or just keep that in mind when we read the csv file into R for plots?

geanders commented 4 years ago

That's great!

I don't think we need to worry too much about the row numbers. We can just chop those off when we read the data back into R.

One thing to be aware of: we'll want to exclude the 2020 counts when we make plots, because those will only be the numbers reported so far this year, so they'll be lower than for other years (which cover the full year). That will be easy for us to remove in R, too, so there's no need to change the dataset you've created.
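
Both clean-up steps can be handled when the file is read back in. The sketch below assumes the CSV was written with write.csv() and its default row names (which would explain the row-number column); the toy data and column names are made up for illustration.

```r
library(dplyr)

# Toy summary, written the way write.csv() does by default
# (row names included as an unnamed first column)
toy <- data.frame(
  year = c(2018, 2019, 2020),
  EVENT_TYPE = "Hail",
  n_events = c(10, 12, 3)
)
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path)

# row.names = 1 treats the first column as row names rather than data,
# so it does not show up as a column in the data frame
counts <- read.csv(csv_path, row.names = 1, stringsAsFactors = FALSE)

# Drop the partial year before plotting
complete_years <- counts %>%
  filter(year < 2020)
```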

theresekon commented 4 years ago

I'm trying to go through the dataframe and rename these event types from the 1990s but several of them don't seem to have an exact equivalent. Should I just rename to what I think is most relevant? Or for the ones below I could do either Thunderstorm Wind or the event type that comes after the slash?

THUNDERSTORM WIND/ TREES
THUNDERSTORM WIND/ TREES
THUNDERSTORM WINDS FUNNEL CLOUD
THUNDERSTORM WINDS HEAVY RAIN
THUNDERSTORM WINDS LIGHTNING
THUNDERSTORM WINDS/ FLOOD
THUNDERSTORM WINDS/FLASH FLOOD
THUNDERSTORM WINDS/HEAVY RAIN

geanders commented 4 years ago

For the ones with "Trees", we can just do "Thunderstorm Wind".

For the others, I see what you mean: both are real events. I recommend that we look at the NOAA manual for reporting events and see how each combination of events would be reported today. I think the likely solution will be to copy each of these into two events, one listed as "Thunderstorm Wind" and one as the other type of event.

Whatever we do, we should take clear notes so we can report how we handled this. Also, we should add this as an example in temporal bias, in terms of changes in typical ways of reporting over time. If you were working with this data and didn't know this, and wanted to get a subset of data just with "Thunderstorm Wind", this could cause problems. You might filter just to event types of "Thunderstorm Wind", and that would be fine for earlier in the data and also for recent years, but you'd miss loads of events in this period that have these unconventional event types.
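
One possible way to do the copying, and to see the filtering problem, is with a small lookup table and a join. This is a sketch only: the events are toy data, `EVENT_ID` and `modern_type` are names made up for the example, and the mapping itself is illustrative and has not been checked against the NOAA manual.

```r
library(dplyr)

# Toy events mixing modern and legacy labels
events <- data.frame(
  EVENT_ID = 1:4,
  EVENT_TYPE = c("Thunderstorm Wind", "THUNDERSTORM WINDS/HEAVY RAIN",
                 "THUNDERSTORM WIND/ TREES", "Hail"),
  stringsAsFactors = FALSE
)

# Naive filter: only catches the row that already uses the modern label
naive <- events %>%
  filter(EVENT_TYPE == "Thunderstorm Wind")

# Illustrative lookup (assumption): one row per legacy label and modern type,
# so a combined label appears once for each event it should be copied into
type_lookup <- data.frame(
  EVENT_TYPE = c("THUNDERSTORM WINDS/HEAVY RAIN",
                 "THUNDERSTORM WINDS/HEAVY RAIN",
                 "THUNDERSTORM WIND/ TREES"),
  modern_type = c("Thunderstorm Wind", "Heavy Rain", "Thunderstorm Wind"),
  stringsAsFactors = FALSE
)

# The join duplicates a row for each modern type its legacy label maps to;
# labels without a lookup entry keep their original event type
recoded <- events %>%
  left_join(type_lookup, by = "EVENT_TYPE") %>%
  mutate(modern_type = coalesce(modern_type, EVENT_TYPE))

thunderstorm_wind <- recoded %>%
  filter(modern_type == "Thunderstorm Wind")
```

Here `naive` has one row, while `thunderstorm_wind` has three (events 1, 2, and 3). Keeping the lookup table as its own object also gives us a record of how each legacy label was handled. On the full dataset, where many events share the same legacy label, recent dplyr versions may warn about a many-to-many join; that duplication is the intended behavior in this approach.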

theresekon commented 4 years ago

Okay thank you!

I am struggling to find much in the manual about how these events were recorded at the time (1990s). It seems to mostly list how Thunderstorm Wind events are recorded today. It seems to me that these might have been large episodes with multiple events that they entered as just one event.

Should I try to just copy them into two events? Do you have instructions on how to do this?

theresekon commented 3 years ago

I have pulled out all of the events whose event type is in all caps and combines multiple events under one entry:

all_caps <- year_events_df %>% 
  filter(EVENT_TYPE %in% c("TORNADOES, TSTM WIND, HAIL", "THUNDERSTORM WINDS/FLOODING", "HAIL FLOODING",
                          "THUNDERSTORM WINDS FUNNEL CLOU", "THUNDERSTORM WINDS HEAVY RAIN",
                          "THUNDERSTORM WINDS LIGHTNING", "THUNDERSTORM WINDS/ FLOOD",
                          "THUNDERSTORM WINDS/HEAVY RAIN", "TORNADO/WATERSPOUT", "HAIL/ICY ROADS", 
                          "THUNDERSTORM WIND/ TREE", "THUNDERSTORM WIND/ TREES ", "THUNDERSTORM WINDS/FLASH FLOOD",
                          "OTHER", "THUNDERSTORM WIND/ TREES")) 

Should I just use the str_to_title function to change these? Or should I also try to separate them into multiple event types?
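
For reference, a quick check of what str_to_title does to two of these labels (a small sketch; the vector of modern names is just a few examples for comparison, not a full list):

```r
library(stringr)

legacy_types <- c("THUNDERSTORM WINDS LIGHTNING",
                  "THUNDERSTORM WINDS HEAVY RAIN")

# str_to_title() only changes the case of each word
title_cased <- str_to_title(legacy_types)

# A few modern event type names to compare against (illustrative subset)
modern_examples <- c("Thunderstorm Wind", "Lightning", "Heavy Rain")

# TRUE where a title-cased label already matches a modern name
matches_modern <- title_cased %in% modern_examples
```

The first label becomes "Thunderstorm Winds Lightning", which still combines two events in one string, so neither label matches a modern name after the case change alone.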

theresekon commented 3 years ago

@geanders

Could you take a look at temporal_bias_event_type.R and search for your name in the code comments?