Vigneshwarr3 / Tornado_project


Measurement Bias in Tornado Density Calculation #25

Open ChillakuruSaipranam opened 5 days ago

ChillakuruSaipranam commented 5 days ago

Issue Overview

The calculation of tornado density is based on the assumption that tornadoes are uniformly distributed across the entire area of each state. This assumption introduces measurement bias, particularly in larger states with significant rural or uninhabited areas. For example:

Underreporting in Larger States: Large states such as Alaska and Texas may report fewer tornadoes per unit area than actually occur, because tornadoes in sparsely populated regions often go undetected.

Detection Limitations: Tornado detection is typically more accurate in populated regions or areas with advanced infrastructure (e.g., weather stations). Consequently, areas with limited infrastructure may underreport tornado occurrences.

This bias leads to inaccuracies in the analysis and underrepresents the true density of tornadoes in less populated regions.

Reason for the Issue

The measurement bias arises primarily due to two factors:

State-Level Aggregation: Tornado counts and areas are aggregated at the state level, assuming uniform distribution across the entire state. This assumption overlooks the reality that tornadoes are clustered in specific regions, such as Tornado Alley, and rarely occur in others.

Uniformity Assumption: The calculation of tornado density (Tornado_Count / State_Area) does not account for population density, detection capabilities, or geographical variations. In large states, tornadoes in remote, unpopulated areas may go undetected, resulting in underreporting and misleading conclusions about tornado density.
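To make the two factors above concrete, here is a minimal sketch of how uneven detection can pull the reported density below the true one. Everything in it is hypothetical: the regions, areas, counts, and `detect_rate` values are invented for illustration and are not taken from the project's data. It uses plain Python so it runs without dependencies; the same arithmetic applies to the `Tornado_Count` and `State_Area` columns.

```python
# Hypothetical illustration: all numbers below are invented, not real tornado data.
def tornado_density(tornado_count, state_area):
    """Naive density as described in the issue: Tornado_Count / State_Area."""
    return tornado_count / state_area

# One made-up state split into two regions of equal area.
# detect_rate is an assumed fraction of tornadoes that actually get reported.
regions = [
    {"name": "populated", "area": 50_000, "true_count": 40, "detect_rate": 0.9},
    {"name": "remote", "area": 50_000, "true_count": 40, "detect_rate": 0.5},
]

state_area = sum(r["area"] for r in regions)
true_total = sum(r["true_count"] for r in regions)
reported_total = sum(r["true_count"] * r["detect_rate"] for r in regions)

true_density = tornado_density(true_total, state_area)
reported_density = tornado_density(reported_total, state_area)

print(f"true density:     {true_density:.5f}")
print(f"reported density: {reported_density:.5f}")
```

Both regions have the same true count here, yet the state-level figure only sees the reported total, so the shortfall in the remote region is invisible once everything is aggregated.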

Proposed Solution

To address the measurement bias and improve the accuracy of the tornado density calculation, the following solutions are recommended:

Incorporate Population Density: Adjust tornado counts by factoring in population density or detection infrastructure. This ensures that tornado occurrences are weighted based on the likelihood of detection in different regions:

python code
# Scale each count down by population density; the +1 avoids division by zero.
df['Weighted_Count'] = df['Tornado_Count'] / (df['Population_Density'] + 1)

Rationale: Regions with higher population density or better detection infrastructure (e.g., weather stations) are more likely to report tornadoes. Weighting tornado counts in this way reduces the bias caused by underreporting in sparsely populated areas.
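As a rough check of what the proposed formula does, the sketch below applies it to two hypothetical rows with the same reported count. The state labels and numbers are invented, and plain dictionaries stand in for the DataFrame columns so the example runs on its own.

```python
def weighted_count(tornado_count, population_density):
    """Weighted_Count = Tornado_Count / (Population_Density + 1), as proposed in the issue."""
    return tornado_count / (population_density + 1)

# Hypothetical rows: identical reported counts, very different population density.
rows = [
    {"State": "A (sparse)", "Tornado_Count": 30, "Population_Density": 1},
    {"State": "B (dense)", "Tornado_Count": 30, "Population_Density": 99},
]

for row in rows:
    row["Weighted_Count"] = weighted_count(
        row["Tornado_Count"], row["Population_Density"]
    )
    print(row["State"], row["Weighted_Count"])
```

With these made-up inputs the sparse state scores 15.0 and the dense state 0.3. Two things are worth keeping in mind: the result is a relative score rather than a tornado count, and because of the `+ 1` it changes with the units chosen for `Population_Density` (for example people per square mile versus per square kilometre).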

Tomcat13 commented 2 days ago

I like this idea, but I think the solution falls victim to the same issues it was trying to solve. For example, most of the country's population growth occurs in cities, but most tornadoes don't hit cities. Imagine a case like New York. You'd think this state has a high population density, but in reality that figure is driven almost entirely by NYC. This would make it look like the state has a lot of potential observers everywhere, but it doesn't. This could be better if we measured tornadoes and population growth at the county level, for example, but that information isn't available. Like I mentioned, this is a really good issue that I'd love to see a solution for, but I don't think we can implement the proposed idea without information like rural population or radar coverage such as NEXRAD.
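The point about a single city driving the state figure can be sketched with invented numbers. The state below is hypothetical (one dense "city" county plus nine identical rural counties); none of the populations or areas are real, and county-level data like this is exactly what the comment says the project does not have.

```python
from statistics import median

# Hypothetical state: one dense city county plus nine sparse rural counties.
counties = [{"name": "city", "population": 8_000_000, "area": 1_000}]
counties += [
    {"name": f"rural_{i}", "population": 50_000, "area": 10_000} for i in range(9)
]

state_population = sum(c["population"] for c in counties)
state_area = sum(c["area"] for c in counties)
state_density = state_population / state_area  # the single state-level figure

county_densities = [c["population"] / c["area"] for c in counties]
median_density = median(county_densities)

# Share of the state's area lying in counties below the state-level average.
sparse_area = sum(
    c["area"] for c in counties if c["population"] / c["area"] < state_density
)
sparse_area_share = sparse_area / state_area

print(f"state-level density:   {state_density:.1f}")
print(f"median county density: {median_density:.1f}")
print(f"area below state avg:  {sparse_area_share:.1%}")
```

In this toy setup the state-level density is far above what the typical county looks like, and nearly all of the land area sits below the state average, which is the mismatch a state-level `Population_Density` column would hide.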