manaswisaha opened this issue (ProjectSidewalk/sidewalk-data-analysis#31) 6 years ago.
I don't think that we need a statistical test for this. A statistical test to say that the means are truly "similar" is actually really uncommon and not simple. Just testing whether there is a difference between the means and failing to reject the null hypothesis (that the means are equal) is not sufficient to say that the means are similar. This really feels like one of those places where we can just let the data speak for themselves. A statistical test opens up a whole other can of statistical worms that I don't think is worth opening (the can contains things like Bland-Altman plots and estimation statistics).
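To make the "let the data speak" option concrete, here is a minimal sketch of one estimation-statistics approach: report a percentile bootstrap confidence interval for the difference in mean accuracy between two density groups instead of a p-value. It uses only the Python standard library. The function name, the resample count, and the accuracy values are hypothetical; the thread does not say which estimation method, if any, was used.

```python
import random
import statistics


def bootstrap_mean_diff_ci(group_a, group_b, n_resamples=10_000, ci=0.95, seed=0):
    """Percentile bootstrap CI for mean(group_a) - mean(group_b).

    Each group is resampled independently, with replacement, at its own size.
    Returns (point estimate, lower bound, upper bound).
    """
    rng = random.Random(seed)  # seeded so the interval is reproducible
    point = statistics.fmean(group_a) - statistics.fmean(group_b)
    diffs = sorted(
        statistics.fmean(rng.choices(group_a, k=len(group_a)))
        - statistics.fmean(rng.choices(group_b, k=len(group_b)))
        for _ in range(n_resamples)
    )
    # Number of resampled differences cut from each tail for the requested coverage.
    k = round((1 - ci) / 2 * n_resamples)
    return point, diffs[k], diffs[n_resamples - 1 - k]


# Hypothetical per-route accuracies for two density groups (not real data).
low_density = [0.71, 0.68, 0.74, 0.70, 0.69, 0.73, 0.72]
high_density = [0.70, 0.72, 0.69, 0.71, 0.74, 0.68]
point, lower, upper = bootstrap_mean_diff_ci(low_density, high_density)
print(f"difference in mean accuracy: {point:+.3f} (95% CI {lower:+.3f} to {upper:+.3f})")
```

A narrow interval sitting close to zero says more about similarity than a non-significant p-value does; a wide interval says the data cannot distinguish "similar" from "different".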
@manaswis thoughts?
I'll wait until I can confirm this with @jonfroehlich. Obviously, that would require reading the sections, so I would hold off on it until then.
I have another idea for zone type analysis, which might show differences in accuracies.
I went through the definitions of the zones again. Based on the definitions of the given zone types, instead of two groups of densities, we should have low, moderate, medium and high density groups. We might get better results. I am sure the built infrastructure is different in zones with different densities. I think we just have to figure out the correct sub-groups for the zones, and I think a finer grouping based on density should work.
@misaugstad: Do you think we can divide routes based on these finer groups? Is it possible?
Do you think we can divide routes based on these finer groups? Is it possible?
I don't think so. When I tried to do the density analysis originally, that is what I wanted to do, but there is just not enough data in the high density range. I don't remember exactly how much there was, but I did try it before and saw that there wasn't enough.
If there are sufficient sample sizes in the other groups, I think it should be fine if there isn't much data in one group; we just won't analyse that one. At least we would have three distinct groups. I am suggesting this because I was reading about sidewalk planning and the density measures used in land-use planning: infrastructure is built differently depending on the density of the land-use type.
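A minimal sketch of the approach proposed above, assuming each route has already been assigned a single density category (which, as the replies below point out, is itself the hard part): group route accuracies by category and set aside any group that falls below a minimum size. The `(category, accuracy)` input format, the threshold of 10 routes, the function name, and the example counts are all hypothetical.

```python
from collections import defaultdict

# Hypothetical threshold; the thread does not settle on a minimum group size.
MIN_ROUTES_PER_GROUP = 10


def group_accuracies_by_density(routes, min_size=MIN_ROUTES_PER_GROUP):
    """Group route accuracies by density category and drop undersized groups.

    `routes` is an iterable of (density_category, accuracy) pairs (a hypothetical format).
    Returns (kept groups: category -> list of accuracies,
             dropped groups: category -> number of routes).
    """
    groups = defaultdict(list)
    for category, accuracy in routes:
        groups[category].append(accuracy)
    kept = {cat: accs for cat, accs in groups.items() if len(accs) >= min_size}
    dropped = {cat: len(accs) for cat, accs in groups.items() if len(accs) < min_size}
    return kept, dropped


# Hypothetical route counts: plenty of data except in the high-density group.
routes = (
    [("low", 0.71)] * 40
    + [("moderate", 0.70)] * 25
    + [("medium", 0.72)] * 18
    + [("high", 0.69)] * 3
)
kept, dropped = group_accuracies_by_density(routes)
print(sorted(kept), dropped)
```

Reporting the dropped groups and their sizes, rather than silently discarding them, keeps the "not enough data in the high density range" limitation visible in the write-up.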
Upon further reflection, I have recalled the bigger problem with splitting into four density categories: each zone type doesn't really fit into just one category. Each is made up of a bunch of sub-zones, so each zone type ends up spanning multiple densities. And those sub-zones often say things like "low to moderate density"...
Putting each zone type into one of two density categories was sort of imprecise for the above reason, but trying to put each zone type into one of four density categories sounds suuuper imprecise.
It would be a bit better if we used the sub-zones to determine density, but this would require a ton of work (many sub-zones don't specify density, so I would have to go through DC's zoning documentation to assign densities to them myself; I would have to modify the zoning script and rerun; a bunch of the sub-zones specify a range of densities anyway).
I see. That's a ton of work with unknown ROI.
Yeah, unknown and honestly it is more likely to be a very low ROI, since the interesting zones are still likely to be the ones with not enough data :)
Is there another data source with density information for DC?
I don't think there is another source that really gets us what we are looking for... We could get population density by census tract, but I don't think that's precisely what we want. For example, commercial areas may have low population density, and I think we want those labeled as high density.
To add, the density here is a measure of how the land will be used, and it relates indirectly to the number of people using it. For residential zones, it is measured in dwelling units per lot; for non-residential zones, they use a safe load factor such as the Floor Area Ratio (FAR): "relationship between the total amount of usable floor area that a building has, or has been permitted for the building, and the total area of the lot on which the building stands. A higher ratio is more likely to indicate a dense or urban construction." [1] So it is a form of population density [2].
[1] https://www.investopedia.com/terms/f/floor-area-ratio.asp
[2] https://www.census.gov/newsroom/blogs/random-samplings/2015/03/understanding-population-density.html
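The FAR definition quoted above reduces to a single ratio of floor area to lot area, with both areas in the same units. A minimal Python sketch with a hypothetical building; the function name and the numbers are illustrative only.

```python
def floor_area_ratio(total_floor_area, lot_area):
    """Floor Area Ratio: total usable floor area divided by total lot area.

    Both areas must be in the same units (e.g. square feet); FAR is unitless.
    """
    if lot_area <= 0:
        raise ValueError("lot_area must be positive")
    return total_floor_area / lot_area


# Hypothetical building: 4 storeys, each with a 5,000 sq ft floor plate,
# on a 10,000 sq ft lot -> 20,000 / 10,000.
print(floor_area_ratio(4 * 5_000, 10_000))  # 2.0
```

The same FAR of 2.0 could also come from a 2-storey building covering the whole lot, so FAR describes built intensity, not how the floor area is used.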
I still don't think I entirely agree with that. Again, commercial areas may have a high floor area ratio, but relatively few people who actually live there. So I don't think it is really a form of population density.
That is why I said it is a form of population density. By definition, population density is the number of people per square mile of land area; in a broader sense, it looks at the number of people living in a square mile.
The density measures used for zoning in relation to sidewalks relate to the incidence of people and automobiles in the land area.
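The population-density definition stated in this comment (people per square mile of land area) is likewise a single ratio, which makes the comparison with FAR easy to see: one divides people by land area, the other divides building floor area by lot area. A minimal sketch; the function name and the figures are hypothetical.

```python
def population_density(population, land_area_sq_mi):
    """People per square mile of land area."""
    if land_area_sq_mi <= 0:
        raise ValueError("land_area_sq_mi must be positive")
    return population / land_area_sq_mi


# Hypothetical area: 20,000 residents on 2.5 square miles of land.
print(population_density(20_000, 2.5))  # 8000.0
```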
@manaswis do you think the section works without having some test of similarity?
Hold off on this for now. Will depend on what we end up writing.
For section 4.5.2, we need to test the statistical significance of the results showing that accuracies across the density-based groups are similar.
Related to #19.
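If a test of similarity is needed for section 4.5.2, the usual tool is an equivalence test rather than a difference test. Below is a minimal sketch of TOST (two one-sided tests) for two groups' mean accuracies. It is an illustration under stated assumptions, not the analysis used in the paper: it uses a large-sample normal (z) approximation instead of the Welch t-test that is usually paired with TOST; the equivalence margin (0.05 here, i.e. 5 percentage points of accuracy) is hypothetical and would have to be chosen and justified in advance; and comparing more than two density groups would need pairwise tests with a multiple-comparison correction. The function name and data are hypothetical.

```python
import math
import statistics


def tost_equivalence(group_a, group_b, margin, alpha=0.05):
    """Two one-sided tests (TOST) for equivalence of two means.

    Uses a large-sample normal (z) approximation, not a t-distribution.
    H0: |mean_a - mean_b| >= margin. Rejecting H0 (p < alpha) supports
    the claim that the means differ by less than `margin`.
    Returns (difference in means, TOST p-value, equivalent?).
    """
    diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    # Unpooled (Welch-style) standard error of the difference in means.
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    if se == 0:
        raise ValueError("both groups have zero variance; TOST is undefined")
    std_normal = statistics.NormalDist()
    # One-sided test 1: H0 diff <= -margin  vs  H1 diff > -margin.
    p_lower = 1 - std_normal.cdf((diff + margin) / se)
    # One-sided test 2: H0 diff >= +margin  vs  H1 diff < +margin.
    p_upper = std_normal.cdf((diff - margin) / se)
    # Equivalence is claimed only if BOTH one-sided nulls are rejected.
    p_value = max(p_lower, p_upper)
    return diff, p_value, p_value < alpha


# Hypothetical per-route accuracies for two density groups (not real data).
low_density = [0.70, 0.72, 0.71, 0.69, 0.70, 0.72, 0.71, 0.69]
high_density = [0.71, 0.70, 0.72, 0.70, 0.69, 0.71, 0.70, 0.71]
print(tost_equivalence(low_density, high_density, margin=0.05))

# Equal means but only two noisy routes per group: equivalence is NOT shown.
print(tost_equivalence([0.5, 0.9], [0.6, 0.8], margin=0.05))
```

The second call illustrates the point made earlier in the thread: the two means are identical, so a difference test would not reject the null, yet TOST still cannot conclude that the groups are similar because the data are too sparse and noisy.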