Status
So far I've identified a means to accomplish this from page 8 of the Zoning Effect paper. Later I'll look into how to apply it.
About
What I really want is a means to compare the value of land on a site independently of the size of the site it is attached to. This should help make visualisations, within an LGA, of where people want to live, based on the assumption that the land value has priced all preferences in.
Why have I sought this out
Here are the sites in the CBD ranked by $ per sqm; look at the area of these sites.
While there are some data issues in the Valuer General data, when you sort the sites within Sydney by land value per sqm, it seems like smaller sites tend to rank higher on a per sqm basis.
I think this is because there's a marginal rate at which land increases in value, where the next meter will be worth less than the last. As you increase the size of the land more types of projects become viable, but any meter you add beyond that does nothing for the projects that were already viable without those extra meters.
Why does this bother me
If we had the shapefile for every lot this wouldn't actually be a problem, but because we are aggregating by meshblock, a single Telstra phone booth can inflate the aggregated value of the meshblock if the aggregation is something like max or mean.
If you want to compare the value of land in Sydney independently of the size of the lot, and you're grouping by things like meshblock, aggregations like max or mean will be heavily skewed by a random 1x1 Telstra phone booth, which is the case in a few places in the CBD.
It would be nice to be able to weigh each meter independently of the size of the lot it's actually attached to, but maybe that is fanciful.
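To make the skew concrete, here is a small sketch with made-up numbers (the three lots and their values are purely illustrative, not taken from the Valuer General data). It compares the mean and max of per-lot $ per sqm against an area-weighted figure (total value over total area), which is one possible way of weighing each meter equally. It is not something these notes have settled on.

```python
from statistics import mean

# Hypothetical lots in a single meshblock: (land_value_dollars, area_sqm).
lots = [
    (50_000_000, 1_000),  # large commercial site
    (30_000_000, 600),    # mid-sized site
    (100_000, 1),         # 1x1 phone booth
]

# Per-lot $ per sqm, which is what max/mean would be aggregating.
per_sqm = [value / area for value, area in lots]

naive_mean = mean(per_sqm)  # every lot counts equally, booth included
naive_max = max(per_sqm)    # decided entirely by the booth

# Area-weighted: every square meter counts equally instead of every lot.
area_weighted = sum(v for v, _ in lots) / sum(a for _, a in lots)

print(f"mean of per-lot $/sqm: {naive_mean:,.0f}")
print(f"max of per-lot $/sqm:  {naive_max:,.0f}")
print(f"area-weighted $/sqm:   {area_weighted:,.0f}")
```

With these numbers the booth alone sets the max and drags the mean well above the two real sites, while the area-weighted figure barely moves because the booth contributes one square meter out of 1,601.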
Hacky solutions
So far I've included something like this in my land value aggregations, but it's fairly arbitrary and dishonest:
    -- Treat any lot smaller than 10 sqm as if it were 10 sqm
    CASE
        WHEN p.area < 10 THEN 10
        ELSE p.area
    END
Note, this isn't used in the data ingestion process but instead in the other notebooks where I've been trying to visualise the data.
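A rough Python equivalent of that floor, mostly to show why it feels arbitrary: the result for a tiny lot is driven almost entirely by whichever threshold is picked. The function name `floored_value_per_sqm`, the `min_area_sqm` parameter and the booth figures are hypothetical, introduced only for this sketch.

```python
def floored_value_per_sqm(land_value, area_sqm, min_area_sqm=10):
    """Dollars per sqm, treating any lot smaller than min_area_sqm as min_area_sqm."""
    return land_value / max(area_sqm, min_area_sqm)


# Hypothetical 1 sqm phone booth valued at $100,000.
booth_value, booth_area = 100_000, 1

for floor in (1, 10, 50):
    rate = floored_value_per_sqm(booth_value, booth_area, min_area_sqm=floor)
    print(f"floor={floor:>2} sqm -> ${rate:,.0f} per sqm")
```

Lots at or above the floor are unaffected, so the choice of threshold only ever changes the smallest sites, which are exactly the ones causing the problem.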
Possible projects once accomplished
This will allow for the creation of a visualisation of different LGAs that shows where the most valuable land is without being skewed by small sites.
Maybe get a distribution of land values in an area by doing the following for each site:
    def marginal_value_of_land(valuation, nth_meter):
        # The paper says this
        #   log(sale price) = c + b log(land area) + aX + e
        #
        # it would be neat to do something like
        #   (b log(nth_meter)) - (b log(nth_meter - 1))
        #
        # I don't even know if it makes sense to do that...
        # Possibly useless. maybe this is more reasonable
        #   b log(1)
        pass


    def population_of_all_land_values(valuations):
        for v in valuations:
            # Count meters 1..sqm_area rather than from 0, since log(0) is undefined.
            for nth_meter in range(1, v.sqm_area + 1):
                yield marginal_value_of_land(v, nth_meter)
With that population of land values you can see the distribution of land values. I'm honestly unsure what the most sensible way to do this is...
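One way the stub could be filled in, as a sketch only. It assumes the log-log relation from the paper can be read as value = k * area^b for a single site, with k backed out from that site's own valuation. The value b = 0.5 used in the usage lines is a placeholder, not a figure from the paper, and the `(total_value, area_sqm)` tuple layout is hypothetical. Note this takes the difference in levels rather than the difference in logs suggested in the comment above, so that the per-meter values add back up to the site's total.

```python
def marginal_value_of_land(total_value, area_sqm, nth_meter, b):
    """Value attributed to the nth square meter, assuming value = k * area**b."""
    # Back out k so that the full site is worth exactly total_value.
    k = total_value / area_sqm ** b
    # Value of the first n meters minus the value of the first n - 1 meters.
    return k * (nth_meter ** b - (nth_meter - 1) ** b)


def population_of_all_land_values(valuations, b):
    """Yield one marginal value per square meter across all sites."""
    for total_value, area_sqm in valuations:
        for nth_meter in range(1, area_sqm + 1):
            yield marginal_value_of_land(total_value, area_sqm, nth_meter, b)


# Hypothetical site: $1,000 total over 4 sqm, with a placeholder b of 0.5.
site_values = list(population_of_all_land_values([(1_000, 4)], b=0.5))
print(site_values)
print(sum(site_values))
```

With b below 1 each extra meter is worth less than the one before it, which matches the intuition earlier in these notes, and the marginal values for a site telescope back to its total valuation. Whether a b estimated on sale prices is appropriate for Valuer General land values is an open question.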
Solutions
It's possible this methodology for comparing land by these aggregations is flawed and I should look at other methodologies.
It's possible there's some kind of coefficient you can figure out from hedonic pricing models.
This problem only exists because I'm aggregating multiple properties by meshblock; if I had the shapefiles for the actual properties this wouldn't be a problem.
It's entirely possible I'm looking at this all wrong. I think it's best to establish a better understanding of the nature of things before proposing a fix. Let's see what the research says about it.
Consider reading the RBA paper on the Zoning Effect.
Notes Reading Paper "The effects of Zoning on Housing prices"
log(sale price) = c + b log(land area) + aX + e
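A short derivation of what that equation implies for the marginal meter, holding the controls X and the error term e fixed. Here P is the sale price and A is the land area. The reference area A_0 is a symbol introduced for this sketch and is not in the paper. Applying a b estimated on sale prices to Valuer General land values is an assumption these notes have not checked.

```latex
\begin{align*}
\log P &= c + b \log A + aX + e \\
P &= \exp(c + aX + e)\, A^{b} \\
\frac{\partial P}{\partial A} &= b \exp(c + aX + e)\, A^{b-1} = b\,\frac{P}{A} \\
\frac{P}{A} &\propto A^{\,b-1} \\
\left.\frac{P}{A}\right|_{A_0} &= \frac{P}{A}\left(\frac{A}{A_0}\right)^{1-b}
\end{align*}
```

So under this model the marginal square meter is worth b times the site's average $ per sqm, and if 0 < b < 1 the average $ per sqm falls as the site gets larger, which is the pattern seen in the CBD ranking. The last line is one possible size adjustment: it rescales a site's $ per sqm to what the model would predict at a common area A_0, which might be the kind of coefficient from hedonic pricing models wondered about above.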