myoshimu closed this issue 5 years ago
I think this happens because DEP_TIME is not in the source data prepared by the following SQL in "experiment.py".
# logistic regression
trainquery = """
SELECT
DEP_DELAY, TAXI_OUT, ARR_DELAY, DISTANCE
FROM flights f
JOIN traindays t
ON f.FL_DATE == t.FL_DATE
WHERE
t.is_train_day == 'True' AND
t.holdout == False AND
f.CANCELLED == '0.00' AND
f.DIVERTED == '0.00'
"""
The error occurs in the following function because it tries to access fields['DEP_TIME'] and fields['DEP_AIRPORT_TZOFFSET'], neither of which is selected by the query above.
def to_example(fields):
    features = [
        fields['DEP_DELAY'],
        fields['DISTANCE'],
        fields['TAXI_OUT'],
    ]
    features.extend(get_local_hour(fields['DEP_TIME'],
                                   fields['DEP_AIRPORT_TZOFFSET']))
    #features.extend(fields['origin_onehot'])
    return LabeledPoint(
        float(fields['ARR_DELAY'] < 15),  # ontime
        features)
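The mismatch can be checked without a Spark cluster. Below is a minimal sketch that compares the columns the original query selects with the columns to_example reads. The helper name `missing_fields` and the two lists are hypothetical, written out by hand from the snippets above; they are not part of experiment.py.

```python
# Columns selected by the original trainquery (copied from the SQL above).
SELECTED = ['DEP_DELAY', 'TAXI_OUT', 'ARR_DELAY', 'DISTANCE']

# Columns that to_example reads from `fields` (copied from the function above).
REQUIRED = ['DEP_DELAY', 'DISTANCE', 'TAXI_OUT', 'ARR_DELAY',
            'DEP_TIME', 'DEP_AIRPORT_TZOFFSET']


def missing_fields(selected, required):
    """Return the required column names that the query does not select.

    Hypothetical helper for illustration; the order of `required` is kept.
    """
    selected_set = set(selected)
    return [name for name in required if name not in selected_set]


print(missing_fields(SELECTED, REQUIRED))
# prints ['DEP_TIME', 'DEP_AIRPORT_TZOFFSET']
```

An empty list would mean every field that to_example reads is present in the query result.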
It works after I updated the SQL in either of the following ways.
Pattern 1: Use '*'
# logistic regression
trainquery = """
SELECT
*
FROM flights f
JOIN traindays t
ON f.FL_DATE == t.FL_DATE
WHERE
t.is_train_day == 'True' AND
t.holdout == False AND
f.CANCELLED == '0.00' AND
f.DIVERTED == '0.00'
"""
Pattern 2: Specify column names used in the script
# logistic regression
trainquery = """
SELECT
DEP_DELAY, TAXI_OUT, ARR_DELAY, DISTANCE, DEP_TIME, DEP_AIRPORT_TZOFFSET
FROM flights f
JOIN traindays t
ON f.FL_DATE == t.FL_DATE
WHERE
t.is_train_day == 'True' AND
t.holdout == False AND
f.CANCELLED == '0.00' AND
f.DIVERTED == '0.00'
"""
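One way to keep Pattern 2 from drifting out of sync again is to build the SELECT list from a single Python list of column names. This is only a sketch: `COLUMNS` and `build_trainquery` are hypothetical names that do not appear in experiment.py, and the rest of the query text is copied unchanged from Pattern 2.

```python
# Hypothetical single source of truth for the columns to_example needs.
COLUMNS = ['DEP_DELAY', 'TAXI_OUT', 'ARR_DELAY', 'DISTANCE',
           'DEP_TIME', 'DEP_AIRPORT_TZOFFSET']


def build_trainquery(columns):
    """Return the training query with `columns` as the SELECT list.

    Hypothetical helper; the FROM/JOIN/WHERE clauses match Pattern 2.
    """
    select_list = ', '.join(columns)
    return """
SELECT
{0}
FROM flights f
JOIN traindays t
ON f.FL_DATE == t.FL_DATE
WHERE
t.is_train_day == 'True' AND
t.holdout == False AND
f.CANCELLED == '0.00' AND
f.DIVERTED == '0.00'
""".format(select_list)


trainquery = build_trainquery(COLUMNS)
print(trainquery)
```

With this layout, adding a feature to to_example means adding its column name to `COLUMNS` in one place instead of editing the SQL string by hand.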
In Chapter 07, I could not complete the last step (experiment.py). The following is the output of the submission. How can I avoid this error?