EST-Team-Adam / TheReadingMachine

A Mean, Lean, Reading Machine

Check execution dates and time settings. #49

Closed by mkao006 6 years ago

mkao006 commented 6 years ago

There appears to be a delay in, or a miscalculation of, the dates in the dashboard. On the 20th of December (10 am NZDT), the dashboard only displayed data up to the 15th of December, while the scraper output shows data scraped up to the 19th of December.

mkao006 commented 6 years ago

One reason for this is the delay in the price data. On the day of this inspection, the 1st of May (Rome time), IGC only has data up to the 27th of April. The gap is partly due to the weekend, yet the data for Monday the 30th of April is also not available.

To account for this, we would have to:

  1. Change the merge in the data harmonisation step.
    harmonised_data = (pd.merge(model_price, aggregated_article,
                                on='date', how='left')
                       .fillna(0))

This is so that we do not exclude the latest sentiments and topic scores.
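As a rough illustration of why the join direction matters here, the sketch below uses made-up toy frames (the column names `price` and `sentiment` and all values are assumptions, not the project's real data). The thread does not say whether the snippet above is the existing code or the proposed change, so this only shows the general behaviour: a left join anchored on the price table drops article dates that have no price yet, while an outer join keeps them.

```python
import pandas as pd

# Hypothetical toy data: prices stop on 27 April, articles run to 30 April.
model_price = pd.DataFrame({
    "date": pd.to_datetime(["2018-04-26", "2018-04-27"]),
    "price": [100.0, 101.0],
})
aggregated_article = pd.DataFrame({
    "date": pd.to_datetime(["2018-04-26", "2018-04-27", "2018-04-30"]),
    "sentiment": [0.2, -0.1, 0.3],
})

# Left join on the price table: only dates that have a price survive.
left = pd.merge(model_price, aggregated_article, on="date", how="left")

# Outer join: the latest article date is kept, with a missing price.
outer = (pd.merge(model_price, aggregated_article, on="date", how="outer")
         .sort_values("date")
         .reset_index(drop=True))

print(left["date"].max().date())     # latest date after the left join
print(outer["date"].max().date())    # latest date after the outer join
print(int(outer["price"].isna().sum()))  # rows with no price yet
```

One thing to watch if an outer (or article-anchored) join is used: a blanket `.fillna(0)` would also turn the missing price into 0, which is probably not what a price series should show, so the fill may need to be restricted to the article columns.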

  2. Change the compute market force step.

We would also need to agree on, and then set, the timezone settings in airflow.cfg as per the instructions.
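To see why an agreed timezone matters, the sketch below converts the timestamp from the original report (the 20th of December, 10 am NZDT) into UTC and Rome time. The year is not stated in the thread, so 2017 is an assumption, and fixed offsets (NZDT = UTC+13, CET = UTC+1) are used instead of a timezone database to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, assumed valid for December: NZDT is UTC+13, Rome (CET) is UTC+1.
NZDT = timezone(timedelta(hours=13), "NZDT")
CET = timezone(timedelta(hours=1), "CET")

# Assumption: the year is illustrative only; the thread does not give it.
report_time = datetime(2017, 12, 20, 10, 0, tzinfo=NZDT)

as_utc = report_time.astimezone(timezone.utc)
as_rome = report_time.astimezone(CET)

print(report_time.isoformat())  # local time of the report
print(as_utc.isoformat())       # same instant in UTC
print(as_rome.isoformat())      # same instant in Rome
```

The same instant falls on the 20th in New Zealand but still on the 19th in UTC and in Rome, so a scheduler running on UTC would treat the 19th as the current day when the dashboard was inspected.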

mkao006 commented 6 years ago

The proposed solution was implemented and the sentiments are now calculated for the latest dates. However, without the price, the sentiment plot cannot be drawn for those dates, as the polygon requires the price as the basis for the addition and subtraction.
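A minimal sketch of this limitation, assuming the polygon bounds are computed as price plus and minus the sentiment (the exact formula used by the dashboard is not given in the thread, and the values below are made up):

```python
import pandas as pd

# Hypothetical harmonised data: the last date has a sentiment but no price.
frame = pd.DataFrame({
    "price": [100.0, 101.0, float("nan")],
    "sentiment": [0.2, -0.1, 0.3],
})

# Assumed band construction: the price is the baseline of the polygon.
upper = frame["price"] + frame["sentiment"]
lower = frame["price"] - frame["sentiment"]

# Arithmetic with a missing price yields a missing bound,
# so there is nothing to draw for the last date.
print(upper.isna().tolist())
print(lower.isna().tolist())
```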

Given that the cause of the time delay has been identified as a data issue with the source, this issue is not considered a bug.

A separate issue should be raised if the user wishes to see the sentiment for the latest dates.