algotradingsoc / data_infrastructure

Repository for the data infrastructure research team.
Apache License 2.0

minutes 6/11 #11

Closed kevinxuht closed 3 years ago

kevinxuht commented 3 years ago

Trading dates: strip the dates out of the CSV file names and store them in the metadata.
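A minimal sketch of the date-stripping step. The file-naming pattern (`prices_YYYY-MM-DD.csv`) and the function name are assumptions; the regex should be adjusted to the real CSV names.

```python
import re
from pathlib import Path

# Hypothetical file-name pattern; adjust to the actual CSV naming convention.
DATE_RE = re.compile(r"(\d{4}-\d{2}-\d{2})")

def dates_from_csv_names(paths):
    """Extract trading dates from CSV names such as 'prices_2020-06-11.csv'."""
    dates = []
    for path in paths:
        match = DATE_RE.search(Path(path).stem)
        if match:
            dates.append(match.group(1))
    return sorted(dates)
```

The sorted date list can then be written into the metadata collection alongside the files it came from.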

Delisting and listing: gaps might be due to missing data (a company's data missing for a day, which may need to be filled in somewhere), or not due to missing data at all but to a stock being really illiquid (not a concern for now).

MongoDB: set up locally, does not do much yet. Considering the option to switch to Arctic for time-series data. Database loaded.

The best way to test for now is to set up MongoDB locally on your computer. The current focus is cleaning the data and creating the schemas.
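One way to pin down a schema before writing to MongoDB is a small validation helper. The field set below is a hypothetical daily-bar document; the real field list should come from whatever schema the team agrees on.

```python
# Hypothetical schema for one daily-bar document; replace with the
# team's agreed field set before inserting into MongoDB.
BAR_SCHEMA = {
    "finnhub_id": str,
    "date": str,
    "open": float,
    "high": float,
    "low": float,
    "close": float,
    "volume": int,
}

def validate_bar(doc):
    """Return True when a document has every required field with the right type."""
    return all(
        field in doc and isinstance(doc[field], expected)
        for field, expected in BAR_SCHEMA.items()
    )
```

Documents that fail validation can be routed to a cleaning step instead of being inserted.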

Anson will create a tutorial for setting up MongoDB locally so we can test our pipeline.

Database design: use the finnhub ID as the primary key; merge records when duplicates are encountered.
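A sketch of the duplicate-merge rule, assuming records are plain dicts keyed by a `finnhub_id` field (field name is an assumption). Here the first record wins and later duplicates only fill in fields that are still missing.

```python
def merge_by_finnhub_id(records):
    """Merge duplicate records keyed by finnhub ID.

    The first record seen for an ID wins; later duplicates only
    contribute fields the merged record does not have yet.
    """
    merged = {}
    for record in records:
        key = record["finnhub_id"]
        if key not in merged:
            merged[key] = dict(record)
        else:
            for field, value in record.items():
                merged[key].setdefault(field, value)
    return list(merged.values())
```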

DataLoader by TW: the portal between the database and research.
- Needs to include all the features; the result should be a DataFrame.
- Extracts data from CSV / MongoDB.
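A rough sketch of that portal, assuming per-symbol CSV files as the fallback source; the class and parameter names are placeholders, not the real `data_loader` API. Either source ends up as a pandas DataFrame for research code.

```python
import os

import pandas as pd

class DataLoader:
    """Portal between storage and research code (a sketch; names are assumptions).

    Reads from a MongoDB collection when one is supplied, otherwise
    from per-symbol CSV files, and always returns a pandas DataFrame.
    """

    def __init__(self, csv_dir=None, mongo_collection=None):
        self.csv_dir = csv_dir
        self.mongo_collection = mongo_collection

    def load(self, symbol):
        if self.mongo_collection is not None:
            # Drop Mongo's internal _id so the frame only holds market data.
            docs = list(self.mongo_collection.find({"symbol": symbol}, {"_id": 0}))
            return pd.DataFrame(docs)
        return pd.read_csv(os.path.join(self.csv_dir, f"{symbol}.csv"))
```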

csv_reader: loads any ad-hoc data found.

How to impute missing data in the database: if data is missing, fill it with the last available value, then add a 'tradeable' column set to false where data was unavailable. This allows moving-average computations to run over a complete series.
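The forward-fill-plus-flag step above can be sketched in pandas (the function name and the choice of which columns count as prices are assumptions):

```python
import pandas as pd

def impute_last_available(df, price_cols):
    """Forward-fill missing prices and flag the imputed rows.

    Rows that had any missing price get tradeable=False, so research
    code can exclude them while moving averages still see a full series.
    """
    out = df.copy()
    out["tradeable"] = out[price_cols].notna().all(axis=1)
    out[price_cols] = out[price_cols].ffill()
    return out
```

Note the flag is computed before the fill, so it records which values were originally missing.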

Next steps:
- More methods & features: dividend and split adjustments.
- Organise the data by finnhub ID: for research purposes store by ID, and keep a map between symbol and ID.
- Decide on and well-document the type of data for each output.
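The symbol/ID map could be as simple as a pair of dicts built from the listing records (field names `symbol` and `finnhub_id` are assumptions):

```python
def build_symbol_maps(records):
    """Build symbol <-> finnhub ID lookup tables from listing records."""
    symbol_to_id = {rec["symbol"]: rec["finnhub_id"] for rec in records}
    id_to_symbol = {fid: sym for sym, fid in symbol_to_id.items()}
    return symbol_to_id, id_to_symbol
```

If symbols get reused after delistings, this would need to become date-aware rather than a flat dict.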

Anson: work on a template with id_loader in Data_loader.
Sean: work on adjusted close price based on dividends & splits.
Rest of team: continue working on the previous week's methods and incorporate them into data_loader (when it's ready).
Missing data: solve later when the timeseries store is available.
Database access: search by ticker; generate data with adjusted prices, volatility, log-returns, skewness, etc.
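For the adjusted-close task, a sketch of standard back-adjustment for dividends and splits; the exact convention (which day carries the ex-date event, padding with 0s and 1s) is an assumption to be agreed on, not Sean's actual implementation.

```python
def adjusted_close(closes, dividends, splits):
    """Back-adjust raw closes for dividends and splits.

    dividends[i] is the cash dividend with ex-date on day i (0.0 if none);
    splits[i] is the split ratio effective on day i (2.0 for a 2-for-1,
    1.0 if none). Events on day i scale every earlier price.
    """
    n = len(closes)
    adjusted = [0.0] * n
    factor = 1.0
    for i in range(n - 1, -1, -1):
        adjusted[i] = closes[i] * factor
        if i > 0:
            # Fold day i's dividend (relative to the prior close) and
            # split into the factor applied to all earlier days.
            factor *= (1.0 - dividends[i] / closes[i - 1]) / splits[i]
    return adjusted
```

With this convention the latest close is always unadjusted, which keeps recent prices comparable to live quotes.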