elcolumbio / mlrepricer

Explore pricing data. Share insights and models. Build environment for repricer.

Store dumps in DB #1

Closed Bobspadger closed 6 years ago

Bobspadger commented 6 years ago

Would it not be better to store the resulting data in some form of database, so it can be queried easily?

This would give a persistent datastore that can be used and manipulated by other code if required.

E.g. Pricemonitor: a new price change comes in and is logged in the DB. PriceReport: find x from y where some condition holds, then plot the result.

You would then also be able to distribute the work not only across multiple processes / modules, but also across machines.
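A minimal sketch of that monitor/report split, using only the standard-library `sqlite3` module. The table `price_changes`, its columns, and the function names `open_store`, `log_price_change` and `price_history` are hypothetical and do not come from mlrepricer.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the price store. Table and column names are hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS price_changes (
               asin TEXT NOT NULL,
               seller_id TEXT NOT NULL,
               price REAL NOT NULL,
               seen_at TEXT NOT NULL
           )"""
    )
    return conn


def log_price_change(conn, asin, seller_id, price, seen_at):
    """Monitor side: persist one incoming price change."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO price_changes VALUES (?, ?, ?, ?)",
            (asin, seller_id, price, seen_at),
        )


def price_history(conn, asin):
    """Report side: return (seen_at, price) rows for one product, oldest first."""
    cur = conn.execute(
        "SELECT seen_at, price FROM price_changes WHERE asin = ? ORDER BY seen_at",
        (asin,),
    )
    return cur.fetchall()


conn = open_store()
log_price_change(conn, "B000TEST01", "seller-a", 19.99, "2018-05-01T10:00:00")
log_price_change(conn, "B000TEST01", "seller-b", 18.49, "2018-05-01T11:00:00")
log_price_change(conn, "B000TEST02", "seller-a", 5.00, "2018-05-01T12:00:00")
history = price_history(conn, "B000TEST01")
```

The rows in `history` could then be handed to any plotting code. With a file path instead of `:memory:`, separate processes could share the same store.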

elcolumbio commented 6 years ago

Yes, for production you need a dump like that. I have something I used a lot in the past and will include it.

Queries are, I think, a matter of taste. I came from SQL and rewrote most of mine as pandas queries. We can provide examples for both.
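As a rough illustration of the two styles, here is the same question ("lowest price per product") asked once in SQL and once in pandas. The `prices` table and its columns are made up for this sketch; `pd.read_sql_query` is the only pandas I/O call used.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data; the schema is an assumption for this example.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE prices (asin TEXT, seller_id TEXT, price REAL);
    INSERT INTO prices VALUES
        ('A1', 's1', 10.0),
        ('A1', 's2', 9.5),
        ('A2', 's1', 20.0);
    """
)

# SQL style: let the database do the aggregation.
sql_result = pd.read_sql_query(
    "SELECT asin, MIN(price) AS min_price FROM prices GROUP BY asin ORDER BY asin",
    conn,
)

# pandas style: load the table once, then aggregate in memory.
df = pd.read_sql_query("SELECT * FROM prices", conn)
pandas_result = (
    df.groupby("asin", as_index=False)["price"]
    .min()
    .rename(columns={"price": "min_price"})
    .sort_values("asin")
    .reset_index(drop=True)
)
```

Both results hold the same rows; which form is easier to read is, as said, a matter of taste.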

Is Pricemonitor something like a snapshot of prices that only represents the most recent data row?

elcolumbio commented 6 years ago

OK, we are now using a SQLite database by default. You can also switch to your own client-server database in under a minute of setup.

Now we have to improve speed and be generally clever. For example, I would use many databases with duplicated content to provide views for different tasks.
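A small sketch of one such task-specific view. Note that this uses a plain SQL view over a single table rather than a second database with duplicated content, so it is a lighter stand-in for the idea, not the approach described above. The table `price_changes` and the view `latest_prices` are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE price_changes (asin TEXT, seller_id TEXT, price REAL, seen_at TEXT);
    INSERT INTO price_changes VALUES
        ('A1', 's1', 10.0, '2018-05-01T10:00:00'),
        ('A1', 's1',  9.0, '2018-05-01T12:00:00'),
        ('A1', 's2',  9.5, '2018-05-01T11:00:00');

    -- Snapshot view: only the most recent row per (asin, seller_id).
    CREATE VIEW latest_prices AS
    SELECT p.asin, p.seller_id, p.price, p.seen_at
    FROM price_changes AS p
    WHERE p.seen_at = (
        SELECT MAX(q.seen_at)
        FROM price_changes AS q
        WHERE q.asin = p.asin AND q.seller_id = p.seller_id
    );
    """
)

snapshot = conn.execute(
    "SELECT asin, seller_id, price FROM latest_prices ORDER BY seller_id"
).fetchall()
```

The full history stays in `price_changes`, while a task that only needs current prices reads from `latest_prices`.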

elcolumbio commented 6 years ago

I am closing this now. There is also a Jupyter notebook explaining the usage.

For the future, and especially for a client-server database, we need to make fewer transactions. In that case we have to store each receipt_handle so the messages can be deleted one by one later. That way we can be sure the data was saved before we delete it.
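A minimal sketch of that save-then-delete order, assuming a queue that hands out one receipt handle per received message (as Amazon SQS does). `FakeQueue`, `drain_batch` and the `raw_messages` table are hypothetical stand-ins, not mlrepricer code.

```python
import sqlite3


class FakeQueue:
    """Hypothetical in-memory stand-in for a queue with receipt handles."""

    def __init__(self, bodies):
        self._messages = {f"rh-{i}": body for i, body in enumerate(bodies)}

    def receive(self, max_messages=10):
        items = list(self._messages.items())[:max_messages]
        return [{"receipt_handle": rh, "body": body} for rh, body in items]

    def delete(self, receipt_handle):
        self._messages.pop(receipt_handle, None)

    def __len__(self):
        return len(self._messages)


def drain_batch(queue, conn, max_messages=10):
    """Save one batch in a single transaction, then delete the saved messages."""
    batch = queue.receive(max_messages)
    if not batch:
        return 0
    with conn:  # one transaction for the whole batch; rolls back on error
        conn.executemany(
            "INSERT INTO raw_messages (receipt_handle, body) VALUES (?, ?)",
            [(m["receipt_handle"], m["body"]) for m in batch],
        )
    # Reached only after a successful commit, so nothing unsaved is deleted.
    for m in batch:
        queue.delete(m["receipt_handle"])
    return len(batch)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_messages (receipt_handle TEXT, body TEXT)")
queue = FakeQueue(["<price>1</price>", "<price>2</price>", "<price>3</price>"])
saved = drain_batch(queue, conn, max_messages=2)
```

If the insert raises, the `with conn` block rolls back and the delete loop is never reached, so the messages stay in the queue for a later retry.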