NREL / rplexos


[Feature] show progress for DB generation #19

Closed: danielsjf closed this issue 9 years ago

danielsjf commented 9 years ago

Eibanez,

Thank you for all the effort you have put into developing this package. I use it more and more each day :-)

While the queries run very quickly (sometimes even faster than the Plexos solution viewer), one of the few steps that slows down the workflow is generating the rplexos SQLite DB. Typically, we generate solutions for large regions covering a whole year of short-term data. Combine this with many reporting settings and we end up with Plexos solution files of 100MB+. Converting these into an rplexos DB takes a while (approximately 30 minutes to produce a final SQLite DB of a couple of GBs). Would it be possible to estimate this duration and show a progress bar (similar to the progress bar for the queries)? It could be done by calculating the fraction of reporting settings already converted (assuming that this fraction is proportional to the total conversion time).
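A minimal sketch of the estimate proposed above, written in Python rather than R and not taken from rplexos: progress is the fraction of reporting items finished, and the remaining time is extrapolated linearly from the time spent so far. It assumes every item takes roughly the same time. `convert_with_progress` and the `convert_item` callback are hypothetical names standing in for whatever actually converts one reporting setting.

```python
import time
from typing import Callable, Sequence


def convert_with_progress(items: Sequence[str],
                          convert_item: Callable[[str], None],
                          report: Callable[[str], None] = print) -> int:
    """Convert each reporting item and report progress as a fraction done.

    Assumes every item takes roughly the same time, so the fraction of
    items finished is used as a proxy for the fraction of time elapsed.
    """
    total = len(items)
    start = time.monotonic()
    for done, item in enumerate(items, start=1):
        convert_item(item)  # hypothetical per-item conversion step
        elapsed = time.monotonic() - start
        fraction = done / total
        # Linear extrapolation: remaining time ~ elapsed * (1 - f) / f
        remaining = elapsed * (1 - fraction) / fraction
        report(f"{done}/{total} ({fraction:.0%}), ~{remaining:.0f}s left")
    return total
```

The estimate is only as good as the equal-cost assumption. A few large reporting settings late in the list would make the early estimates too optimistic.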

Alternatively, I noticed that it seems to write the SQLite DB in waves, and much more slowly than the disk's write speed allows. Do you have an idea what the bottleneck is? Is it reading the Plexos solution file (decompressing)? Or is it perhaps the switching between querying the solution file and writing to the SQLite DB? In that case it might be beneficial to first run several queries on the solution file and then write bigger chunks at once.
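To illustrate the "write bigger chunks at once" idea, here is a minimal Python `sqlite3` sketch. It is not how rplexos writes its database. Rows are buffered into batches and each batch is inserted with `executemany` inside a single transaction, instead of committing row by row. The table name `data`, its three columns, and the default chunk size are hypothetical. The thread does not establish whether batching would help here, since SQLite already buffers writes in memory.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, int, float]  # hypothetical (key, period, value) row


def chunked(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def write_in_chunks(conn: sqlite3.Connection, rows: Iterable[Row],
                    chunk_size: int = 10_000) -> int:
    """Insert rows in large batches, committing once per batch."""
    conn.execute("CREATE TABLE IF NOT EXISTS data "
                 "(key INTEGER, period INTEGER, value REAL)")
    written = 0
    for chunk in chunked(rows, chunk_size):
        with conn:  # commits once per chunk instead of once per row
            conn.executemany("INSERT INTO data VALUES (?, ?, ?)", chunk)
        written += len(chunk)
    return written
```

For example, `write_in_chunks(sqlite3.connect(":memory:"), ((i, i % 24, float(i)) for i in range(25_000)))` inserts the rows in three batches of at most 10,000.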

Just some small comments on an otherwise great package :-)

Regards

eibanez commented 9 years ago

This is a tricky one. The progress bars in the queries are provided automatically by the dplyr package. I can check whether they can be enabled in the data conversion step, but they would only be updated once a database is done (and predictions would assume that all databases take a similar time).

Otherwise, it would be hard to pinpoint the total length of the conversion. Reading the binary files is typically the most time-consuming part, so something could be done there.

This has pretty low priority (it would be nice, but it's not essential). I will look into leveraging what dplyr does, but probably won't do anything beyond that.

With respect to disk access, SQLite does as much as possible in memory and writes to disk when it needs to. I looked into this in the past and optimized it as much as I could.

eibanez commented 9 years ago

I thought a little more about this, and a progress bar would interfere with the output that is already printed to the screen (especially when debug mode is turned on).

I'm sorry, but I'm going to leave this one as is. If you can think of a better way and can implement it, I'll be happy to accept a pull request.

eibanez commented 9 years ago

So, I reformatted some of the code to allow for parallel processing, and there will now be a progress bar that updates after each database is finished.
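The behaviour described above can be sketched in Python with `concurrent.futures`. This is an illustration, not the rplexos R code. Several solution files are converted in parallel, and progress is reported only when a whole file finishes, so the bar advances in steps of 1/n. `convert_all`, `convert_one`, and the worker count are hypothetical. Threads keep the example self-contained, whereas CPU-bound conversion in Python would more likely use `ProcessPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def convert_all(files: List[str],
                convert_one: Callable[[str], str],
                workers: int = 4,
                report: Callable[[str], None] = print) -> Dict[str, str]:
    """Convert several solution files in parallel.

    Progress is reported only when a whole file finishes, so the bar moves
    in steps of 1/len(files) regardless of how large each file is.
    """
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(convert_one, f): f for f in files}
        for done, fut in enumerate(as_completed(futures), start=1):
            name = futures[fut]
            results[name] = fut.result()  # re-raises any conversion error
            # Reported from the main thread, one line per finished file
            report(f"[{done}/{len(files)}] finished {name}")
    return results
```

Because updates happen per file, a run with one very large solution file and several small ones will show fast early progress followed by a long wait on the last step.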