Open 1kastner opened 1 year ago
Instead of bulk inserts, maybe we can start the application with an in-memory database that we back up to the hard drive once all data is generated. More information on how to back up in-memory databases can be found at https://stackoverflow.com/questions/67654486/how-to-backup-peewee-database-sqlitequeuedatabase-programatically
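A minimal sketch of the in-memory-then-backup idea, using only the standard library's `sqlite3` module rather than peewee. The `container` table and the function name `generate_in_memory_then_backup` are hypothetical stand-ins, not ConFlowGen code. With peewee, the same `backup()` call should be reachable through the underlying DB-API connection, but that is an assumption that still needs checking against the project's database setup.

```python
import sqlite3


def generate_in_memory_then_backup(target_path, rows):
    """Fill an in-memory SQLite database, then copy it to disk in one step."""
    memory_db = sqlite3.connect(":memory:")
    try:
        # Hypothetical table standing in for the generated container data.
        memory_db.execute(
            "CREATE TABLE container (id INTEGER PRIMARY KEY, label TEXT)"
        )
        memory_db.executemany(
            "INSERT INTO container (id, label) VALUES (?, ?)", rows
        )
        memory_db.commit()

        disk_db = sqlite3.connect(target_path)
        try:
            # Connection.backup (Python 3.7+) copies the whole database.
            memory_db.backup(disk_db)
            count = disk_db.execute("SELECT COUNT(*) FROM container").fetchone()[0]
        finally:
            disk_db.close()
    finally:
        memory_db.close()
    return count
```

The returned row count is read from the on-disk copy, so it doubles as a quick check that the backup actually contains the generated data.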
See https://github.com/1kastner/conflowgen/compare/main...lucedes27:conflowgen:Reduce-I/O-Operations-in-Container-Flow-Generation for one attempt at achieving this. It seems that the culprit is not the I/O operations but rather peewee itself. Better profiling is needed to be sure about this, though.
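One way to get that profiling data is the standard library's `cProfile`. The helper below is only a sketch; `profile_call` is a hypothetical name, and the function to pass in would be whichever entry point triggers the container flow generation. If most of the cumulative time ends up in frames from peewee rather than in SQLite's execute calls, that would support the suspicion above.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Render the most expensive entries, sorted by cumulative time.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

Printing the returned report for a small and a large dataset should show whether the time grows in the ORM layer or in the database calls.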
Is your feature request related to a problem? Please describe.
Currently, generating the container flows takes a while, especially for larger datasets. Most likely, this is due to the I/O operations, but that needs to be verified first.
Describe the solution you'd like
The ORM wrapper peewee offers some options to tune it. Maybe some of the methods presented at https://docs.peewee-orm.com/en/latest/peewee/querying.html#bulk-inserts can be introduced with minor changes to the code?
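To illustrate the technique behind those options, here is a sketch using the standard library's `sqlite3` module: all rows are written in batches inside a single transaction instead of committing row by row. The `container` table and `insert_in_batches` are hypothetical. In peewee, the linked documentation describes the ORM-level counterpart, namely `insert_many()` on batches produced by `chunked()` inside `db.atomic()`.

```python
import sqlite3


def insert_in_batches(connection, rows, batch_size=100):
    """Insert rows in batches inside one transaction instead of one per row."""
    # "with connection" commits on success and rolls back on an exception.
    with connection:
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            connection.executemany(
                "INSERT INTO container (id, label) VALUES (?, ?)", batch
            )
    return len(rows)
```

Whether this helps ConFlowGen depends on where the time is actually spent, so it is best tried together with profiling.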
Describe alternatives you've considered
Keep things as they are and learn to be patient.