Open Virtakuono opened 8 years ago
That's reinventing data management. If you are talking about problems with SQLite, there are ways and configurations to make it work for multiple processes (if you are on the same machine). If you cannot find them, I will google them and post them here when I have a chance.
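For reference, a minimal sketch of the kind of configuration meant here, using only Python's standard `sqlite3` module. The helper name `open_shared_connection` and the 30-second timeout are illustrative assumptions, not part of mimclib. Write-ahead logging only works when all processes run on the same machine, as noted above.

```python
import sqlite3


def open_shared_connection(path, timeout_s=30.0):
    """Open a SQLite connection set up to tolerate concurrent writers.

    Hypothetical helper for illustration; not part of mimclib.
    """
    # `timeout` is how long sqlite3 waits for a lock to clear before raising
    # "database is locked" (the default is 5 seconds).
    conn = sqlite3.connect(path, timeout=timeout_s)
    # Write-ahead logging lets readers proceed while a single writer is active.
    # The pragma returns the journal mode that is actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return conn, mode
```

Writers are still serialized in this mode, so this reduces lock errors rather than removing contention altogether.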
Okay, I'll have a look at it. And you hit the nail on the head: what I proposed is quick and dirty, and probably dumb.
This is the first hit; let's keep the issue in the back of our minds. https://www.sqlite.org/asyncvfs.html
edit: see also http://stackoverflow.com/questions/25940079/is-there-a-way-to-run-sqlite-queries-from-python-asynchronously-or-in-parrallel
This is to resolve the performance hit that one would get due to database locking. I assumed the "problems" you mentioned were more substantial.
My understanding is that @hammouc had problems that somehow broke the database as a whole and prevented him from plotting anything at all. However, this is second-hand information; perhaps he can provide the specifics of the issue, as well as steps to reproduce it.
The problem was related to how many cores I use during the parallel run. For example, with 8 cores I could sometimes complete all the runs without any error, but when I repeated the same thing I got this error message for some runs:
Traceback (most recent call last):
  File "mimc_run.py", line 184, in <module>
    main()
  File "mimc_run.py", line 70, in main
    mimclib.test.RunStandardTest(fnSampleLvl=mystate.mySampleLvl, fnAddExtraArgs=addExtraArguments, fnInit=mystate.myInit)
  File "/home/hammouc/Documents/mimclib-last/mimclib/test.py", line 104, in RunStandardTest
    mimcRun.doRun()
  File "/home/hammouc/Documents/mimclib-last/mimclib/mimc.py", line 881, in doRun
    self.fn.ItrDone()
  File "/home/hammouc/Documents/mimclib-last/mimclib/test.py", line 99, in <lambda>
    iteration_idx=len(mimcRun.iters)-1)
  File "/home/hammouc/Documents/mimclib-last/mimclib/db.py", line 309, in writeRunData
    iteration_idx, run_id])
  File "/home/hammouc/Documents/mimclib-last/mimclib/db.py", line 174, in execute
    self.cur.execute(q, tuple(new_params))
sqlite3.OperationalError: database is locked
For now I have worked around this by using a smaller number of cores, so I no longer get this database error, but I do not yet know why it happens.
But after fixing this issue, I am still getting the plotting issue, even for a database that has no errors in its runs (see the database attached there); see issue https://github.com/StochasticNumerics/mimclib/issues/76
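As a possible stopgap for the "database is locked" error in the traceback above, a write can be retried with a randomized backoff instead of failing on the first lock conflict. This is only a sketch: `execute_with_retry` and its parameters are hypothetical names, and it is not how `mimclib/db.py` currently behaves.

```python
import random
import sqlite3
import time


def execute_with_retry(conn, query, params=(), max_attempts=5, base_delay_s=0.1):
    """Run a write, retrying with jittered backoff while the database is locked.

    Hypothetical helper for illustration; not part of mimclib.
    """
    for attempt in range(max_attempts):
        try:
            with conn:  # commits on success, rolls back on an exception
                return conn.execute(query, params)
        except sqlite3.OperationalError as exc:
            # Retry only on lock contention; re-raise any other error,
            # and give up after the last attempt.
            if "locked" not in str(exc) or attempt == max_attempts - 1:
                raise
            # Exponential backoff with jitter so processes do not retry in lockstep.
            time.sleep(base_delay_s * (2 ** attempt) * (1 + random.random()))
```

This hides contention rather than removing it, so with many cores the runs may still spend a lot of time waiting on the lock.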
I acknowledge that there is a case to be made for using MySQL if one wants full support of parallelism. I'll propose a quick and dirty alternative instead, however. @hammouc had some problems running too many processes in parallel and we suspect that the problem is too many write operations into the same file simultaneously.
One way to avoid such behaviour is to pass different output files to the individual threads and then compile everything into one database, much like we already pass different random seeds to each of the threads. At the end of the parallel runs, one would then need to run a script to merge the separate database files into one.
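A rough sketch of what such a merge script could look like, using SQLite's `ATTACH DATABASE`. The table name `results` and the function `merge_databases` are placeholders, since the actual mimclib schema is not shown here. The sketch assumes every file has the same table layout and that rows carry no primary keys that would collide across files.

```python
import sqlite3


def merge_databases(target_path, source_paths, table="results"):
    """Append all rows of `table` from each source file into the target file.

    Hypothetical helper; `table` must be a trusted name because it is
    interpolated into the SQL text.
    """
    conn = sqlite3.connect(target_path)
    try:
        for src in source_paths:
            # Make the per-process file visible under the schema name `src`.
            conn.execute("ATTACH DATABASE ? AS src", (src,))
            with conn:  # one transaction per source file
                conn.execute(f"INSERT INTO main.{table} SELECT * FROM src.{table}")
            conn.execute("DETACH DATABASE src")
    finally:
        conn.close()
```

Usage would be along the lines of `merge_databases("run_0.db", ["run_1.db", "run_2.db"])`, with one output file per process (the file names are made up). If the real tables use auto-incremented ids, the `SELECT *` would need to be replaced by an explicit column list that leaves the id out.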