StochasticNumerics / mimclib

A software library for UQ methods
GNU General Public License v2.0

Asynchronous inputs using sqlite #86

Open Virtakuono opened 8 years ago

Virtakuono commented 8 years ago

I acknowledge that there is a case to be made for using MySQL if one wants full support of parallelism. I'll propose a quick and dirty alternative instead, however. @hammouc had some problems running too many processes in parallel and we suspect that the problem is too many write operations into the same file simultaneously.

One way to avoid such behaviour is to pass a different output file to each thread and then compile everything into one database, much like we already pass a different random seed to each thread. At the end of the parallel runs, one would then need to run a script to merge the per-thread database files into one.
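The merge step proposed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `samples` table and its columns are hypothetical and do not reflect mimclib's actual schema, and `merge_databases` is a made-up helper name. It relies on SQLite's `ATTACH DATABASE` to read each per-process file from a single connection.

```python
import sqlite3

# Hypothetical schema for illustration only; mimclib's real tables differ.
SCHEMA = "CREATE TABLE IF NOT EXISTS samples (seed INTEGER, value REAL)"


def merge_databases(target_path, source_paths):
    """Append the rows of every per-process file into one target database.

    Returns the number of rows in the target table after merging.
    """
    con = sqlite3.connect(target_path)
    try:
        con.execute(SCHEMA)
        for src in source_paths:
            # ATTACH exposes the source file as schema "src" on this connection.
            con.execute("ATTACH DATABASE ? AS src", (src,))
            con.execute(
                "INSERT INTO main.samples (seed, value) "
                "SELECT seed, value FROM src.samples"
            )
            # End the transaction before detaching the source file.
            con.commit()
            con.execute("DETACH DATABASE src")
        return con.execute("SELECT COUNT(*) FROM main.samples").fetchone()[0]
    finally:
        con.close()
```

Since only one process writes to the target file during the merge, no lock contention arises at this stage. Tables with auto-generated IDs or foreign keys would need remapping, which this sketch does not attempt.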

haji-ali commented 8 years ago

That's reinventing data management. If you are talking about problems with SQLite there are ways and configurations to make it work for multiple processes (if you are on the same machine). If you cannot find them, I will google them and post them here when I have a chance.
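For reference, two commonly used settings of this kind are a longer busy timeout and write-ahead logging (WAL). The sketch below is not taken from mimclib; `open_shared` is a hypothetical helper and the 60-second timeout is an arbitrary choice. It uses only the standard `sqlite3` module, whose `connect` call takes a `timeout` argument (5 seconds by default), and the `journal_mode` pragma. WAL requires all processes to be on the same machine, which matches the condition stated above.

```python
import sqlite3


def open_shared(path, timeout_s=60.0):
    """Open a connection configured for several local writer processes.

    timeout_s: how long a writer waits for a lock before raising
    "database is locked" (an arbitrary value chosen for illustration).
    """
    con = sqlite3.connect(path, timeout=timeout_s)
    # WAL lets readers proceed while one writer is active; the setting is
    # stored in the database file and persists across connections.
    con.execute("PRAGMA journal_mode=WAL")
    return con
```

Writers are still serialized under WAL, so this reduces lock errors rather than making writes truly concurrent.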

Virtakuono commented 8 years ago

Okay, I'll have a look at it. And you hit the nail on the head: what I proposed is quick and dirty, and probably dumb.

Virtakuono commented 8 years ago

This is the first hit; let's keep the issue in the back of our minds. https://www.sqlite.org/asyncvfs.html

edit: see also http://stackoverflow.com/questions/25940079/is-there-a-way-to-run-sqlite-queries-from-python-asynchronously-or-in-parrallel

haji-ali commented 8 years ago

This is to resolve the performance hit that one would get due to database locking. I assumed the "problems" you mentioned were more substantial.

Virtakuono commented 8 years ago

My understanding is that @hammouc had problems that somehow broke the database as a whole and prevented him from plotting anything at all. However, this is second-hand information; perhaps he can provide the specifics of the issue, as well as steps to reproduce it.

hammouc commented 8 years ago

The problem was related to how many cores I use during the parallel run. For example, with 8 cores, all the runs could finish without any error, but when I repeated the same thing I got this error message for some runs.

Traceback (most recent call last):
  File "mimc_run.py", line 184, in <module>
    main()
  File "mimc_run.py", line 70, in main
    mimclib.test.RunStandardTest(fnSampleLvl=mystate.mySampleLvl,  fnAddExtraArgs=addExtraArguments, fnInit=mystate.myInit)
  File "/home/hammouc/Documents/mimclib-last/mimclib/test.py", line 104, in RunStandardTest
    mimcRun.doRun()
  File "/home/hammouc/Documents/mimclib-last/mimclib/mimc.py", line 881, in doRun
    self.fn.ItrDone()
  File "/home/hammouc/Documents/mimclib-last/mimclib/test.py", line 99, in <lambda>
    iteration_idx=len(mimcRun.iters)-1)
  File "/home/hammouc/Documents/mimclib-last/mimclib/db.py", line 309, in writeRunData
    iteration_idx, run_id])
  File "/home/hammouc/Documents/mimclib-last/mimclib/db.py", line 174, in execute
    self.cur.execute(q, tuple(new_params))
sqlite3.OperationalError: database is locked

For now I fixed this issue by using a smaller number of cores, so I no longer get this database error, but I do not yet know why this is happening.

But after fixing this issue, I am still getting the plotting issue (even for a database that has no errors in its runs; see the database attached there). See issue https://github.com/StochasticNumerics/mimclib/issues/76
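On the `database is locked` error in the traceback above: SQLite raises it when a connection cannot obtain the write lock within its busy timeout, which becomes more likely as more processes write to the same file. One possible workaround, sketched here under assumptions, is to retry the failing statement with a growing delay. `execute_with_retry` and its parameters are hypothetical and not part of mimclib's `db.py`.

```python
import random
import sqlite3
import time


def execute_with_retry(con, query, params=(), retries=5, base_delay=0.1):
    """Run a statement, retrying with exponential backoff if the DB is locked.

    Any other OperationalError, or a lock that persists past the last
    attempt, is re-raised unchanged.
    """
    for attempt in range(retries):
        try:
            return con.execute(query, params)
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc) or attempt == retries - 1:
                raise
            # Back off, with jitter so that competing processes desynchronize.
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            time.sleep(delay)
```

A call such as `execute_with_retry(con, "INSERT INTO t VALUES (?)", (7,))` behaves like `con.execute` when no lock contention occurs.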