There is a known rebooting issue with the latest OS X (10.11) that we're looking into. It's caused by a component called mkmimo, which streams data using nonblocking I/O, and the latest OS X kernel seems to have trouble with a high rate of poll syscalls, possibly in combination with the hardware (MBP). We don't have a clean solution yet, but you can either use a Linux machine instead or tune the following environment variables to keep reasonable throughput while not crashing:
export THROTTLE_SLEEP_MSEC=10 # higher is safer but slower
export DEEPDIVE_NUM_PROCESSES=1 # lower is safer but slower
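For example, a minimal sketch of a one-off run with both knobs set ("sentences" is a placeholder target name, not necessarily your step):
THROTTLE_SLEEP_MSEC=10 DEEPDIVE_NUM_PROCESSES=1 deepdive do sentences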
I'll update here once we find a fix for this.
@netj Thanks for your quick response. However, as in #509, I reviewed the code and there is in fact a hard-coded value for THROTTLE_SLEEP_MSEC in the MacOS case:
# OS specific workarounds via tweaking the environment
case $(uname) in
Darwin)
    # XXX mkmimo can reboot Mac unless its use of poll(2) is throttled
    export THROTTLE_SLEEP_MSEC=1
    ;;
esac
Does that mean I need to customize the value of THROTTLE_SLEEP_MSEC in deepdive?
I'll try and report here soon.
Yes, you'll have to remove that part from the installation for the moment. I'll push an update soon that gets rid of it, and hopefully a new version of mkmimo that mitigates this issue by default.
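For instance, here's a minimal sketch of removing it by hand; the script path is hypothetical, so locate the installed file that contains the case statement above:
sed -i.bak '/export THROTTLE_SLEEP_MSEC=1/s/^/# /' path/to/installed/env-script # comment out the hard-coded export
export THROTTLE_SLEEP_MSEC=50 # then a value set in your shell takes effect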
@netj Hi Jaeho, I did as you said, but it still crashed after around 9 minutes. Is there any way to work around this bug? Should I increase THROTTLE_SLEEP_MSEC to 20 or 100?
Attached below is a screenshot of my postgres db. Compared with the image above, the data has increased a little, but it's still stuck at creating the sentences table.
@lanphan You can increase the THROTTLE_SLEEP_MSEC parameter to make it less likely to crash, but the throughput will become awful. Since you're eagerly looking for a solution, let's try some workarounds we currently have. These all involve replacing the mkmimo executable installed under util/ of your DeepDive installation.
First, let's keep a backup:
(set -eu; cd $(deepdive whereis installed util/); cp -pf mkmimo mkmimo.orig)
If you clone the fix-for-mac-reboots branch and run make, you get a mkmimo executable for replacement.
Actually, you can just run the following command to patch your installation, assuming deepdive is on your $PATH:
(set -eu; git clone https://github.com/netj/mkmimo.git mkmimo-wip --branch fix-for-mac-reboots; cd mkmimo-wip; make; install -v mkmimo $(deepdive whereis installed util/mkmimo))
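As an optional sanity check, you can confirm the binary was actually replaced (e.g., by its timestamp):
ls -l "$(deepdive whereis installed util/mkmimo)"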
With this one, you can use a higher value for THROTTLE_SLEEP_USEC (note this is *_USEC, in microseconds, not milliseconds) without sacrificing much throughput in some cases, e.g., export THROTTLE_SLEEP_USEC=100, which is 0.1ms. 10 gives good throughput but crashes quite often. You can try higher values like 1000, 10000, 20000, or even 100000 to be safe at the cost of some throughput.
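For example, a conservative setting traded against throughput (a sketch; "sentences" is a placeholder target):
export THROTTLE_SLEEP_USEC=10000 # 10ms: safer, slower
deepdive do sentences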
If it's still hard to find the right parameter that doesn't crash your Mac, or you just want something that works, try this dumb version written in bash. It's dumb and inefficient, incurring a lot of disk I/O, but it should get you through the data flow without crashing your Mac. You can download it and replace the util/mkmimo file, making sure you turn on the executable bit.
The following command does what I wrote above:
(set -eu; cd $(deepdive whereis installed util/); curl -fRLO https://github.com/netj/mkmimo/raw/bash-impl-poc/mkmimo.sh; chmod -v +x mkmimo.sh; install -v mkmimo.sh mkmimo)
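To give a rough idea of why it's slow, here's a highly simplified sketch of the disk-buffered approach (the real mkmimo.sh handles multiple inputs and outputs; this single-stream version is illustration only):
#!/usr/bin/env bash
# Sketch: buffer stdin into a temp file on disk, then replay it to stdout,
# trading mkmimo's nonblocking poll(2) loop for plain disk I/O.
set -eu
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat >"$tmp"   # drain the upstream process into a file
cat "$tmp"    # then stream the file to the downstream process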
Finally, if you want to restore the backed-up original, here's the one-liner:
(set -eu; cd $(deepdive whereis installed util/); install -v mkmimo.orig mkmimo)
Hope this helps!
@netj Before trying your proposed solution, I should report that I ran deepdive with THROTTLE_SLEEP_MSEC=50; it ran well for around 15 minutes (I say "ran well" because CPU and RAM usage were under control, CPU ~20%, RAM ~4G, and there was only 1 java process), but it still crashed after that. Data throughput was less than in my 2nd try above.
Now I'll try the new mkmimo patch and report here soon. Thanks for your support.
@netj Below are my results after trying your first point (using the fix-for-mac-reboots branch of mkmimo):
With THROTTLE_SLEEP_USEC=1000: crashed after 1h37m; attached is my data in postgres.
With THROTTLE_SLEEP_USEC=500: crashed after 7m, with data throughput a little over the case in my first comment.
With THROTTLE_SLEEP_USEC=20000: crashed after 5m, with data throughput almost the same as in my 2nd try above.
--> Conclusion: it seems the bug is still there. I don't know why my first try with USEC=1000 could last 1 hour 37 minutes, while my second try (USEC=500 < 1000) and third try (USEC=20000 > 1000) both failed so fast.
@netj I think it runs OK with your dumb version written in bash (your 2nd approach). However, after running for 3 hours, I got an error from postgres (issue #523).
Below is my quick comparison between the dumb version and the official mkmimo:
I'm going to evaluate deepdive on Linux (Ubuntu) later this week.
@netj I can run successfully with approach 2 now that issue #523 is fixed. However, it took me 8 hours to finish "deepdive do spouse_feature". Should I wait for your fix to improve the speed, Jaeho?
In the meantime, I'm switching to an Ubuntu desktop PC (32G RAM, core i7) to see whether it runs well and faster.
Deepdive ran very well on Ubuntu; it finished "deepdive do spouse_feature" in around 3h40m (using the official deepdive/mkmimo version, no patch). @netj I wonder whether I can use your first patch in order to use _USEC (10 to 100) to improve performance?
@lanphan Glad to hear that it works fine on Linux. There's no throttling done on Linux (those parameters default to zero), so the versions we tried on Mac won't make much difference. Actually, they may give a marginal improvement, so there's no harm in trying. The same instructions can be used.
If you're using Postgres, increasing DEEPDIVE_NUM_PARALLEL_UNLOADS and DEEPDIVE_NUM_PARALLEL_LOADS from 1 to 3-4 may give you some more speedup.
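For example, a sketch using the values suggested above (spouse_feature is the step name from this thread):
export DEEPDIVE_NUM_PARALLEL_UNLOADS=3
export DEEPDIVE_NUM_PARALLEL_LOADS=3
deepdive do spouse_feature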
@netj Thanks for your tips, I'll try them soon. Can these parameters (DEEPDIVE_NUM_PARALLEL_UNLOADS and DEEPDIVE_NUM_PARALLEL_LOADS) be used with Greenplum too?
@netj Setting DEEPDIVE_NUM_PARALLEL_UNLOADS=3 and DEEPDIVE_NUM_PARALLEL_LOADS=3 (run on Ubuntu) helped increase performance a lot: it took only 2h30m to finish deepdive do has_spouse (has_spouse is a step after spouse_feature, and deepdive do spouse_feature in my previous comment already took 3h40m). Are these parameters useful for other DBs (Greenplum, PostgresXL)?
@lanphan Yes, those same flags work with the different database drivers.
Hi all, I'm trying to run deepdive, and it goes pretty well with the small dataset from the has_spouse example in the tutorial. I think deepdive should be able to run larger datasets, so I downloaded the signalmedia-1m dataset (around 1GB of data) and used articles.tsv.sh (customized a little) to extract all the content into a full articles-1m.tsv file (around 210MB). With that file, I tried to run deepdive again.
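Roughly, the extraction looks like the following sketch (the jq usage, field names, and input file name are illustrative, not the exact script):
jq -r '[.id, (.content | gsub("[\\t\\n]"; " "))] | @tsv' signalmedia-1m.jsonl > articles-1m.tsv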
I think the source data is rather small (210MB) compared to my PC's configuration (MacOS, 16G RAM, core i7), but surprisingly it crashed quite soon (during document parsing; I think in the step that creates the sentences table, because lots of java processes were running and consuming a lot of RAM, around 2.2G each, 5 or 6 java processes in total).
Is this the bug mentioned in #478? Do you have a plan to fix it?
Thanks in advance.
PS: Attached is a screenshot of my postgres database; not much data there.