marchezinixd opened this issue 6 years ago
It should be. Which algorithm are you running? I'll take a look.
The parameters I used were: FGES, Sem-BIC score, Sem-BIC test, penalty 100.
I have 4 cores and it is using 100% of one but nothing of the others. The dataset has 105 features and 2.6 million rows. Memory is fine: it is using 14 GB and I have a total of 32 GB.
Just one note: the part of FGES that parallelizes the best is the initial (usually most time-consuming) part. After that, there is a period where the parallelization isn't quite as good. You might for sanity's sake check to see if you're using more than one core when you first call the process.
Joe
On Thu, Sep 13, 2018 at 7:12 PM Guilherme Fernandes Marchezini <notifications@github.com> wrote:
--
Joseph D. Ramsey
Special Faculty and Director of Research Computing
Department of Philosophy, 135 Baker Hall
Carnegie Mellon University, Pittsburgh, PA 15213
jsph.ramsey@gmail.com | Office: (412) 268-8063
http://www.andrew.cmu.edu/user/jdramsey
Well, I checked it. I reduced the penalty to 25 and ran it again. The attached image shows how it behaves: there was a peak of a few seconds that used all cores. The second graph shows that it uses just one core at a time, but cycles through all the cores sequentially.
Try it with the causal-cmd CLI; the attachment is its distribution. Run it with `java -Xmx14G -jar causal-cmd-0.4.0-SNAPSHOT-jar-with-dependencies.jar --algorithm fges --data-type <discrete|continuous> --delimiter <comma|tab> --dataset <your_dataset> --score sem-bic --test sem-bic --penaltyDiscount 100 --json-graph`. More about causal-cmd.
causal-cmd-0.4.0-SNAPSHOT-distribution.zip
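If you want to drive that same causal-cmd invocation from Python rather than a terminal, `subprocess` is one way to do it. A minimal sketch, with the flags taken from the command above; the jar path and dataset name here are placeholders you would adjust:

```python
import subprocess

# Placeholder paths -- adjust to where the jar and dataset actually live.
jar = "causal-cmd-0.4.0-SNAPSHOT-jar-with-dependencies.jar"
dataset = "data.txt"

cmd = [
    "java", "-Xmx14G", "-jar", jar,
    "--algorithm", "fges",
    "--data-type", "continuous",
    "--delimiter", "tab",
    "--dataset", dataset,
    "--score", "sem-bic",
    "--test", "sem-bic",
    "--penaltyDiscount", "100",
    "--json-graph",
]

# Uncomment to actually run; check=True raises CalledProcessError on failure.
# subprocess.run(cmd, check=True)
print(" ".join(cmd))
```

Building the command as a list (rather than one shell string) avoids quoting problems with paths that contain spaces.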
Hello @chirayukong, sorry for taking so long to answer; I was having trouble with the dataset and how to handle its full size. I did the test: with the cmd it ran in ~5 minutes and behaved as in the attached image. When running with Python it took ~1 hour and showed the same behavior as the previous images. Apparently the new jar performs better and parallelizes more than the Python one. Is it possible to update py-causal?
CMD test
The jar file is updated. Please try it. @marchezinixd
The beginning was a little different, but it still follows the same old pattern: while the jar ran in 4 minutes, the Python version has been running for 20 minutes and doesn't look close to finishing. Apparently it is a Python-side problem, maybe the way it handles parallelism?
Maybe it's a problem in the javabridge library, which I don't know how to fix. You can run it with causal-cmd and load the JSON result back into Python.
This is the latest one. causal-cmd-0.4.0-SNAPSHOT-distribution.zip
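For the load-back step, Python's standard `json` module is enough. A sketch, assuming causal-cmd was run with `--json-graph`; the key names below (`nodes`, `edges`, `node1`, `node2`) are hypothetical, so inspect the actual output file before relying on them:

```python
import json

# Hypothetical example of the kind of JSON a graph export might contain;
# inspect the real causal-cmd output before relying on these key names.
sample = (
    '{"nodes": [{"name": "X1"}, {"name": "X2"}],'
    ' "edges": [{"node1": "X1", "node2": "X2",'
    ' "endpoint1": "TAIL", "endpoint2": "ARROW"}]}'
)

# With a real file you would use: graph = json.load(open("fges_out.json"))
graph = json.loads(sample)
nodes = [n["name"] for n in graph["nodes"]]
edges = [(e["node1"], e["node2"]) for e in graph["edges"]]
print(nodes, edges)
```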
Well, I'll do that for now. I'll leave the issue open in case you have any ideas on how to solve the Python-side problem. Thank you.
I'm trying it on a really large dataset and checking resource usage. Apparently it is using only one core. Is it possible to set it to use all cores and make it faster?
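As a first sanity check on questions like this, it helps to confirm how many logical cores the Python process itself can see; the JVM that runs the algorithm makes its own threading decisions, but this rules out a container or affinity limit. A minimal sketch:

```python
import os

# Number of logical cores visible to this process; the JVM behind
# py-causal/causal-cmd may or may not use all of them.
cores = os.cpu_count()
print(f"Logical cores available: {cores}")
```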