beast-dev / beast-mcmc

Bayesian Evolutionary Analysis Sampling Trees
http://beast.community
GNU Lesser General Public License v2.1

Beast getting stuck #787

Closed necrolyte2 closed 8 years ago

necrolyte2 commented 8 years ago

At this time I'm not 100% sure how to reproduce this. We see it happen quite a bit with our BEAST runs, and I'm not entirely sure what is causing it. The strace output from a stuck run looks like this:

[pid 11231] futex(0x7efe700cb754, FUTEX_WAIT_BITSET_PRIVATE, 1, {6032690, 20480214}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 11231] futex(0x7efe700cb728, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 11231] futex(0x7efe700cb754, FUTEX_WAIT_BITSET_PRIVATE, 1, {6032690, 70617412}, ffffffff <unfinished ...>
[pid 11224] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
[pid 11224] futex(0x7efe70093728, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 11224] futex(0x7efe70093754, FUTEX_WAIT_BITSET_PRIVATE, 1, {6032691, 52997960}, ffffffff <unfinished ...>
[pid 11231] <... futex resumed> )       = -1 ETIMEDOUT (Connection timed out)
[pid 11231] futex(0x7efe700cb728, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 11231] futex(0x7efe700cb754, FUTEX_WAIT_BITSET_PRIVATE, 1, {6032690, 120848988}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 11231] futex(0x7efe700cb728, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 11231] futex(0x7efe700cb754, FUTEX_WAIT_BITSET_PRIVATE, 1, {6032690, 171020011}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 11231] futex(0x7efe700cb728, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 11231] futex(0x7efe700cb754, FUTEX_WAIT_BITSET_PRIVATE, 1, {6032690, 221184294}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 11231] futex(0x7efe700cb728, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 11231] futex(0x7efe700cb754, FUTEX_WAIT_BITSET_PRIVATE, 1, {6032690, 271316414}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 11231] futex(0x7efe700cb728, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 11231] futex(0x7efe700cb754, FUTEX_WAIT_BITSET_PRIVATE, 1, {6032690, 312449594}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 11231] futex(0x7efe700cb728, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 11231] futex(0x7efe700cb754, FUTEX_WAIT_BITSET_PRIVATE, 1, {6032690, 321616005}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 11231] futex(0x7efe700cb728, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 11231] futex(0x7efe700cb754, FUTEX_WAIT_BITSET_PRIVATE, 1, {6032690, 371784972}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 11231] futex(0x7efe700cb728, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 11231] futex(0x7efe700cb754, FUTEX_WAIT_BITSET_PRIVATE, 1, {6032690, 421953245}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 11231] futex(0x7efe700cb728, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 11231] futex(0x7efe700cb754, FUTEX_WAIT_BITSET_PRIVATE, 1, {6032690, 472183086}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
...

BEAST essentially slows to a crawl at this point and never finishes.
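For anyone reading a trace like this, a minimal sketch of how the futex lines could be summarized. It assumes the `{sec, nsec}` pair in each `FUTEX_WAIT_BITSET_PRIVATE` call is an absolute deadline (which is how that futex operation treats its timeout) and measures how far apart consecutive deadlines are for each pid. The function names are illustrative, not part of BEAST or strace.

```python
import re
from collections import defaultdict

# Matches the pid and the {sec, nsec} deadline of a FUTEX_WAIT_BITSET_PRIVATE call.
FUTEX_WAIT = re.compile(
    r"\[pid\s+(\d+)\]\s+futex\(0x[0-9a-f]+, FUTEX_WAIT_BITSET_PRIVATE, \d+, "
    r"\{(\d+), (\d+)\}"
)


def deadlines_by_pid(lines):
    """Collect wait deadlines (in nanoseconds) per pid, in trace order."""
    deadlines = defaultdict(list)
    for line in lines:
        match = FUTEX_WAIT.search(line)
        if match:
            pid, sec, nsec = match.groups()
            deadlines[int(pid)].append(int(sec) * 1_000_000_000 + int(nsec))
    return dict(deadlines)


def gaps_ms(deadlines):
    """Milliseconds between consecutive deadlines of one thread."""
    return [(later - earlier) / 1e6 for earlier, later in zip(deadlines, deadlines[1:])]


# Three wait calls for pid 11231, copied from the trace above.
SAMPLE = [
    "[pid 11231] futex(0x7efe700cb754, FUTEX_WAIT_BITSET_PRIVATE, 1, {6032690, 20480214}, ffffffff) = -1 ETIMEDOUT (Connection timed out)",
    "[pid 11231] futex(0x7efe700cb728, FUTEX_WAKE_PRIVATE, 1) = 0",
    "[pid 11231] futex(0x7efe700cb754, FUTEX_WAIT_BITSET_PRIVATE, 1, {6032690, 70617412}, ffffffff <unfinished ...>",
    "[pid 11231] futex(0x7efe700cb754, FUTEX_WAIT_BITSET_PRIVATE, 1, {6032690, 120848988}, ffffffff) = -1 ETIMEDOUT (Connection timed out)",
]

if __name__ == "__main__":
    for pid, values in deadlines_by_pid(SAMPLE).items():
        print(pid, [round(gap, 1) for gap in gaps_ms(values)])
```

On these sample lines the deadlines for pid 11231 move forward by about 50 ms each time. That pattern would be consistent with a thread that wakes on a periodic timer and goes back to sleep, so on its own it may not show where the run is actually blocked.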

I'm not sure whether this is caused by #786: we have multiple runs going, and they may all be fighting for CPU time.
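One rough way to check the CPU-contention theory is to compare the machine's load average with its core count while the runs are going. The sketch below uses only the Python standard library; the threshold of one runnable task per core is an assumption, not a rule from BEAST.

```python
import os


def cpu_oversubscribed(load_1min, cpu_count, threshold=1.0):
    """True when the 1-minute load average exceeds threshold * cpu_count."""
    return load_1min > threshold * cpu_count


if __name__ == "__main__":
    load_1min, _, _ = os.getloadavg()  # Unix only
    cpus = os.cpu_count() or 1
    state = "oversubscribed" if cpu_oversubscribed(load_1min, cpus) else "ok"
    print(f"load={load_1min:.2f} cpus={cpus} -> {state}")
```

If the load stays well above the core count while several runs are active, contention is at least plausible; if it stays low while a run is stuck, the cause is more likely elsewhere.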

Possibly also related to #44?

maxbiostat commented 8 years ago

It'd be very useful if you could provide some more detail. Which version of BEAST are you using (beast -version will give you what we need)? Which JVM version? Can you supply an example XML with which this happens?
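A small sketch of how the requested version details could be gathered in one go. It assumes `beast` and `java` are on the PATH. Standard output and standard error are merged because `java -version` prints its banner to standard error. The helper name `capture` is illustrative.

```python
import subprocess


def capture(cmd):
    """Run cmd and return stdout and stderr combined, or a short note on failure."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # java -version writes to stderr
            text=True,
            timeout=60,
        )
    except FileNotFoundError:
        return f"{cmd[0]}: command not found"
    except subprocess.TimeoutExpired:
        return f"{cmd[0]}: timed out"
    return result.stdout.strip()


if __name__ == "__main__":
    for cmd in (["beast", "-version"], ["java", "-version"]):
        print("$", " ".join(cmd))
        print(capture(cmd))
        print()
```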




necrolyte2 commented 8 years ago

Beast Version

$ beast -version

                  BEAST v1.8.0, 2002-2013
       Bayesian Evolutionary Analysis Sampling Trees
                 Designed and developed by
   Alexei J. Drummond, Andrew Rambaut and Marc A. Suchard

               Department of Computer Science
                   University of Auckland
                  alexei@cs.auckland.ac.nz

             Institute of Evolutionary Biology
                  University of Edinburgh
                     a.rambaut@ed.ac.uk

              David Geffen School of Medicine
           University of California, Los Angeles
                     msuchard@ucla.edu

                Downloads, Help & Resources:
                        http://beast.bio.ed.ac.uk

Source code distributed under the GNU Lesser General Public License:
                http://code.google.com/p/beast-mcmc

                     BEAST developers:
        Alex Alekseyenko, Guy Baele, Trevor Bedford, Filip Bielejec, Erik Bloomquist, Matthew Hall,
        Joseph Heled, Sebastian Hoehna, Denise Kuehnert, Philippe Lemey, Wai Lok Sibon Li,
        Gerton Lunter, Sidney Markowitz, Vladimir Minin, Michael Defoin Platel,
                Oliver Pybus, Chieh-Hsi Wu, Walter Xie

                         Thanks to:
        Roald Forsberg, Beth Shapiro and Korbinian Strimmer

Java Version

$ java -version
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)

Beast XML

The issue doesn't always happen, but this is one of the XML files where it did. We typically run BEAST three times for each XML, and sometimes one run will finish while the other two get "stuck".

6.HA_BEASTF_phylogeo_strict.xml.txt
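When several replicate runs are going at once, one way to spot a stuck run early is to check whether its output file is still being updated. The sketch below assumes each run writes to its own log file and that no update for longer than `max_idle_seconds` means the run has stalled; both the file name in the usage note and the threshold are hypothetical.

```python
import os
import time


def seconds_since_update(path, now=None):
    """Seconds elapsed since the file at path was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


def looks_stalled(path, max_idle_seconds, now=None):
    """True when the file has not been modified for longer than max_idle_seconds."""
    return seconds_since_update(path, now) > max_idle_seconds
```

For example, `looks_stalled("run1.log", 3600)` would flag a run whose (hypothetical) `run1.log` has not changed in the last hour. The right threshold depends on how often the XML asks BEAST to write to the log.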

maxbiostat commented 8 years ago

Thanks. Would it be possible for you to update to a more recent version of BEAST? There's a pre-release you can try. Also, if possible, it'd be good to update the JVM.

necrolyte2 commented 8 years ago

No problem. I'm updating now, and we will rerun our XMLs and see what happens.

necrolyte2 commented 8 years ago

Blocked by #788

rambaut commented 8 years ago

#788 refers to a pre-release version (1.8.3pre) and looks to be a recently introduced bug (mistake) in that version. Please test with the official release, v1.8.2.

necrolyte2 commented 8 years ago

This was caused by some other issue not related to BEAST.