beast-dev / beast-mcmc

Bayesian Evolutionary Analysis Sampling Trees
http://beast.community
GNU Lesser General Public License v2.1

unable to place discrete trait on separate partition #1157

Closed bpetros95 closed 1 year ago

bpetros95 commented 1 year ago

Hello,

I am running BEAST on a CPU plus 2 V100 GPUs. My model has 2 nucleotide (nt) partitions (CP1+2, CP3) and 1 geographic partition.

Under any beagle_order (1,1,2; 0,0,1; 2,2,1; etc.), BEAGLE places the geographic partition on the same hardware resource as the nt partitions. For example, with 1,1,2 as the specified order, BEAGLE places all partitions on the 1st GPU.

Attaching the .log file for beagle_order 0,0,1: beast-log-file.txt. I have already confirmed with beagle_info that the CPU and both GPUs are recognized by the software.

msuchard commented 1 year ago

Please provide a completely reproducible example including your exact command-line call to BEAST, your XML and the output from beast -beagle_info.

bpetros95 commented 1 year ago

Thank you!

XML: rsvb_geo_samp_aligned_100M.txt

output of beast -beagle_info:

Screenshot 2023-08-26 at 4 12 24 PM

command-line call to BEAST: beast -beagle_GPU -beagle_double -beagle_order 1,1,2 -overwrite ${XML}

log of analysis (terminated early): terminated_beast_log.txt

msuchard commented 1 year ago

Use beast -beagle_order 0,1 -overwrite ${XML} to put the first (multi-partition) data likelihood on the CPU and the second (host) data likelihood on the GPU. For user issues and help, please post to the https://groups.google.com/g/beast-users mailing list, as this better engages the whole community.

bpetros95 commented 1 year ago

terminated_beast_log_1,2.log

beast -beagle_order 0,1 -overwrite ${XML} works (nt data on CPU, host data on GPU 1), but beast -beagle_order 1,2 -overwrite ${XML} does not (puts nt data and host data on GPU 1, ignores GPU 2). Same XML file as above.

Output of -beagle_info checked again, same as above.

dpark01 commented 1 year ago

Hi @msuchard, thanks for the help so far. I have some more data points here that hopefully shed a bit more light on the issue. They all use the same input XML and cover various ways of attempting (and failing) to place the geographic partition on its own dedicated GPU, across two different versions of BEAST/BEAGLE.

The partitions in that XML are as follows:

Read alignment: alignment
  Sequences = 1347
      Sites = 963
   Datatype = nucleotide
Site patterns 'CP1+2.patterns' created by merging 2 pattern lists
  pattern count = 539
Site patterns 'CP3.patterns' created from positions 3-963 of alignment 'alignment'
  only using every 3 site
  unique pattern count = 314
Read attribute patterns, 'region.pattern' for attribute, region

Creating the tree model, 'treeModel'
  taxon count = 1347
  tree height = 16.656191930600656

Just to confirm, I am interpreting the "Using BEAGLE TreeLikelihood" section of the output as the hardware assignment for where geographic inference is running. Let me know if I'm wrong about that; I personally don't have experience with phylogeography. Assuming that's right, here's the matrix of things I've tried:

| # V100 GPUs | requested behavior | beast & beagle version | actual hardware assignments |
|---|---|---|---|
| 1 | -beagle_order 0,1 | 1.10.5pre_thorney_0.1.2 on 4.0.0 | 0,1,0 (Data,Data,Tree) |
| 1 | -beagle_order 0,1 -beagle_multipartition on | 1.10.5pre_thorney_0.1.2 on 4.0.0 | 0,1,0 |
| 1 | -beagle_order 0,1,1 | 1.10.5pre_thorney_0.1.2 on 4.0.0 | 0,1,0 |
| 4 | -beagle_order 1,2,3 | 1.10.5pre_thorney_0.1.2 on 4.0.0 | 1,1 (Multipart Data, Tree) |
| 1 | -beagle_order 0,1 | 1.10.4 on 3.1.2 | 0,1,0 |
| 1 | -beagle_order 0,0,1 | 1.10.4 on 3.1.2 | 0,0,0 |
| 1 | -beagle_order 0,1,1 | 1.10.4 on 3.1.2 | 0,1,0 |

I think the desired outcome here is for the actual hardware assignments to be one of: Multipartition Data = 0, Tree = 1; or CP1+2 = 0, CP3 = 0, Tree = 1; or MPData = 1, Tree = 2; etc. But I can't get it to honor the beagle_order request, and whether it decides to multipartition the nucleotide data seems both unpredictable and uncorrelated with whether I specify beagle_multipartition.

A tarball of all the stdout/stderr files from these runs can be found temporarily at gs://viral-public-temp-30d/beast/beast_logs_treelikelihood_beagleorder.tar.gz. An abbreviated example of the stdout for the first row of the table above (-beagle_order 0,1, latest BEAST/BEAGLE) follows; please let me know if I'm misinterpreting its hardware assignments:

Using BEAGLE DataLikelihood Delegate
  Using BEAGLE resource 0: CPU (x86_64)
    with instance flags:  PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL SCALING_MANUAL SCALERS_RAW VECTOR_SSE THREADING_CPP PROCESSOR_CPU FRAMEWORK_CPU PREORDER_TRANSPOSE_MANUAL
  Ignoring preOrder partials in tree likelihood.
  Ignoring ambiguities in tree likelihood.
  With 539 unique site patterns.
  Using rescaling scheme : dynamic (rescaling every 100 evaluations, delay rescaling until first overflow)

Using TreeDataLikelihood
  Branch rate model used: strictClockBranchRates

Using BEAGLE DataLikelihood Delegate
  Using BEAGLE resource 1: Tesla V100-SXM2-16GB
    Global memory (MB): 16161
    Clock speed (Ghz): 1.53
    Number of cores: 10240
    with instance flags:  PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_REAL SCALING_MANUAL SCALERS_RAW VECTOR_NONE THREADING_CPP THREADING_NONE PROCESSOR_GPU FRAMEWORK_CUDA PREORDER_TRANSPOSE_MANUAL
  Ignoring preOrder partials in tree likelihood.
  Ignoring ambiguities in tree likelihood.
  With 314 unique site patterns.
  Using rescaling scheme : dynamic (rescaling every 100 evaluations, delay rescaling until first overflow)

Using TreeDataLikelihood
  Branch rate model used: strictClockBranchRates

Creating state frequencies model 'region.frequencies': Initial frequencies = {0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1}
  General Substitution Model (stateCount=10)
  Using BSSVS Complex Substitution Model

Creating site rate model.

Creating state frequencies model 'region.root.frequencies': Initial frequencies = {0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1}

Using BEAGLE TreeLikelihood
  Branch rate model used: strictClockBranchRates
  Using BEAGLE resource 0: CPU (x86_64)
    with instance flags:  PRECISION_DOUBLE COMPUTATION_SYNCH EIGEN_COMPLEX SCALING_MANUAL SCALERS_RAW VECTOR_NONE THREADING_CPP PROCESSOR_CPU FRAMEWORK_CPU PREORDER_TRANSPOSE_MANUAL
  Ignoring ambiguities in tree likelihood.
  With 1 unique site patterns.
  Using rescaling scheme : delayed (delay rescaling until first overflow)
Optimization Schedule: log
Creating CTMC Scale Reference Prior model.
Acting on subtree of size 1347
Constructing a cache around likelihood 'null', signal = region.rates
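
For anyone comparing many such runs, the hardware assignments can be pulled out of the stdout mechanically rather than read by eye. The following is a minimal sketch in Python, assuming only the line shapes visible in the excerpt above (an unindented "Using BEAGLE ..." header followed later by an indented "Using BEAGLE resource N: NAME" line). The function name `beagle_assignments` and the `SAMPLE` text are hypothetical helpers for illustration, not part of BEAST or BEAGLE, and other log layouts (for example, a true multi-partition run) may need different patterns.

```python
import re

# Unindented block header, e.g. "Using BEAGLE DataLikelihood Delegate".
HEADER = re.compile(r"^Using BEAGLE (?!resource\b)(.+?)\s*$")
# Indented resource line, e.g. "  Using BEAGLE resource 1: Tesla V100-SXM2-16GB".
RESOURCE = re.compile(r"^\s+Using BEAGLE resource (\d+): (.+?)\s*$")


def beagle_assignments(log_text):
    """Return [(likelihood_kind, resource_number, resource_name), ...] in log order."""
    assignments = []
    current = None  # header of the block we are currently inside, if any
    for line in log_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        resource = RESOURCE.match(line)
        if resource and current is not None:
            assignments.append(
                (current, int(resource.group(1)), resource.group(2))
            )
            current = None  # one resource line per block in the excerpt
    return assignments


# Trimmed copy of the excerpt above (assumption: these lines are representative).
SAMPLE = """\
Using BEAGLE DataLikelihood Delegate
  Using BEAGLE resource 0: CPU (x86_64)
  With 539 unique site patterns.
Using TreeDataLikelihood
Using BEAGLE DataLikelihood Delegate
  Using BEAGLE resource 1: Tesla V100-SXM2-16GB
  With 314 unique site patterns.
Using BEAGLE TreeLikelihood
  Branch rate model used: strictClockBranchRates
  Using BEAGLE resource 0: CPU (x86_64)
"""

if __name__ == "__main__":
    for kind, number, name in beagle_assignments(SAMPLE):
        print(f"{kind}: resource {number} ({name})")
```

On the trimmed sample this yields resource numbers 0, 1, 0 in log order, matching the "0,1,0 (Data,Data,Tree)" entry in the first row of the table. A quick check for the desired outcome is whether the resource number of the TreeLikelihood entry differs from every DataLikelihood entry.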