PecanProject / pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.
www.pecanproject.org

Workflow.R fails during wrap-up for multi-site runs #2377

Open serbinsh opened 5 years ago

serbinsh commented 5 years ago

Bug Description

It fails here, during the wrap-up step of workflow.R:

# Mark the workflow as finished in BETYdb (needs settings$workflow$id to be set)
db.query(paste("UPDATE workflows SET finished_at=NOW() WHERE id=",
                 settings$workflow$id, "AND finished_at IS NULL"),
           params = settings$database$bety)

This seems to be because multi-site runs don't create a single workflow ID, so settings$workflow$id is empty?

pecan.CONFIGS.xml.multi.txt

So the questions are:

1) What should multi-site runs generate in terms of a workflow ID?
2) Should we update this part of workflow.R to account for both single-site and multi-site workflows?
3) If we write to BETYdb, does it then generate a workflow ID?

serbinsh commented 5 years ago

Update: if <write> is set to TRUE in

 <database>
  <bety>
   <user>bety</user>
   <password>bety</password>
   <host>localhost</host>
   <port>5432</port>
   <dbname>bety</dbname>
   <driver>PostgreSQL</driver>
   <write>TRUE</write>
  </bety>
  <dbfiles>/data/pecan_dbfiles</dbfiles>
 </database>

then

 </run>
 <host>
  <name>localhost</name>
  <scratchdir>/scratch</scratchdir>
  <prerun>module load gcc/5.4.0 jags/4.3.0 udunits/2.2.25 python/2.7.14 redland hdf5/1.8.19-gcc540 netcdf/4.4.1.1-gnu540 libtiff/4.0.8 geos/3.6.3 proj/5.1.0 gdal/2.3.1</prerun>
  <qsub>qsub -l walltime=36:00:00 -V -N @NAME@ -o @STDOUT@ -e @STDERR@ -S /bin/bash</qsub>
  <qsub.jobid>([[:digit:]]+)\.modex\.bnl\.gov</qsub.jobid>
  <qstat>qstat @JOBID@ || echo DONE</qstat>
  <rundir>/data/sserbin/Modeling/sipnet/multi_site/testrun.18/run</rundir>
  <outdir>/data/sserbin/Modeling/sipnet/multi_site/testrun.18/out</outdir>
 </host>
 <email>
  <to>sserbin@bnl.gov</to>
 </email>
 <settings.info>
  <deprecated.settings.fixed>TRUE</deprecated.settings.fixed>
  <settings.updated>TRUE</settings.updated>
  <checked>TRUE</checked>
 </settings.info>
 <workflow>
  <id>2000001424</id>
 </workflow>
 <rundir>/data/sserbin/Modeling/sipnet/multi_site/testrun.18/run</rundir>
 <modeloutdir>/data/sserbin/Modeling/sipnet/multi_site/testrun.18/out</modeloutdir>
 <multisettings>
  <multisettings>run</multisettings>
 </multisettings>
</pecan>

you get a single workflow ID, which avoids the error. So it seems we need to update workflow.R to skip this step when write is FALSE for multi-site runs? Thoughts?
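One way to act on this suggestion is to guard the wrap-up UPDATE so it only runs when there is exactly one workflow ID and writing to BETYdb is enabled. The sketch below is a minimal, self-contained illustration, not the actual workflow.R code: it assumes `settings` is a plain nested list shaped like the XML above (with `<write>` stored as the string "TRUE" or "FALSE"), and the helpers `should_close_workflow()` and `finish_query()` are hypothetical names. It builds the query string only; in workflow.R the string would be passed to `db.query()` as in the original snippet.

```r
# Sketch only. Assumption: `settings` is a plain nested list mirroring the
# pecan XML, with <write> stored as "TRUE"/"FALSE".

# Hypothetical helper: TRUE only when there is a single workflow ID to close
# and writing to BETYdb is enabled.
should_close_workflow <- function(settings) {
  id <- settings$workflow$id             # NULL when no workflow ID was created
  write_flag <- settings$database$bety$write
  has_id <- length(id) == 1 && !is.na(id)
  can_write <- isTRUE(as.logical(write_flag))  # NULL or "FALSE" -> FALSE
  has_id && can_write
}

# Hypothetical helper: the same query string workflow.R builds with paste().
finish_query <- function(workflow_id) {
  paste("UPDATE workflows SET finished_at=NOW() WHERE id=",
        workflow_id, "AND finished_at IS NULL")
}

# Usage: a run that wrote to BETY vs. a multi-site run with write = FALSE
single <- list(
  database = list(bety = list(write = "TRUE")),
  workflow = list(id = "2000001424")
)
multi_nowrite <- list(database = list(bety = list(write = "FALSE")))

for (s in list(single, multi_nowrite)) {
  if (should_close_workflow(s)) {
    # In workflow.R this would become:
    # db.query(finish_query(s$workflow$id), params = s$database$bety)
    print(finish_query(s$workflow$id))
  } else {
    message("No single workflow ID (or write = FALSE); skipping UPDATE")
  }
}
```

Checking both conditions is a design choice: skipping whenever write is FALSE also avoids touching the database in runs that were meant to be read-only, while the ID check covers multi-site runs that carry no single ID.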

serbinsh commented 5 years ago

Confirmed: workflow.R finishes correctly when <write> is TRUE:

2019-06-17 16:59:01 INFO [db.print.connections] : Created 37 connections and executed 356 queries
2019-06-17 16:59:01 INFO [db.print.connections] : Created 37 connections and executed 356 queries
2019-06-17 16:59:01 DEBUG [db.print.connections] : No open database connections.
[1] "---------- PEcAn Workflow Complete ----------"

serbinsh commented 5 years ago

Additional update regarding multi-site: it actually looks like it only works correctly when you write to BETY. For example, here is the output when we do not write to BETY and thus do not generate a workflow ID:

[Screenshot: 2019-06-17 at 5:38:39 PM]

Here is the output when we do write to BETY:

[Screenshot: 2019-06-17 at 5:39:56 PM]

Note that we then see the results by site ID.

robkooper commented 5 years ago

To be fair, I have not really tested update=false. I think more things might be broken if it is set (and some items might even ignore this flag altogether).

serbinsh commented 5 years ago

@robkooper not sure I follow? Was this meant for a different issue?

serbinsh commented 5 years ago

I see a few issues here:
1) Presently, multi-site assumes you are writing the runs to BETY so that it can name each output file uniquely using the site ID.
2) workflow.R assumes there will be a settings$workflow$id in the settings XML object, but multi-site currently doesn't provide a workflow ID when running with write=FALSE.

So, do we want to allow multi-site to work without writing to BETY? If so, we would need to address the fact that outputs need to have the site ID embedded so they don't overwrite each other. But where would that come from? And why do we currently write the site ID into the part of the file name where the ENSEMBLE ID usually goes?
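To make the naming question concrete, here is a minimal sketch of how per-site outputs could stay distinct without a BETY-assigned ensemble ID. It assumes each per-site settings object carries its site ID at `settings$run$site$id`, as single-site PEcAn settings do; the helper `site_output_name()`, the `"site"` prefix, the `"ensemble.output"` stem and the site IDs are all hypothetical and do not reflect PEcAn's actual output-file scheme.

```r
# Hypothetical naming helper, NOT PEcAn's actual output-file scheme.
# Assumption: each per-site settings object has its site ID at
# settings$run$site$id.
site_output_name <- function(prefix, site_id, ensemble_id = NULL, ext = "Rdata") {
  if (is.null(site_id) || !nzchar(as.character(site_id))) {
    stop("a site ID is required to keep per-site outputs distinct")
  }
  # When ensemble_id is NULL (no BETY write), the if() yields NULL and c()
  # drops it, so the name still embeds the site ID in its own slot.
  parts <- c(prefix,
             if (!is.null(ensemble_id)) as.character(ensemble_id),
             paste0("site", site_id),
             ext)
  paste(parts, collapse = ".")
}

# Example per-site settings with made-up site IDs and no ensemble ID
# (the write = FALSE case)
site_settings <- list(
  list(run = list(site = list(id = "772"))),
  list(run = list(site = list(id = "796")))
)
files <- vapply(site_settings,
                function(s) site_output_name("ensemble.output", s$run$site$id),
                character(1))
stopifnot(anyDuplicated(files) == 0)  # no two sites share an output file
print(files)
```

Keeping the site ID in its own slot, rather than reusing the ensemble-ID slot, means the same helper works whether or not an ensemble ID exists.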

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 365 days with no activity.