EricR86 / segway-issues-proxy

Identification+posterior broken in segway 1.1 dev release #16

Closed EricR86 closed 10 years ago

EricR86 commented 10 years ago

From jay.hesselberth on May 02, 2011 10:17:35

After running the test suite with segway-1.1.0.dev-r5739: training runs fine, but identification (and possibly Viterbi decoding) is broken. On the command line, the jobs go through the memory progression and exit (test results attached). The error logs contain errors like:

Traceback (most recent call last):
  File "/common/arch/Darwin-i386/bin/segway-task", line 8, in <module>
    load_entry_point('segway==1.1.0.dev-r5739', 'console_scripts', 'segway-task')()
  File "/common/arch/Darwin-i386/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg/pkg_resources.py", line 318, in load_entry_point
  File "/common/arch/Darwin-i386/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg/pkg_resources.py", line 2221, in load_entry_point
  File "/common/arch/Darwin-i386/lib/python2.7/site-packages/setuptools-0.6c11-py2.7.egg/pkg_resources.py", line 1954, in load
  File "/common/arch/Darwin-i386/lib/python2.7/site-packages/segway-1.1.0.dev_r5739-py2.7.egg/segway/task.py", line 27, in <module>
    from run import (POSTERIOR_SCALE_FACTOR, read_posterior, POSTERIOR_PROG)
  File "/common/arch/Darwin-i386/lib/python2.7/site-packages/segway-1.1.0.dev_r5739-py2.7.egg/segway/run.py", line 46, in <module>
    from .cluster import (make_native_spec, JobTemplateFactory, RestartableJob,
  File "/common/arch/Darwin-i386/lib/python2.7/site-packages/segway-1.1.0.dev_r5739-py2.7.egg/segway/cluster/__init__.py", line 68, in <module>
    with Session() as _session:
  File "build/bdist.macosx-10.4-x86_64/egg/drmaa/__init__.py", line 527, in __enter__
  File "build/bdist.macosx-10.4-x86_64/egg/drmaa/__init__.py", line 274, in initialize
  File "build/bdist.macosx-10.4-x86_64/egg/drmaa/wrappers.py", line 59, in init
  File "build/bdist.macosx-10.4-x86_64/egg/drmaa/errors.py", line 90, in error_check
drmaa.errors.DrmCommunicationException: code 2: denied: host "node006.cluster.private" is neither submit nor admin host

Notably, the problem goes away after backing down to segway 1.0.2 (i.e. this is not a problem with the drmaa library or SGE job queuing).

Attachment: test.tar.gz

Original issue: http://code.google.com/p/segway-genome/issues/detail?id=16

EricR86 commented 10 years ago

From hoffman...@gmail.com on May 02, 2011 09:17:46

Looks like the problem is caused by importing run.py from task.py, which in turn tries to open a DRMAA session. Avinash, can you please fix this? You should move those items to _util.py and import them from there into run.py as necessary.
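To illustrate the diagnosis, here is a minimal, self-contained sketch of the import chain and of the suggested fix. The `demo_*` module names, their contents, and the constant's value are hypothetical stand-ins, not segway's actual code; the failing `demo_cluster` module simply raises at import time to mimic a DRMAA session being refused on a compute node.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-ins for the modules discussed above (not segway's code).
MODULES = {
    # Mimics a package that opens a cluster session as soon as it is imported.
    "demo_cluster.py": """
        raise RuntimeError("denied: host is neither submit nor admin host")
    """,
    # Side-effect-free home for shared constants.
    "demo_util.py": """
        POSTERIOR_SCALE_FACTOR = 100.0  # illustrative value only
    """,
    # The driver module needs the cluster code, so importing it can fail.
    "demo_run.py": """
        import demo_cluster
        from demo_util import POSTERIOR_SCALE_FACTOR
    """,
    # Before the fix: the task module pulls the constant in via the driver.
    "demo_task_before.py": """
        from demo_run import POSTERIOR_SCALE_FACTOR
    """,
    # After the fix: the task module imports from the utility module only.
    "demo_task_after.py": """
        from demo_util import POSTERIOR_SCALE_FACTOR
    """,
}


def try_import(name):
    """Return True if the module imports cleanly, False if it raises."""
    try:
        importlib.import_module(name)
        return True
    except RuntimeError:
        return False


def run_demo():
    """Write the stand-in modules to a temp dir and import both task variants."""
    with tempfile.TemporaryDirectory() as tmp:
        for filename, source in MODULES.items():
            Path(tmp, filename).write_text(textwrap.dedent(source))
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            return try_import("demo_task_before"), try_import("demo_task_after")
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            for filename in MODULES:
                sys.modules.pop(filename[:-3], None)


if __name__ == "__main__":
    print(run_demo())
```

In this sketch `run_demo()` returns `(False, True)`: the "before" task module fails because it transitively imports the cluster stand-in, while the "after" variant never touches it.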

Status: Accepted
Owner: avinash....@gmail.com

EricR86 commented 10 years ago

From jay.hesselberth on May 05, 2011 08:02:44

Fixed these errors (patch attached). Testing completes successfully but the result is FAIL; I'm guessing this is because the benchmark files aren't there or aren't the correct ones?

jhessel@amc-einstein /vol1/software/modules-python/segway/build/segway-1.1.0.dev-r5739/test $ ./test.sh
traindir/observations/chr21.0000.float32 (9411193, 9595548)
PROGRAM ENDED SUCCESSFULLY WITH STATUS 0 AT Thursday May 05 2011, 08:55:18 MDT
Job <897> is submitted to default queue .
queued 897: emt0.0.0.traindir.b12335ca772711e0addb001b219cf92c ("select[mem>2148 && tmp>11] rusage[mem=2148, tmp=11]")
Job <898> is submitted to default queue .
queued 898: emt0.0.bundle.traindir.b12335ca772711e0addb001b219cf92c ("select[mem>2148 && tmp>11] rusage[mem=2148, tmp=11]")
Job <899> is submitted to default queue .
queued 899: emt0.1.0.traindir.b12335ca772711e0addb001b219cf92c ("select[mem>2148 && tmp>11] rusage[mem=2148, tmp=11]")
Job <900> is submitted to default queue .
queued 900: emt0.1.bundle.traindir.b12335ca772711e0addb001b219cf92c ("select[mem>2148 && tmp>11] rusage[mem=2148, tmp=11]")
/tmp/chr21.0000.ca8390d2772711e09d4f001b219cf92c.float32 (9411193, 9595548)
PROGRAM ENDED SUCCESSFULLY WITH STATUS 0 AT Thursday May 05 2011, 08:56:00 MDT
Job <901> is submitted to default queue .
queued 901: vit0.identifydir.ca8390d2772711e09d4f001b219cf92c ("select[mem>2148 && tmp>14] rusage[mem=2148, tmp=14]")
Job <902> is submitted to default queue .
queued 902: jt0.identifydir.ca8390d2772711e09d4f001b219cf92c ("select[mem>2148 && tmp>14] rusage[mem=2148, tmp=14]")
../data/traindir/log/jobs.tab and traindir/log/jobs.tab differ
FAIL: ../data/traindir and traindir: 30 files match; 1 files mismatch
../data/identifydir/log/details.sh and identifydir/log/details.sh differ
../data/identifydir/log/run.sh and identifydir/log/run.sh differ
template directory missing posterior/posterior3.0.bed
template directory missing posterior/posterior2.0.bed
template directory missing log/jt_info.posterior.txt
template directory missing output/e/identify/jt0.identifydir.ca8390d2772711e09d4f001b219cf92c
template directory missing posterior/posterior0.0.bed
template directory missing posterior/posterior1.0.bed
template directory missing output/o/identify/jt0.identifydir.ca8390d2772711e09d4f001b219cf92c
FAIL: ../data/identifydir and identifydir: 14 files match; 9 files mismatch

Attachment: segway-1.1.0.dev-r5739.diff
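For readers unfamiliar with this kind of regression check, the following is a generic sketch of comparing a template (benchmark) directory against freshly generated results using Python's standard `filecmp` module. It is not segway's test script, which is not shown in this thread; the function name `compare_trees`, the count labels, and the demo file names are assumptions made for illustration.

```python
import filecmp
import tempfile
from pathlib import Path


def compare_trees(template_dir, result_dir):
    """Recursively tally matching, differing, and one-sided entries."""
    counts = {"match": 0, "differ": 0, "template_missing": 0, "result_missing": 0}

    def walk(cmp):
        # shallow=False compares file contents, not just size and mtime.
        match, differ, errors = filecmp.cmpfiles(
            cmp.left, cmp.right, cmp.common_files, shallow=False
        )
        counts["match"] += len(match)
        counts["differ"] += len(differ) + len(errors)
        counts["result_missing"] += len(cmp.left_only)     # only in template
        counts["template_missing"] += len(cmp.right_only)  # only in new results
        for sub in cmp.subdirs.values():
            walk(sub)

    walk(filecmp.dircmp(str(template_dir), str(result_dir)))
    return counts


def demo():
    """Build two small trees that differ in one file and one extra output."""
    with tempfile.TemporaryDirectory() as tmp:
        template, result = Path(tmp, "template"), Path(tmp, "result")
        for d in (template, result):
            (d / "log").mkdir(parents=True)
            (d / "params.txt").write_text("same\n")
        (template / "log" / "run.sh").write_text("old command\n")
        (result / "log" / "run.sh").write_text("new command\n")
        # An output present only in the new results, absent from the template.
        (result / "posterior0.bed").write_text("new output\n")
        return compare_trees(template, result)


if __name__ == "__main__":
    print(demo())
```

Keeping differing files and one-sided files in separate counts makes it easier to tell a genuine regression from a template directory that simply predates a new output. In the log above, the 9 reported mismatches for identifydir appear to be the 2 differing files plus the 7 files missing from the template.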

EricR86 commented 10 years ago

From hoffman...@gmail.com on May 05, 2011 08:13:03

Thanks, Jay. The tests don't pass yet--I think Avinash already has a fix for this that I am currently incorporating in HEAD.

Status: Fixed