CVNRneuroimaging / infrastructure

Issue tracking, system documentation and configs for operations side of the neuroimaging core @ Atlanta VA CVNR / Emory University

grid engine under ubuntu 14.04 ? #144

Closed stowler closed 8 years ago

stowler commented 8 years ago

Keith and Rob,

Have you already tested grid engine on your current Ubuntu 14.04 hosts? I have some FSL jobs that need within-host parallelization (beyond GNU parallel / ppss / etc.).

If you have already tested it, please point me to your configs (package choice, config file tweaks, etc.), and I'll replicate whatever your vetted solution is on pano and rama. I'm not married to any particular package.

If you haven't already tested, I'll use pano and rama to test fsl against these packages from the standard 14.04 repos:

[09:08:59]-[stowler-local]-at-[rama]-in-[~/src.mywork.gitRepos/brainwhere/utilitiesAndData/testsForFSL] on master [?]
$ sudo apt-get install gridengine-master gridengine-exec gridengine-client gridengine-qmon gridengine-common
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following extra packages will be installed:
  bsd-mailx postfix xfonts-75dpi
Suggested packages:
  procmail postfix-mysql postfix-pgsql postfix-ldap postfix-pcre sasl2-bin
  dovecot-common postfix-cdb postfix-doc
The following NEW packages will be installed:
  bsd-mailx gridengine-client gridengine-common gridengine-exec
  gridengine-master gridengine-qmon postfix xfonts-75dpi
0 upgraded, 8 newly installed, 0 to remove and 0 not upgraded.
Need to get 12.8 MB of archives.
After this operation, 57.3 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y

Thanks, Stephen

@kmcgregor123456 @rrmm

kmcgregor123456 commented 8 years ago

We have not tested it to date. I'm not sure we have the bandwidth to set up a clustered compute environment with the resources on hand. We will have to put that on the back burner for now.

Keith McGregor, PhD VA RR&D Atlanta CoE Emory University 352.359.8084 www.varrd.emory.edu



stowler commented 8 years ago

Got it.

(BTW: within-host parallelization is what I wrote on the ticket. I don't need a cluster; I just need to speed up Bruce's FSL jobs on pano and rama as individual hosts. Each box can handle 24 threads simultaneously, and some of FSL's programs (including group ICA) can be sped up by using all of those threads.)

stowler commented 8 years ago

gridengine: configured FSL work-around

The Neurodebian FSL packaging on our Ubuntu 14.04 hosts turns out to have broken FSL parallelization. Normally, setting these two parameters in /etc/fsl/fsl.sh would enable FSL's fsl_sub to do its work:

FSLPARALLEL=1
FSLCLUSTER_DEFAULT_QUEUE="mainqueue" #...the name of the sole queue I configured on pano for quick and dirty setup

...but in our environment fsl_sub isn't getting those parameters from fsl.sh for some reason.
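
One possible cause, offered only as a guess: a variable that is assigned but not exported in a sourced file is visible in the shell that sourced it, but not in child processes launched from that shell. The sketch below reproduces that behavior with a throwaway file and a made-up variable name (DEMO_PARALLEL). It does not touch /etc/fsl/fsl.sh or fsl_sub, and whether this is what is actually happening on pano is unconfirmed.

```shell
#!/bin/sh
# Throwaway file standing in for a sourced settings file (hypothetical).
cfg=$(mktemp)
echo 'DEMO_PARALLEL=1' > "$cfg"       # assigned, but NOT exported

. "$cfg"                               # source it into the current shell

# Visible in the shell that sourced the file:
in_parent="${DEMO_PARALLEL:-unset}"

# Not visible in a child process, because it was never exported:
in_child_before=$(sh -c 'echo "${DEMO_PARALLEL:-unset}"')

export DEMO_PARALLEL                   # from here on, children inherit it
in_child_after=$(sh -c 'echo "${DEMO_PARALLEL:-unset}"')

echo "parent:              $in_parent"
echo "child before export: $in_child_before"
echo "child after export:  $in_child_after"
rm -f "$cfg"
```

If the same pattern shows up for FSLPARALLEL (set in the login shell but "unset" inside a child), exporting the variable would be a less invasive fix than editing fsl_sub.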

Once I realized that was the problem I just hard-coded those values into fsl_sub:

...snip...

 # from Stephen: doesn't look like this is making it from fsl.sh:
  FSLPARALLEL=1
  # Allow to override the above automatic detection result with FSLPARALLEL
  if [ -n "$FSLPARALLEL" ] ; then

...snip...

# from Stephen: looks like this isn't making it from fsl.sh:

  FSLCLUSTER_DEFAULT_QUEUE="mainqueue"
  # SGE should already have a default queue, but allow for overwrite
  queueCmd=""
  if [ "x$FSLCLUSTER_DEFAULT_QUEUE" != "x" ] ; then

...snip...

map_qname ()
{
  # for Debian we can't do the stuff below, because it would be hard
  # to determine how particular queues are meant to be used on any given
  # system. Instead of translating into a queue name we specify proper
  # resource limits, and let SGE decide what queue matches
  # (qsub wants the time limit in seconds)
  queueCmd="$queueCmd -l h_rt=$(echo "$1 * 60" | bc)"
  if [ $1 -le 20 ] ; then
    #queue=veryshort.q
    queue=mainqueue
  elif [ $1 -le 120 ] ; then
    #queue=short.q
    queue=mainqueue
  elif [ $1 -le 1440 ] ; then
    #queue=long.q
    queue=mainqueue
  else
    #queue=verylong.q
    queue=mainqueue
  fi
  queueCmd=" -q $queue "

  #echo "Estimated time was $1 mins: queue name is $queue"
}
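
Two observations on the edited function, offered as a sketch rather than a tested change to fsl_sub: every branch now selects the same queue, and the final assignment to queueCmd replaces the h_rt limit set at the top instead of appending to it. If the intent were to keep both options, a simplified stand-in could look like the following. The function name map_qname_sketch is hypothetical, and shell arithmetic stands in for the bc call.

```shell
#!/bin/sh
queueCmd=""

# Hypothetical simplified stand-in for map_qname.
# $1 is the estimated run time in minutes.
map_qname_sketch() {
  minutes=$1
  queue=mainqueue                       # the only queue configured on this host
  # qsub wants the run-time limit in seconds; append so both options survive
  queueCmd="$queueCmd -l h_rt=$(( minutes * 60 )) -q $queue"
}

map_qname_sketch 120
echo "qsub options:$queueCmd"
```

Whether the h_rt limit is wanted at all is a separate question: SGE kills jobs that exceed it, so dropping it (as the hard-coded version effectively does) may be the safer quick-and-dirty choice for long group ICA runs.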

stowler commented 8 years ago

I haven't reviewed it for quality yet, but this ~12 hr group ICA took only 3 hr using my untuned single-host gridengine deployment from this morning:

Melodic Started at Mon Aug 24 17:10:16 EDT 2015 :
154M    /tmp/melFromFeeds-groupICA-FSLPARALLEL-inputsCleaned-temporalCat-structBBR-standard2mmNonlinear.gica
...but melodic not yet finished as of Mon Aug 24 20:14:03 EDT 2015. Will check again in 20 seconds...

Finished at Mon Aug 24 20:14:18 EDT 2015
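
The wall-clock figure can be checked from the two timestamps in the log above. A minimal sketch, assuming both times fall on the same day (as they do here); the helper name to_seconds is made up for illustration:

```shell
#!/bin/sh
# Convert an HH:MM:SS wall-clock time to seconds since midnight.
to_seconds() {
  h=${1%%:*}; rest=${1#*:}; m=${rest%%:*}; s=${rest#*:}
  # Strip one leading zero so values such as "08" are not read as octal
  echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

start=$(to_seconds "17:10:16")   # "Melodic Started at" time from the log
end=$(to_seconds "20:14:18")     # "Finished at" time from the log
elapsed=$(( end - start ))

printf 'elapsed: %dh %dm %ds\n' \
  "$(( elapsed / 3600 ))" "$(( elapsed % 3600 / 60 ))" "$(( elapsed % 60 ))"
# prints "elapsed: 3h 4m 2s"
```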

kmcgregor123456 commented 8 years ago

Is this the RFA ICA?
