Closed jhammock closed 5 years ago
@jhammock spark has a queue and jenkins has a queue. The former has a UI, but you can't see it; the latter has a UI at http://archive.guoda.bio.
On another note - I am working on this tool dwca2parquet that allows you to run the job yourself on the command line with no waiting in line. If we can convince @mjcollin to install singularity on the jupyterdb servers, you should be able to create parquets using:
```
$ dwca2parquet [path to meta.xml in hdfs]
```
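If singularity were installed, the wrapped invocation might look roughly like this. A sketch only: the image name is an assumption (built from the repo's dwca2parquet.def), and the path placeholder mirrors the usage above.

```shell
# Sketch: run dwca2parquet via a Singularity image built from dwca2parquet.def.
# IMAGE is an assumed name; META is the placeholder from the usage line above.
IMAGE="dwca2parquet.simg"
META="[path to meta.xml in hdfs]"
# Print the command rather than executing it, since singularity may not be installed yet:
echo "singularity exec $IMAGE dwca2parquet $META"
```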
@mjcollin curious to hear your thoughts on this.
@jhammock addendum to the queues - the spark queue is the one that indicated "Already reached maximum submission size". Is this urgent, or can we wait until a simpler method (e.g., command line tool) is available?
Assuming the urgency question applies to my ability to run this job at all: I have a user waiting for a fresh data update, but we can sit tight for a couple of weeks.
I'm less clear on the "no waiting" method you describe, @jhpoelen , and how it would affect my life. I presume if a job is not waiting, either it jumped the queue or it's using some other resources...
Install singularity on idb-jupyter1
Just attempting next update of Fresh Data:
```
jhammock@idb-jupyter1:~$ curl -X POST http://mesos07.acis.ufl.edu:7077/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data @makeparquet.json
{
  "action" : "CreateSubmissionResponse",
  "message" : "Already reached maximum submission size",
  "serverSparkVersion" : "2.2.0",
  "success" : false
}
```
Not sure what to expect at this point, but it would help me to know what I ought to do in this case.
Perhaps write a little script that keeps trying every 10s or so until the response shows `"success" : true`. This is related to #28.
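A minimal sketch of such a retry loop. The endpoint and payload file are the ones from the curl command above; the 10-second interval and the exact success check are assumptions.

```shell
#!/bin/sh
# Retry the Spark job submission until the response reports success.
submit_until_success() {
  url="$1"; payload="$2"
  while true; do
    resp=$(curl -s -X POST "$url" \
      --header "Content-Type:application/json;charset=UTF-8" \
      --data @"$payload")
    echo "$resp"
    # Stop once the server reports "success" : true (spacing matches the responses above)
    if echo "$resp" | grep -q '"success" : true'; then
      return 0
    fi
    sleep 10  # wait before trying again
  done
}

# Usage, with the endpoint and payload from the thread:
# submit_until_success http://mesos07.acis.ufl.edu:7077/v1/submissions/create makeparquet.json
```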
Or try to convince @mjcollin to install singularity on idb-jupyter1 and run dwca2parquet on the command line, bypassing the queuing mechanism. Another option is to run spark-shell and run dwca2parquet from the shell. For example, see https://github.com/bio-linker/dwca2parquet/blob/master/dwca2parquet.def#L15; the jar can be retrieved from https://github.com/bio-linker/dwca2parquet/blob/master/dwca2parquet.def#L41.
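A rough sketch of the spark-shell route. The jar filename below is a placeholder; fetch the actual jar from the URL referenced at line 41 of the linked dwca2parquet.def.

```shell
# Sketch: run the job interactively via spark-shell instead of the REST queue.
# JAR is a placeholder name; download the real jar per dwca2parquet.def first.
JAR="dwca2parquet.jar"
# Print the command rather than running it, since Spark may not be on PATH here:
echo "spark-shell --jars $JAR"
```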
From what I can tell, you are competing for resources with http://archive.guoda.bio/job/ecoregion%20status%20checker/ @diatomsRcool , who is using a little script I wrote to automatically submit jobs.
good to know the jobs won't just accumulate if I keep trying, thanks, @jhpoelen !
@diatomsRcool are your jobs still on a 12h on/12h off schedule of some kind?
Probably not - I'm trying to ram those ecoregions through just to get them done. I know you've been waiting forever.
In my latest round of updating resources for fresh data (started ~26 hours ago), I'm having trouble completing a makeparquet job. I have asked for it three times, each attempt separated by at least a few hours. I can't reach the old logs in my browser/terminal interface, but the most recent attempt went like this:
```
jhammock@idb-jupyter1:~$ curl -X POST http://mesos07.acis.ufl.edu:7077/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data @makeparquet.json
{
  "action" : "CreateSubmissionResponse",
  "message" : "Already reached maximum submission size",
  "serverSparkVersion" : "2.2.0",
  "success" : false
}
```
I'm pretty sure that on the first attempt it returned `"success" : true`. I've checked http://archive.guoda.bio/view/Fresh%20Data%20jobs/ a few times and never seen any sign of this job in the queue, but I'm not sure it's supposed to be visible there.