bio-guoda / guoda-services

Services provided by GUODA, currently a container for tickets and wikis.

no cluster capacity? #42

Closed jhammock closed 6 years ago

jhammock commented 6 years ago

I've been trying to run a makeparquet job for a few days now, and it has been declined each time. Is the cluster choking on something?

jhpoelen commented 6 years ago

As far as I can tell, the cluster is eagerly munching on checklists (see attached screenshot). Because there is no queuing mechanism (see #28), your jobs get rejected unless no other job is running. Since http://archive.guoda.bio/job/ecoregion%20status%20checker/ is pretty successful at scheduling jobs, you probably ended up getting the short end of the stick. As far as the checklists go: 324 calculated, 7700 to go.

Bottom line: a queuing mechanism would be nice to have. Meanwhile, as a workaround, you can ask @diatomsRcool to temporarily pause the job (or do it yourself), wait about 10-20 minutes, run the parquet conversion, and then resume the checklist job.
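Until a real queue exists, the "wait until nothing is running, then submit" part of the workaround could be scripted roughly as below. This is only a sketch: `run_when_idle`, `count_running`, and `submit` are hypothetical names, and how you actually count running jobs or submit the parquet conversion is left to callables you supply. It does not pause or resume the checklist job for you.

```python
import time
from typing import Callable


def run_when_idle(
    count_running: Callable[[], int],
    submit: Callable[[], bool],
    poll_seconds: float = 60.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no job is running, then submit exactly once.

    count_running: hypothetical callable returning the number of running jobs.
    submit: hypothetical callable that submits the job and returns True
            if the cluster accepted it.
    Returns the result of submit(), or False if the cluster never went idle
    within max_polls checks.
    """
    for _ in range(max_polls):
        if count_running() == 0:
            # Cluster looks idle: try to submit and report the outcome.
            return submit()
        # Something is still running: wait before checking again.
        sleep(poll_seconds)
    return False
```

With the defaults this checks once a minute for up to 30 minutes, which covers the 10-20 minute wait mentioned above. Another job can still grab the cluster between the check and the submit, so this is not a substitute for proper queuing.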

[Screenshot from 2018-01-22 10-21-40]

jhammock commented 6 years ago

Good to know, thanks. I'll coordinate with @diatomsRcool

wishIcouldseethoselogs

jhpoelen commented 6 years ago

I get the logs through an ssh account I got via @mjcollin, using a SOCKS tunnel (see https://www.digitalocean.com/community/tutorials/how-to-route-web-traffic-securely-without-a-vpn-using-a-socks-tunnel).

An alternative is to start a terminal in Jupyter and then run:

curl http://mesos02.acis.ufl.edu:5050/tasks | python -m json.tool | more

This gives you a paged list of tasks (running and finished). Press space bar to scroll down.
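If paging through the full list is too much, a few lines of Python in a Jupyter cell can boil it down to a count per task state. This is a sketch under assumptions: it presumes the endpoint returns a JSON object with a `tasks` list whose entries carry a `state` field (for example `TASK_RUNNING`), and the helper names `fetch_tasks` and `count_by_state` are made up for illustration.

```python
import json
from collections import Counter
from urllib.request import urlopen

# URL taken from the curl command above.
TASKS_URL = "http://mesos02.acis.ufl.edu:5050/tasks"


def fetch_tasks(url: str = TASKS_URL, timeout: float = 10.0) -> list:
    """Download the task list; assumes a top-level "tasks" key."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp).get("tasks", [])


def count_by_state(tasks: list) -> Counter:
    """Count tasks per state; entries without a state count as UNKNOWN."""
    return Counter(task.get("state", "UNKNOWN") for task in tasks)
```

Calling `count_by_state(fetch_tasks())` from inside the cluster network should then give a quick view of how many tasks are running versus finished.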

Maybe @godfoder or @mjcollin have some other ideas to help you peek into the cluster.

jhammock commented 6 years ago

Oh, nice. Peeking in jupyter works nicely for my occasional needs, thanks!