bio-guoda / guoda-services

Services provided by GUODA, currently a container for tickets and wikis.
MIT License

no cluster capacity? #42

Closed: jhammock closed this 6 years ago

jhammock commented 6 years ago

I've been trying to run a makeparquet job for a few days now and have been declined each time. Is the cluster choking on something?

jhpoelen commented 6 years ago

As far as I can tell, the cluster is eagerly munching on checklists (see attached screenshot). Because there is no queuing mechanism (see #28), your jobs just get rejected unless no other job is running. Since the checklist job is pretty successful at getting its jobs scheduled, you probably ended up with the short end of the stick. As far as the checklists go: 324 calculated, 7700 to go.

Bottom line: a queuing mechanism would be nice to have. Meanwhile, as a workaround, you can ask @diatomsRcool to temporarily pause the checklist job (or do it yourself), wait 10-20 minutes or so, run the parquet conversion, and then resume the checklist job.
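To make the difference concrete, here is a minimal Python sketch contrasting the current reject-when-busy behaviour with a simple first-in, first-out queue, one possible form of the queuing mechanism referenced in #28. The class and job names are hypothetical; this models the scheduling policy only and is not GUODA's actual scheduler.

```python
from collections import deque
from typing import Deque, Optional


class RejectingScheduler:
    """Hypothetical model of the current behaviour: accept a job only if nothing is running."""

    def __init__(self) -> None:
        self.running: Optional[str] = None

    def submit(self, job: str) -> bool:
        if self.running is not None:
            return False  # cluster busy: the job is rejected outright
        self.running = job
        return True

    def finish(self) -> None:
        self.running = None


class QueuingScheduler:
    """Hypothetical FIFO queue: jobs wait their turn instead of being rejected."""

    def __init__(self) -> None:
        self.running: Optional[str] = None
        self.waiting: Deque[str] = deque()

    def submit(self, job: str) -> bool:
        if self.running is None:
            self.running = job
        else:
            self.waiting.append(job)  # busy: park the job at the back of the line
        return True  # every submitted job is eventually run

    def finish(self) -> Optional[str]:
        # Start the next waiting job, if any, and return its name.
        self.running = self.waiting.popleft() if self.waiting else None
        return self.running


if __name__ == "__main__":
    busy = RejectingScheduler()
    print(busy.submit("checklist"))     # True
    print(busy.submit("makeparquet"))   # False: rejected while checklist runs

    fifo = QueuingScheduler()
    fifo.submit("checklist")
    fifo.submit("makeparquet")          # accepted, waits in line
    print(fifo.finish())                # makeparquet
```

With the FIFO version, a makeparquet job submitted while a checklist job is running simply runs next, instead of having to win a race against the checklist job's own submissions.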

[Screenshot from 2018-01-22 10-21-40]

jhammock commented 6 years ago

Good to know, thanks. I'll coordinate with @diatomsRcool.


jhpoelen commented 6 years ago

I get logs through an ssh account I got via @mjcollin, using a SOCKS tunnel (see .

An alternative is to start a terminal in Jupyter and then run:

curl | python -m json.tool | more

This gives you a paged list of tasks (running and finished). Press space bar to scroll down.
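If you would rather filter the task list than page through it, the same JSON can be loaded in Python, for example from a Jupyter notebook. This is a minimal sketch under loud assumptions: the actual URL passed to curl is not preserved in this thread, so `TASKS_URL` is a hypothetical placeholder, and the `name`/`state` fields and the `"RUNNING"` value are guesses at the task schema, not a documented API.

```python
import json
import urllib.request
from typing import Dict, List

# Hypothetical placeholder: the real task-list URL is not preserved in this thread.
TASKS_URL = "http://example.invalid/tasks"


def fetch_tasks(url: str = TASKS_URL) -> List[Dict]:
    """Download the task list as parsed JSON (the data `python -m json.tool` pretty-prints)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def running_tasks(tasks: List[Dict]) -> List[str]:
    """Return names of tasks whose (assumed) "state" field equals "RUNNING"."""
    return [t.get("name", "?") for t in tasks if t.get("state") == "RUNNING"]


if __name__ == "__main__":
    # Offline sample with the assumed schema; swap in fetch_tasks() once the URL is known.
    sample = [
        {"name": "checklist-325", "state": "RUNNING"},
        {"name": "checklist-324", "state": "FINISHED"},
    ]
    print(json.dumps(sample, indent=4))  # pretty-print, like json.tool
    print(running_tasks(sample))         # ['checklist-325']
```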

Maybe @godfoder or @mjcollin have some other ideas to help you peek into the cluster.

jhammock commented 6 years ago

Oh, nice. Peeking in via Jupyter works nicely for my occasional needs, thanks!