bio-guoda / guoda-services

Services provided by GUODA, currently a container for tickets and wikis.
MIT License

possible mesos problem building checklists #74

Closed by diatomsRcool 5 years ago

diatomsRcool commented 5 years ago

This is what I'm getting. I'm using jupyter.idigbio.org and opening a terminal. I have cloned the preston-scripts repo to my jupyter.idigbio environment. I cd into preston-scripts and run the checklist script.

19/08/07 13:24:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/08/07 13:24:57 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
2019-08-07 13:24:57,529:16551(0x7f1fa330d700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2019-08-07 13:24:57,529:16551(0x7f1fa330d700):ZOO_INFO@log_env@730: Client environment:host.name=idb-jupyter1
2019-08-07 13:24:57,529:16551(0x7f1fa330d700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2019-08-07 13:24:57,529:16551(0x7f1fa330d700):ZOO_INFO@log_env@738: Client environment:os.arch=4.4.0-116-generic
I0807 13:24:57.530035 16629 sched.cpp:226] Version: 1.0.0
2019-08-07 13:24:57,530:16551(0x7f1fa330d700):ZOO_INFO@log_env@739: Client environment:os.version=#140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018
2019-08-07 13:24:57,531:16551(0x7f1fa330d700):ZOO_INFO@log_env@747: Client environment:user.name=(null)
2019-08-07 13:24:57,532:16551(0x7f1fa330d700):ZOO_INFO@log_env@755: Client environment:user.home=/home/diatomsrcool
2019-08-07 13:24:57,533:16551(0x7f1fa330d700):ZOO_INFO@log_env@767: Client environment:user.dir=/home/diatomsrcool/preston-scripts
2019-08-07 13:24:57,533:16551(0x7f1fa330d700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=mesos01:2181,mesos02:2181,mesos03:2181 sessionTimeout=10000 watcher=0x7f1faeff9400 sessionId=0 sessionPasswd= context=0x7f1fe40037a8 flags=0
2019-08-07 13:24:57,540:16551(0x7f1f9f6fc700):ZOO_INFO@check_events@1728: initiated connection to server [10.13.44.15:2181]
2019-08-07 13:24:57,573:16551(0x7f1f9f6fc700):ZOO_INFO@check_events@1775: session establishment complete on server [10.13.44.15:2181], sessionId=0x16972fdc35e621b, negotiated timeout=10000
I0807 13:24:57.574261 16621 group.cpp:349] Group process (group(1)@10.13.44.50:45224) connected to ZooKeeper
I0807 13:24:57.574542 16621 group.cpp:837] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0807 13:24:57.574568 16621 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
I0807 13:24:57.576110 16621 detector.cpp:152] Detected a new leader: (id='1777')
I0807 13:24:57.576243 16621 group.cpp:706] Trying to get '/mesos/json.info_0000001777' in ZooKeeper
I0807 13:24:57.577112 16622 zookeeper.cpp:259] A new leading master (UPID=master@10.13.44.18:5050) is detected
I0807 13:24:57.577191 16622 sched.cpp:330] New master detected at master@10.13.44.18:5050
I0807 13:24:57.577677 16622 sched.cpp:341] No credentials provided. Attempting to register without authentication
I0807 13:24:57.579376 16622 sched.cpp:743] Framework registered with c353bf9b-2610-43ad-b848-38830dae4dc0-0108
19/08/07 13:25:23 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
[... the same warning repeated every 15 seconds through 19/08/07 13:37:53 ...]

diatomsRcool commented 5 years ago

It did more....

19/08/07 13:58:22 WARN MesosCoarseGrainedSchedulerBackend: Unable to parse into a key:value label for the task.
19/08/07 13:58:23 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
[Stage 2:===> (156 + 16) / 4157]
2019-08-07 14:01:04,166:16551(0x7f1f9f6fc700):ZOO_ERROR@handle_socket_error_msg@1666: Socket [10.13.44.15:2181] zk retcode=-7, errno=110(Connection timed out): connection to 10.13.44.15:2181 timed out (exceeded timeout by 3ms)
I0807 14:01:04.166496 16620 group.cpp:460] Lost connection to ZooKeeper, attempting to reconnect ...
2019-08-07 14:01:04,166:16551(0x7f1f9f6fc700):ZOO_INFO@check_events@1728: initiated connection to server [10.13.44.18:2181]
2019-08-07 14:01:04,168:16551(0x7f1f9f6fc700):ZOO_INFO@check_events@1775: session establishment complete on server [10.13.44.18:2181], sessionId=0x16972fdc35e621b, negotiated timeout=10000
I0807 14:01:04.168541 16621 group.cpp:349] Group process (group(1)@10.13.44.50:45224) reconnected to ZooKeeper
I0807 14:01:04.168587 16621 group.cpp:837] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)

jhpoelen commented 5 years ago

hey @diatomsRcool - bummer! It looks like others were using all of the cluster's resources when you started your run, which prevented your checklist job from being scheduled. If that's the case, you can simply keep the process running until resources become available - the messages are informational rather than errors.
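On a busy cluster, one possible workaround (a sketch, assuming the preston-scripts checklist job is launched via `spark-submit` against the Mesos masters seen in the log above) is to cap the cores the job requests with Spark's standard `spark.cores.max` setting, so the scheduler can place it even when the cluster is mostly occupied:

```shell
# Sketch only: request at most 8 cores so the job can be scheduled on a busy cluster.
# The script name and input arguments here are hypothetical placeholders.
spark-submit \
  --master mesos://zk://mesos01:2181,mesos02:2181,mesos03:2181/mesos \
  --conf spark.cores.max=8 \
  checklist-job.py
```

With no `spark.cores.max` set, Spark in coarse-grained mode tries to acquire all available cores, which makes the "Initial job has not accepted any resources" warning more likely on a shared cluster.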

Currently, the output logging is quite verbose and includes info and warning messages. Would it make sense to reduce the log level and only log errors?
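One way to do that (a sketch, assuming the scripts run a Spark version that uses log4j for console logging) is to ship a `log4j.properties` that sets the root category to ERROR:

```shell
# Sketch: silence INFO/WARN console output from Spark (assumes Spark with log4j 1.x).
# $SPARK_HOME must point at the Spark installation used by the scripts.
cat > "$SPARK_HOME/conf/log4j.properties" <<'EOF'
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
EOF
```

Note this would also hide the "waiting for resources" warnings above, so a short note in the scripts' README explaining that long quiet waits are normal might be worth pairing with it.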

diatomsRcool commented 5 years ago

It would appear that clearer and more concise logs would be beneficial. I would not have opened an issue if I had known this was normal. I'm not sure what the answer is, though.

jhpoelen commented 5 years ago

OK, closing this issue and opening a new one, #75. Meanwhile, please holler if you fail to generate checklists using the scripts in https://github.com/bio-guoda/preston-scripts .