CMSCompOps / WorkflowWebTools

https://workflowwebtools.readthedocs.io
1 stars 7 forks source link

Switch to CMSMonitoring StompAMQ #130

Closed vkuznet closed 5 years ago

vkuznet commented 5 years ago

Please start using CMSMonitoring StompAMQ module instead of WMCore and your own one. The CMSMonitoring module satisfies all requirements CERN IT asks us to implement, e.g. is provide load balancing among CERN brokers, it provide schema validation, etc.

I made this PR to replace WMCore and you own versions of StompAMQ and replace them with CMSMonitoring one. The setup.py was also changed to include CMSMonitoring module as dependencies.

Please test the changes and incorporate them in your tree.

phylsix commented 5 years ago

Hi @vkuznet , thanks for PR, I was not aware of CMSMonitoring package before. However, it seems pip install CMSMonitoring does not work for me, is it working for you?

(toolsint) > $ pip install CMSMonitoring                                                                                                                                       [±CMSMonitoringSwitch ●●]
Collecting CMSMonitoring
  Using cached https://files.pythonhosted.org/packages/b2/41/f77ee8d73a56b6e61978d8cd53c4054fec5efb5ccd55dc9efc83dd452f89/CMSMonitoring-0.1.1.tar.gz
    ERROR: Complete output from command python setup.py egg_info:
    ERROR: fatal: Not a git repository (or any of the parent directories): .git
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/wsi/pip-install-x6lf3lq8/CMSMonitoring/setup.py", line 62, in <module>
        main()
      File "/tmp/wsi/pip-install-x6lf3lq8/CMSMonitoring/setup.py", line 40, in main
        version              = version(),
      File "/tmp/wsi/pip-install-x6lf3lq8/CMSMonitoring/setup.py", line 31, in version
        ver = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE).stdout.read().replace('\n', '')
    TypeError: a bytes-like object is required, not 'str'
    ----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in /tmp/wsi/pip-install-x6lf3lq8/CMSMonitoring/
vkuznet commented 5 years ago

Weinan, last time I checked setup.py it was working. I think from error message it looks like you used python3 rather then python2 and that cause a problem. Anyway, I fixed setup.py to use str. Do you mind to try out version 0.1.2 which is not available on pypi.

Let me know if you need this in PR. Valentin.

On 0, Weinan Si notifications@github.com wrote:

Hi @vkuznet , thanks for PR, I was not aware of CMSMonitoring package before. However, it seems pip install CMSMonitoring does not work for me, is it working for you?

(toolsint) > $ pip install CMSMonitoring                                                                                                                                       [±CMSMonitoringSwitch ●●]
Collecting CMSMonitoring
  Using cached https://files.pythonhosted.org/packages/b2/41/f77ee8d73a56b6e61978d8cd53c4054fec5efb5ccd55dc9efc83dd452f89/CMSMonitoring-0.1.1.tar.gz
    ERROR: Complete output from command python setup.py egg_info:
    ERROR: fatal: Not a git repository (or any of the parent directories): .git
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/wsi/pip-install-x6lf3lq8/CMSMonitoring/setup.py", line 62, in <module>
        main()
      File "/tmp/wsi/pip-install-x6lf3lq8/CMSMonitoring/setup.py", line 40, in main
        version              = version(),
      File "/tmp/wsi/pip-install-x6lf3lq8/CMSMonitoring/setup.py", line 31, in version
        ver = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE).stdout.read().replace('\n', '')
    TypeError: a bytes-like object is required, not 'str'
    ----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in /tmp/wsi/pip-install-x6lf3lq8/CMSMonitoring/

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/CMSCompOps/WorkflowWebTools/pull/130#issuecomment-496967125

phylsix commented 5 years ago

@vkuznet Yes, I was testing with py37. After your fix, the installation is ok, it would be good if you can make it available on PyPI. Meanwhile, I met a logging type error when testing,

TypeError: not all arguments converted during string formatting
Call stack:
  File "sendToMonit.py", line 312, in <module>
    main()
  File "sendToMonit.py", line 300, in main
    failures = sendDoc(cred=cred, docs=docs)
  File "sendToMonit.py", line 256, in sendDoc
    key=cred['key']
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/site-packages/CMSMonitoring/StompAMQ.py", line 146, in __init__
    self.logger.info("host and ports", host_and_ports, type(host_and_ports))
Message: 'host and ports'
Arguments: ([('cms-mb.cern.ch', 61323)], <class 'list'>)
--- Logging error ---
Traceback (most recent call last):
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/logging/__init__.py", line 1034, in emit
    msg = self.format(record)
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/logging/__init__.py", line 880, in format
    return fmt.format(record)
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/logging/__init__.py", line 619, in format
    record.message = record.getMessage()
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/logging/__init__.py", line 380, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "sendToMonit.py", line 312, in <module>
    main()
  File "sendToMonit.py", line 300, in main
    failures = sendDoc(cred=cred, docs=docs)
  File "sendToMonit.py", line 256, in sendDoc
    key=cred['key']
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/site-packages/CMSMonitoring/StompAMQ.py", line 152, in __init__
    self.logger.info("### resolver", self.ip_and_ports)
Message: '### resolver'

I think it relates with here and here. Maybe a string format before passing to logger would fix?

vkuznet commented 5 years ago

Sorry, it was purely my mistake. I tested code with print statements but then converted print to logger without check. I fixed the issue.

The new version CMSMonitoring is 0.1.3 and I put it in PR. It should be available on PyPi. Please give it a shot now.

On 0, Weinan Si notifications@github.com wrote:

@vkuznet Yes, I was testing with py37. After your fix, the installation is ok, it would be good if you can make it available on PyPI. Meanwhile, I met a logging type error when testing,

TypeError: not all arguments converted during string formatting
Call stack:
  File "sendToMonit.py", line 312, in <module>
    main()
  File "sendToMonit.py", line 300, in main
    failures = sendDoc(cred=cred, docs=docs)
  File "sendToMonit.py", line 256, in sendDoc
    key=cred['key']
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/site-packages/CMSMonitoring/StompAMQ.py", line 146, in __init__
    self.logger.info("host and ports", host_and_ports, type(host_and_ports))
Message: 'host and ports'
Arguments: ([('cms-mb.cern.ch', 61323)], <class 'list'>)
--- Logging error ---
Traceback (most recent call last):
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/logging/__init__.py", line 1034, in emit
    msg = self.format(record)
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/logging/__init__.py", line 880, in format
    return fmt.format(record)
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/logging/__init__.py", line 619, in format
    record.message = record.getMessage()
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/logging/__init__.py", line 380, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "sendToMonit.py", line 312, in <module>
    main()
  File "sendToMonit.py", line 300, in main
    failures = sendDoc(cred=cred, docs=docs)
  File "sendToMonit.py", line 256, in sendDoc
    key=cred['key']
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/site-packages/CMSMonitoring/StompAMQ.py", line 152, in __init__
    self.logger.info("### resolver", self.ip_and_ports)
Message: '### resolver'

I think it relates with here and here. Maybe a string format before passing to logger would fix?

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/CMSCompOps/WorkflowWebTools/pull/130#issuecomment-496993118

vkuznet commented 5 years ago

Weinan, I didn't realized that you use travis-ci, otherwise I'll check errors on my own. Anyway, after couple of iteration I see that you have 3.7 build successful, while 2.7 build fails due to numpy issue.

Feel free to have a look at them and proceed with testing new changes.

Best, Valentin.

On 0, Weinan Si notifications@github.com wrote:

@vkuznet Yes, I was testing with py37. After your fix, the installation is ok, it would be good if you can make it available on PyPI. Meanwhile, I met a logging type error when testing,

TypeError: not all arguments converted during string formatting
Call stack:
  File "sendToMonit.py", line 312, in <module>
    main()
  File "sendToMonit.py", line 300, in main
    failures = sendDoc(cred=cred, docs=docs)
  File "sendToMonit.py", line 256, in sendDoc
    key=cred['key']
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/site-packages/CMSMonitoring/StompAMQ.py", line 146, in __init__
    self.logger.info("host and ports", host_and_ports, type(host_and_ports))
Message: 'host and ports'
Arguments: ([('cms-mb.cern.ch', 61323)], <class 'list'>)
--- Logging error ---
Traceback (most recent call last):
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/logging/__init__.py", line 1034, in emit
    msg = self.format(record)
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/logging/__init__.py", line 880, in format
    return fmt.format(record)
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/logging/__init__.py", line 619, in format
    record.message = record.getMessage()
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/logging/__init__.py", line 380, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "sendToMonit.py", line 312, in <module>
    main()
  File "sendToMonit.py", line 300, in main
    failures = sendDoc(cred=cred, docs=docs)
  File "sendToMonit.py", line 256, in sendDoc
    key=cred['key']
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/site-packages/CMSMonitoring/StompAMQ.py", line 152, in __init__
    self.logger.info("### resolver", self.ip_and_ports)
Message: '### resolver'

I think it relates with here and here. Maybe a string format before passing to logger would fix?

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/CMSCompOps/WorkflowWebTools/pull/130#issuecomment-496993118

dabercro commented 5 years ago

It looks like jaraco.functools requires an old numpy for some reason. We'll need to figure out where that's coming from and see if we should remove that or freeze tensorflow at version 1.13.

dabercro commented 5 years ago

Nevermind, that doesn't seem to be where that's coming from. Sorry for the spam.

dabercro commented 5 years ago

It looks like using pip instead of python setup.py install fixed the numpy versioning. I'll merge #131 in a minute and see if that fixes the problems we see here.

phylsix commented 5 years ago

@vkuznet the scripts under workflowmonitor actually does not include tests, Travis CI might not catch the errors(probably I need to figure out tests..) So I test by hand for now. Anyway, I noticed another error I don't understand: the new StompAMQ always give me connection error:

ERROR:root:Connection to [('cms-mb.cern.ch', 61323)] failed
ERROR:root:Connection to [('cms-mb.cern.ch', 61323)] failed
ERROR:root:Connection to [('cms-mb.cern.ch', 61323)] failed
ERROR:root:Connection to [('cms-mb.cern.ch', 61323)] failed
ERROR:root:Connection to [('cms-mb.cern.ch', 61323)] failed
ERROR:root:Connection to [('cms-mb.cern.ch', 61323)] failed

break down a little bit, following exceptions raised:

    conn.connect(wait=True)
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/site-packages/stomp/connect.py", line 164, in connect
    Protocol11.connect(self, *args, **kwargs)
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/site-packages/stomp/protocol.py", line 340, in connect
    self.transport.wait_for_connection()
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/site-packages/stomp/transport.py", line 327, in wait_for_connection
    raise exception.ConnectFailedException()
stomp.exception.ConnectFailedException

Comparing with the old module, I see you build a list of connections while the former module build a single one. The old one still works, is the new module working fine for you?

vkuznet commented 5 years ago

When I tested new code it was working fine. I can redo tests tomorrow.

The whole point of new code is to resolve broker host name into individual IPs and perform random access to brokers. CERN IT strongly advised us to avoid using single connection since it leads to plenty of problems on their end.

From the error you posted it looks like the host resolution didn't happens (may be I need to see how it is done in python3). Do you mind to run it with python2.

For completeness, here is how it is done, see lines: https://github.com/dmwm/CMSMonitoring/blob/master/src/python/CMSMonitoring/StompAMQ.py#L147 https://github.com/dmwm/CMSMonitoring/blob/master/src/python/CMSMonitoring/StompAMQ.py#L98 and, can you post the logger info messages, in particular we should see host and ports: ... something here ... resolver: ... something here ...

Thanks, Valentin

On 0, Weinan Si notifications@github.com wrote:

@vkuznet the scripts under workflowmonitor actually does not include tests, Travis CI might not catch the errors(probably I need to figure out tests..) So I test by hand for now. Anyway, I noticed another error I don't understand: the new StompAMQ always give me connection error:

ERROR:root:Connection to [('cms-mb.cern.ch', 61323)] failed
ERROR:root:Connection to [('cms-mb.cern.ch', 61323)] failed
ERROR:root:Connection to [('cms-mb.cern.ch', 61323)] failed
ERROR:root:Connection to [('cms-mb.cern.ch', 61323)] failed
ERROR:root:Connection to [('cms-mb.cern.ch', 61323)] failed
ERROR:root:Connection to [('cms-mb.cern.ch', 61323)] failed

break down a little bit, following exceptions raised:

    conn.connect(wait=True)
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/site-packages/stomp/connect.py", line 164, in connect
    Protocol11.connect(self, *args, **kwargs)
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/site-packages/stomp/protocol.py", line 340, in connect
    self.transport.wait_for_connection()
  File "/afs/cern.ch/user/w/wsi/Workspace/private/miniconda3/envs/toolsint/lib/python3.7/site-packages/stomp/transport.py", line 327, in wait_for_connection
    raise exception.ConnectFailedException()
stomp.exception.ConnectFailedException

Comparing with the old module, I see you build a list of connections while the former module build a single one. The old one still works, is the new module working fine for you?

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/CMSCompOps/WorkflowWebTools/pull/130#issuecomment-497082559

phylsix commented 5 years ago

@vkuznet This is the relevant log

2019-05-30 00:32:42,380 - INFO - StompAMQ.__init__ - host and ports: [('cms-mb.cern.ch', 61323)]
2019-05-30 00:32:42,381 - INFO - StompAMQ.__init__ - resolver: [('2001:1458:d00:19::a6', 61323), ('2001:1458:d00:16::2a1', 61323), ('2001:1458:d00:13::31', 61323), ('137.138.122.170', 61323), ('137.138.152.151', 61323), ('137.138.31.103', 61323)]
2019-05-30 00:32:42,390 - ERROR - StompAMQ.connect - Connection to [('cms-mb.cern.ch', 61323)] failed
2019-05-30 00:32:42,392 - ERROR - StompAMQ.connect - Connection to [('cms-mb.cern.ch', 61323)] failed
2019-05-30 00:32:42,398 - ERROR - StompAMQ.connect - Connection to [('cms-mb.cern.ch', 61323)] failed
2019-05-30 00:32:42,404 - ERROR - StompAMQ.connect - Connection to [('cms-mb.cern.ch', 61323)] failed
2019-05-30 00:32:42,405 - ERROR - StompAMQ.connect - Connection to [('cms-mb.cern.ch', 61323)] failed
2019-05-30 00:32:42,407 - ERROR - StompAMQ.connect - Connection to [('cms-mb.cern.ch', 61323)] failed

I think I know where the problem is:

diff --git a/src/python/CMSMonitoring/StompAMQ.py b/src/python/CMSMonitoring/StompAMQ.py
index 13d6f4b..142053d 100644
--- a/src/python/CMSMonitoring/StompAMQ.py
+++ b/src/python/CMSMonitoring/StompAMQ.py
@@ -170,6 +170,10 @@ class StompAMQ(object):
             for idx in range(len(self.ip_and_ports)):
                 host_and_ports = [self.ip_and_ports[idx]]
                 conn = stomp.Connection(host_and_ports=host_and_ports)
+                # perform connection to the broker
+                if self._use_ssl:
+                    # This requires stomp >= 4.1.15
+                    conn.set_ssl(for_hosts=host_and_ports, key_file=self._key, cert_file=self._cert)
                 self.connections.append(conn)
         else:
             conn = stomp.Connection(host_and_ports=self._host_and_ports)
@@ -186,10 +190,6 @@ class StompAMQ(object):
                 available_connections.append(conn)
                 continue

-            # perform connection to the broker
-            if self._use_ssl:
-                # This requires stomp >= 4.1.15
-                conn.set_ssl(for_hosts=self._host_and_ports, key_file=self._key, cert_file=self._cert)
             conn.set_listener('StompyListener', StompyListener(self.logger))
             # check if our connection failed before
             # if so we'll wait until timeout_interval is passed

It seems set_ssl's for_hosts need to change accordingly, with this, I don't see connection error any more.

Besides, make_notification now returns a tuple, changes need to be make here too, with this I everything is fine.

         doctype = 'workflowmonit_{}'.format(cred['producer'])
         notifications = [amq.make_notification(
-            payload=doc, docType=doctype) for doc in docs]
+            payload=doc, docType=doctype)[0] for doc in docs]
         failures = amq.send(notifications)
dabercro commented 5 years ago

I'm just going to quickly close and reopen this PR so that Travis builds again with the new master branch (I hope).

phylsix commented 5 years ago

@dabercro You can merge this PR, I can fix other problems in the scripts, and submit another PR. Thanks!

vkuznet commented 5 years ago

Weinan, you're right, I only tested user/password settings but not certs. I made changes as you suggested and uploaded code to pypi. Please try again and let me know. Valentin.

On 0, Weinan Si notifications@github.com wrote:

@vkuznet This is the relevant log

2019-05-30 00:32:42,380 - INFO - StompAMQ.__init__ - host and ports: [('cms-mb.cern.ch', 61323)]
2019-05-30 00:32:42,381 - INFO - StompAMQ.__init__ - resolver: [('2001:1458:d00:19::a6', 61323), ('2001:1458:d00:16::2a1', 61323), ('2001:1458:d00:13::31', 61323), ('137.138.122.170', 61323), ('137.138.152.151', 61323), ('137.138.31.103', 61323)]
2019-05-30 00:32:42,390 - ERROR - StompAMQ.connect - Connection to [('cms-mb.cern.ch', 61323)] failed
2019-05-30 00:32:42,392 - ERROR - StompAMQ.connect - Connection to [('cms-mb.cern.ch', 61323)] failed
2019-05-30 00:32:42,398 - ERROR - StompAMQ.connect - Connection to [('cms-mb.cern.ch', 61323)] failed
2019-05-30 00:32:42,404 - ERROR - StompAMQ.connect - Connection to [('cms-mb.cern.ch', 61323)] failed
2019-05-30 00:32:42,405 - ERROR - StompAMQ.connect - Connection to [('cms-mb.cern.ch', 61323)] failed
2019-05-30 00:32:42,407 - ERROR - StompAMQ.connect - Connection to [('cms-mb.cern.ch', 61323)] failed

I think I know where the problem is:

diff --git a/src/python/CMSMonitoring/StompAMQ.py b/src/python/CMSMonitoring/StompAMQ.py
index 13d6f4b..142053d 100644
--- a/src/python/CMSMonitoring/StompAMQ.py
+++ b/src/python/CMSMonitoring/StompAMQ.py
@@ -170,6 +170,10 @@ class StompAMQ(object):
             for idx in range(len(self.ip_and_ports)):
                 host_and_ports = [self.ip_and_ports[idx]]
                 conn = stomp.Connection(host_and_ports=host_and_ports)
+                # perform connection to the broker
+                if self._use_ssl:
+                    # This requires stomp >= 4.1.15
+                    conn.set_ssl(for_hosts=host_and_ports, key_file=self._key, cert_file=self._cert)
                 self.connections.append(conn)
         else:
             conn = stomp.Connection(host_and_ports=self._host_and_ports)
@@ -186,10 +190,6 @@ class StompAMQ(object):
                 available_connections.append(conn)
                 continue

-            # perform connection to the broker
-            if self._use_ssl:
-                # This requires stomp >= 4.1.15
-                conn.set_ssl(for_hosts=self._host_and_ports, key_file=self._key, cert_file=self._cert)
             conn.set_listener('StompyListener', StompyListener(self.logger))
             # check if our connection failed before
             # if so we'll wait until timeout_interval is passed

It seems set_ssl's for_hosts need to change accordingly, with this, I don't see connection error any more.

Besides, make_notification now returns a tuple, changes need to be make here too, with this I everything is fine.

         doctype = 'workflowmonit_{}'.format(cred['producer'])
         notifications = [amq.make_notification(
-            payload=doc, docType=doctype) for doc in docs]
+            payload=doc, docType=doctype)[0] for doc in docs]
         failures = amq.send(notifications)

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/CMSCompOps/WorkflowWebTools/pull/130#issuecomment-497142902

phylsix commented 5 years ago

@vkuznet I tried the new version on PyPI, v0.1.8 I believe. The connection problem is resolved. One small suggestion maybe for version inspection?

(toolsint) > $ python -c "import CMSMonitoring; print(CMSMonitoring.__version__)"                                                                                                           [±pr/130 ●●]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'CMSMonitoring' has no attribute '__version__'
vkuznet commented 5 years ago

Weinan, you may try now, I added version support in 0.1.9 version. Best, Valentin.

On 0, Weinan Si notifications@github.com wrote:

@vkuznet I tried the new version on PyPI, v0.1.8 I believe. The connection problem is resolved. One small suggestion maybe for version inspection?

(toolsint) > $ python -c "import CMSMonitoring; print(CMSMonitoring.__version__)"                                                                                                           [±pr/130 ●●]
Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: module 'CMSMonitoring' has no attribute '__version__'

-- You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub: https://github.com/CMSCompOps/WorkflowWebTools/pull/130#issuecomment-497733987