NDCMS / lobster

A userspace workflow management tool for harnessing non-dedicated resources for high-throughput workloads.
MIT License

lobster crashes in CMSSW_9_4_0 #623

Closed geoff-smith closed 6 years ago

geoff-smith commented 6 years ago

I followed the instructions for installing lobster from source, after doing a cmsenv in a CMSSW_9_4_0 release. The installation appeared to complete successfully, but when I try to type 'lobster', I get the following complaint from python:

Traceback (most recent call last):
  File "/afs/crc.nd.edu/user/g/gsmith15/.lobster/bin/lobster", line 11, in <module>
    load_entry_point('Lobster===1.9-a331c39-clean', 'console_scripts', 'lobster')()
  File "/cvmfs/cms.cern.ch/slc6_amd64_gcc630/external/py2-pippkgs/6.0-fmblme/lib/python2.7/site-packages/pkg_resources/__init__.py", line 567, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/cvmfs/cms.cern.ch/slc6_amd64_gcc630/external/py2-pippkgs/6.0-fmblme/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2604, in load_entry_point
    return ep.load()
  File "/cvmfs/cms.cern.ch/slc6_amd64_gcc630/external/py2-pippkgs/6.0-fmblme/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2264, in load
    return self.resolve()
  File "/cvmfs/cms.cern.ch/slc6_amd64_gcc630/external/py2-pippkgs/6.0-fmblme/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2270, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/afs/crc.nd.edu/user/g/gsmith15/.lobster/lib/python2.7/site-packages/lobster/ui.py", line 19, in <module>
    from lobster.core import command, config
  File "/afs/crc.nd.edu/user/g/gsmith15/.lobster/lib/python2.7/site-packages/lobster/core/__init__.py", line 1, in <module>
    from config import AdvancedOptions, Config
  File "/afs/crc.nd.edu/user/g/gsmith15/.lobster/lib/python2.7/site-packages/lobster/core/config.py", line 6, in <module>
    from lobster.core.workflow import Category
  File "/afs/crc.nd.edu/user/g/gsmith15/.lobster/lib/python2.7/site-packages/lobster/core/workflow.py", line 10, in <module>
    from lobster import fs, util
  File "/afs/crc.nd.edu/user/g/gsmith15/.lobster/lib/python2.7/site-packages/lobster/fs.py", line 6, in <module>
    import se
  File "/afs/crc.nd.edu/user/g/gsmith15/.lobster/lib/python2.7/site-packages/lobster/se.py", line 6, in <module>
    import snakebite.client
  File "/afs/crc.nd.edu/user/g/gsmith15/.lobster/lib/python2.7/site-packages/snakebite/client.py", line 20, in <module>
    from snakebite.service import RpcService
  File "/afs/crc.nd.edu/user/g/gsmith15/.lobster/lib/python2.7/site-packages/snakebite/service.py", line 16, in <module>
    from snakebite.channel import SocketRpcChannel
  File "/afs/crc.nd.edu/user/g/gsmith15/.lobster/lib/python2.7/site-packages/snakebite/channel.py", line 53, in <module>
    import snakebite.protobuf.datatransfer_pb2 as datatransfer_proto
  File "/afs/crc.nd.edu/user/g/gsmith15/.lobster/lib/python2.7/site-packages/snakebite/protobuf/datatransfer_pb2.py", line 15, in <module>
    RROR_CHECKSUM\x10\x02\x12\x11\n\rERROR_INVALID\x10\x03\x12\x10\n\x0c\x45RROR_EXISTS\x10\x04\x12\x16\n\x12\x45RROR_ACCESS_TOKEN\x10\x05\x12\x0f\n\x0b\x43HECKSUM_OK\x10\x06\x42>\n%org.apache.hadoop.hdfs.protocol.protoB\x12\x44\x61taTransferProtos\xa0\x01\x01')
  File "/afs/crc.nd.edu/user/g/gsmith15/.lobster/lib/python2.7/site-packages/google/protobuf/descriptor.py", line 829, in __new__
    return _message.default_pool.AddSerializedFile(serialized_pb)
TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "datatransfer.proto":
  SUCCESS: "SUCCESS" is already defined in file "RpcPayloadHeader.proto".
  SUCCESS: Note that enum values use C++ scoping rules, meaning that enum values are siblings of their type, not children of it. Therefore, "SUCCESS" must be unique within the global scope, not just within "Status".
  ERROR: "ERROR" is already defined in file "RpcPayloadHeader.proto".
  ERROR: Note that enum values use C++ scoping rules, meaning that enum values are siblings of their type, not children of it. Therefore, "ERROR" must be unique within the global scope, not just within "Status".

klannon commented 6 years ago

This seems like an incompatibility with snakebite. I ran recently with CMSSW_9_3_0_pre2 and my somewhat old Lobster installation. I wonder whether one of the dependencies was upgraded, so that the version you pip installed (or the version supplied by a newer CMSSW) is inconsistent with Lobster. Can you set yourself up (cmsenv and activate your virtualenv), run the following, and compare with my output (see below)? Note: I think I've accumulated some "cruft" in my virtualenv, so I might have some packages that you don't.

pip list --local

alabaster (0.7.10)
Babel (2.4.0)
dbs-client (3.4.4)
docutils (0.13.1)
elasticsearch (5.4.0)
elasticsearch-dsl (5.3.0)
httplib2 (0.10.3)
imagesize (0.7.1)
Lobster (1.6-efe02be-dirty)
lockfile (0.12.2)
pycurl-client (3.4.4)
python-daemon (2.1.2)
pyxdg (0.25)
retrying (1.3.3)
snakebite (1.3.13)
snowballstemmer (1.2.1)
Sphinx (1.6.3)
sphinx-rtd-theme (0.2.4)
sphinxcontrib-websupport (1.0.1)
subprocess32 (3.2.7)
typing (3.6.1)
wheel (0.29.0)
wmcore (1.1.1rc7)

pip list (everything, including packages coming from CMSSW)

alabaster (0.7.10)
appdirs (1.4.3)
Babel (2.4.0)
backports-abc (0.5)
backports.shutil-get-terminal-size (1.0.0)
backports.ssl-match-hostname (3.5.0.1)
bleach (2.0.0)
Bottleneck (1.2.1)
certifi (2017.4.17)
chardet (3.0.4)
click (6.7)
climate (0.4.6)
configparser (3.5.0)
cycler (0.10.0)
Cython (0.22)
dbs-client (3.4.4)
decorator (4.0.11)
deepdish (0.3.4)
docopt (0.6.2)
docutils (0.13.1)
downhill (0.4.0)
dxr (0.1)
elasticsearch (5.4.0)
elasticsearch-dsl (5.3.0)
entrypoints (0.2.3)
enum34 (1.1.6)
funcsigs (1.0.2)
functools32 (3.2.3.post2)
futures (3.1.1)
gfal2-util (1.4.0)
hep-ml (0.4.0)
histogrammar (1.0.8)
html5lib (0.999999999)
httplib2 (0.10.3)
hyperas (0.3)
hyperopt (0.1)
idna (2.5)
imagesize (0.7.1)
ipykernel (4.6.1)
ipython (5.3.0)
ipython-genutils (0.2.0)
ipywidgets (5.2.2)
Jinja (1.2)
Jinja2 (2.9.6)
jsonpickle (0.9.4)
jsonschema (2.6.0)
jupyter (1.0.0)
jupyter-client (5.0.1)
jupyter-console (5.1.0)
jupyter-core (4.3.0)
Keras (2.0.5)
llvmlite (0.18.0)
Lobster (1.6-efe02be-dirty)
lockfile (0.12.2)
Mako (1.0.6)
MarkupSafe (1.0)
matplotlib (1.5.2)
mistune (0.7.4)
mock (2.0.0)
mpmath (0.19)
nbconvert (5.2.1)
nbformat (4.3.0)
networkx (1.11)
nose (1.3.7)
notebook (4.3.1)
numba (0.33.0)
numexpr (2.6.2)
numpy (1.12.1)
ordereddict (1.1)
packaging (16.8)
pandas (0.20.2)
pandocfilters (1.4.1)
parsimonious (0.7.0)
pathlib2 (2.3.0)
pbr (3.0.1)
pexpect (4.2.1)
pickleshare (0.7.4)
pip (9.0.1)
pkgconfig (1.2.2)
prettytable (0.7.2)
professor (1.4.0)
professor2 (X.Y.Z)
prompt-toolkit (1.0.14)
protobuf (3.2.0)
prwlock (0.4.0)
psutil (5.2.2)
ptyprocess (0.5.1)
pycurl (7.43.0)
pycurl-client (3.4.4)
pydablooms (0.9.1)
Pygments (2.2.0)
pygpu (0.6.5)
pyparsing (2.2.0)
pysqlite (2.8.3)
pytest (3.1.3)
python-cjson (1.2.1)
python-daemon (2.1.2)
python-dateutil (2.6.0)
python-ldap (2.4.10)
pytz (2017.2)
pyxdg (0.25)
pyxrootd (4.6.0)
PyYAML (3.11)
pyzmq (16.0.2)
qtconsole (4.3.0)
rep (0.6.6)
repoze.lru (0.6)
requests (2.18.1)
retrying (1.3.3)
rivet (2.5.2)
root-numpy (4.7.2)
rootpy (0.9.1)
scandir (1.5)
schema (0.6.6)
scikit-learn (0.18.1)
scipy (0.19.0)
seaborn (0.7.1)
setuptools (28.3.0)
simplegeneric (0.8.1)
singledispatch (3.4.0.3)
six (1.10.0)
snakebite (1.3.13)
snowballstemmer (1.2.1)
Sphinx (1.6.3)
sphinx-rtd-theme (0.2.4)
sphinxcontrib-websupport (1.0.1)
SQLAlchemy (1.1.4)
subprocess32 (3.2.7)
sympy (1.0)
tables (3.4.2)
tensorflow (1.1.0)
terminado (0.6)
testpath (0.3.1)
theanets (0.7.3)
Theano (0.8.2)
tornado (4.4.2)
tqdm (4.14.0)
traitlets (4.3.2)
typing (3.6.1)
uncertainties (3.0.1)
urllib3 (1.21.1)
virtualenv (15.1.0)
wcwidth (0.1.7)
webencodings (0.5.1)
Werkzeug (0.12.2)
wheel (0.29.0)
widgetsnbextension (1.2.6)
wmcore (1.1.1rc7)
xgboost (0.6a2)
xrootdpyfs (0.1.4)
yoda (1.6.5)

geoff-smith commented 6 years ago

Hi Kevin,

I ran the commands you suggested, and attached a diff of the output of the "pip everything" command as part of this comment (you're the right arrow, I'm the left). There are quite a few differences, but I have the same version of snakebite as you (1.3.13). One thing that stood out to me was our versions of lobster:

you: Lobster (1.6-efe02be-dirty)
me: Lobster (1.9-a331c39-clean)

Do you know what "dirty" vs. "clean" refers to? pipeverythingdiff.txt
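For reference, a comparison like the one above can also be scripted. The following is only a minimal sketch: it assumes the older `name (version)` output format of `pip list` shown in this thread, and the function names `parse_pip_list` and `diff_packages` are made up for illustration.

```python
import re

# Matches the legacy `pip list` line format seen in this thread,
# e.g. "snakebite (1.3.13)".
_LINE = re.compile(r"^(\S+) \((.+)\)$")


def parse_pip_list(text):
    """Return {package_name: version} parsed from legacy `pip list` output."""
    packages = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            # Package names are case-insensitive, so normalise to lower case.
            packages[match.group(1).lower()] = match.group(2)
    return packages


def diff_packages(mine, theirs):
    """Return {name: (my_version, their_version)} for every package that differs.

    A version of None means the package is missing on that side.
    """
    names = set(mine) | set(theirs)
    return {
        name: (mine.get(name), theirs.get(name))
        for name in sorted(names)
        if mine.get(name) != theirs.get(name)
    }
```

Feeding both listings through `parse_pip_list` and then `diff_packages` would flag the Lobster version mismatch while leaving the identical snakebite entry out of the result.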

klannon commented 6 years ago

Yeah. "Dirty" means that I've modified something locally in my release. Apparently, I haven't updated Lobster as recently as I thought I had. Let me see if I can upgrade to your version of Lobster without changing anything else. If that breaks mine in the same way as yours, then we blame Matthias. If it doesn't break, then we have to start looking at some of the other differences. The protobuf one could be significant since the error message mentions it. Can you try a simple test in CMSSW_9_3_0[_pre2] just to rule that out as a source of the error?

geoff-smith commented 6 years ago

After checking out CMSSW_9_3_0_pre2, cmsenv-ing, and then doing a fresh install of lobster I get the same error. FWIW, one thing I notice when running pip install --upgrade is lines like the following (using protobuf as an example since you mention it):

  Found existing installation: protobuf 3.2.0
    Not uninstalling protobuf at /cvmfs/cms.cern.ch/slc6_amd64_gcc630/external/py2-pippkgs_depscipy/3.0-ghjeda2/lib/python2.7/site-packages, outside environment /afs/crc.nd.edu/user/g/gsmith15/.lobster

klannon commented 6 years ago

That sounds like an issue. Why don't you try pip install --ignore-installed instead of --upgrade? That should install everything into your virtualenv, not just the packages that aren't already installed somewhere else.

My current hypothesis is that when I installed Lobster, protobuf wasn't part of CMSSW. Now it is (thanks tensorflow ☹️).
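One way to test a hypothesis like this is to check where Python actually loads a module from. The sketch below is illustrative only: `module_origin`, `is_inside` and `describe` are hypothetical helper names, and it assumes `sys.prefix` points at the activated virtualenv (for example `~/.lobster`).

```python
import importlib
import os
import sys


def module_origin(name):
    """Import `name` and return the file it was loaded from, or None."""
    module = importlib.import_module(name)
    path = getattr(module, "__file__", None)  # built-in modules have no file
    return os.path.abspath(path) if path else None


def is_inside(path, prefix):
    """True if `path` lies under the directory `prefix`."""
    # Appending a separator prevents "/a/bc" from matching the prefix "/a/b".
    prefix = os.path.join(os.path.abspath(prefix), "")
    return os.path.abspath(path).startswith(prefix)


def describe(name, env_prefix=sys.prefix):
    """Summarise whether `name` is loaded from inside `env_prefix`."""
    origin = module_origin(name)
    if origin is None:
        return "%s: built-in or no file on disk" % name
    where = "inside" if is_inside(origin, env_prefix) else "outside"
    return "%s: %s (%s %s)" % (name, origin, where, env_prefix)
```

Calling `describe("google.protobuf")` with the virtualenv active would show whether protobuf comes from the virtualenv or from the CMSSW area under /cvmfs.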

klannon commented 6 years ago

By the way, if I recall, we're stuck on an older version of snakebite because CMS and OSG are stuck on an older version of Hadoop and snakebite isn't backwards compatible. @matz-e could verify that... Not sure if a newer version of snakebite would actually help though.

klannon commented 6 years ago

No, wait. Now that I've updated to the latest version of Lobster, I'm getting the same error.

klannon commented 6 years ago

So, something got broken in Lobster. I went back to my old commit (ac13024) and the error isn't there, even after doing a full pip install --upgrade which hits all the dependencies. So, I don't think it's just an accidental dependency upgrade. Initiating a binary search to find the last functioning version...

klannon commented 6 years ago

So, 50c2b65a3f0b8340059bb785940e8d91e8b1851e is broken but 4f6fc4bbe7ec929cf9cdb59f5bf63a1f98b991f0 is OK. At least for @geoff-smith and me.
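The search described above is ordinary bisection over the commit history, which `git bisect` automates. As a rough sketch of the idea only (the function `first_bad` and the `is_good` callback are hypothetical; in practice `is_good` would check out a commit, reinstall, and try running `lobster`):

```python
def first_bad(commits, is_good):
    """Return the index of the first commit for which is_good() is False.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is bad too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return hi
```

Each step halves the remaining range, so even a few hundred commits need fewer than ten install-and-test cycles.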

klannon commented 6 years ago

@geoff-smith: Did you "Install from Package" or "Install from Source"? I'm installing from source. Same question for @Andrew42. The commit that breaks this was trying to address a problem installing from "website" which I assume means "package."

Andrew42 commented 6 years ago

When I installed lobster, I did it from source.

klannon commented 6 years ago

@Andrew42: What version are you running right now?

Andrew42 commented 6 years ago

@klannon My local repo is on https://github.com/matz-e/lobster/commit/3d378be3af365f07a89207a2074b1846e9b2aadf

geoff-smith commented 6 years ago

@klannon I also installed from source.

klannon commented 6 years ago

@geoff-smith: I was able to get Lobster working again in the branch geoff-crash, which I've just pushed. Can you check out Lobster from that branch (don't forget to do the pip install command from the instructions) and see if it fixes your problems?

klannon commented 6 years ago

@btovar @khurtado Can the VC3 builder fix problems like the ones we're having above? Short summary: right now we're installing Lobster on top of CMSSW because we need some (but maybe not all) of the CMSSW python packages. Some of the packages that Lobster needs conflict with CMSSW. So, if I understand correctly, we've been manipulating which python modules get loaded from which paths, and for different users, we see different behaviors (maybe?)

@matz-e and @annawoodard, if you're still following Lobster: at one point tonight I actually had Lobster built outside of cmsenv. The problem is that so many features of Lobster (like grabbing the sandbox from your release area or detecting the output file) assume CMSSW is present that it's painful to try to run that way. I didn't have the patience to go through my test config and carefully remove all of the items that would trigger Lobster to try to load CMSSW-related modules. To get Lobster to work outside CMSSW, I had to do the following:

pip install scipy
pip install --no-cache-dir --compile --ignore-installed --install-option="--with-nss" pycurl

btovar commented 6 years ago

It should be possible, yes.

On Tue, Feb 13, 2018 at 7:07 PM, Kevin Lannon notifications@github.com wrote:

klannon commented 6 years ago

@Andrew42: Seeing as how I just reverted a change from @matz-e that was put in to avoid a crash that you were seeing, could you try out the geoff-crash branch and tell me what you observe?

matz-e commented 6 years ago

@klannon there were some plans to disentangle the output detection from relying on the right release. If we could get Lobster to be standalone, that would be great!

matz-e commented 6 years ago

#317 for reference…

matz-e commented 6 years ago

I have Lobster on my work schedule right now, so I'll see about that today. Should make our lives a bit easier.

klannon commented 6 years ago

To give you the broader context, @btovar mentioned in last week's HEP computing meeting that he plans to put Lobster into the VC3-builder (https://github.com/vc3-project/vc3-builder) to make it easier to stand up Lobster within the VC3 infrastructure. If we can do that, I think we should make that our default way of distributing Lobster. I'm hoping then we can better deal with the dependencies. Decoupling Lobster from a CMSSW release would, I think, make this easier and more robust.

matz-e commented 6 years ago

Roger that. I have a documentation update in the pipeline that I worked on earlier; I'll shelve that and prioritize this.

Andrew42 commented 6 years ago

@klannon I moved to the geoff-crash branch and did a pip install --upgrade . and I'm able to run the base lobster command without errors. I'll try actually running a lobster process.

Andrew42 commented 6 years ago

Hmm, OK, this is strange. When I tried to run lobster process makeEFTSelectionTree_lobster.py I got the following error:

Traceback (most recent call last):
  File "/afs/crc.nd.edu/user/a/awightma/.lobster/bin/lobster", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/py2-setuptools/2.1-ikhhed3/lib/python2.7/site-packages/pkg_resources.py", line 2697, in <module>
    working_set.require(__requires__)
  File "/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/py2-setuptools/2.1-ikhhed3/lib/python2.7/site-packages/pkg_resources.py", line 669, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/py2-setuptools/2.1-ikhhed3/lib/python2.7/site-packages/pkg_resources.py", line 547, in resolve
    requirements = list(requirements)[::-1]  # set up the stack
  File "/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/py2-setuptools/2.1-ikhhed3/lib/python2.7/site-packages/pkg_resources.py", line 2544, in parse_requirements
    line, p, specs = scan_list(VERSION,LINE_END,line,p,(1,2),"version spec")
  File "/cvmfs/cms.cern.ch/slc6_amd64_gcc530/external/py2-setuptools/2.1-ikhhed3/lib/python2.7/site-packages/pkg_resources.py", line 2512, in scan_list
    raise ValueError("Expected "+item_name+" in",line,"at",line[p:])
ValueError: ('Expected version spec in', 'Lobster===1.9-3dc59dd-clean', 'at', '===1.9-3dc59dd-clean')

What's more, when I try to run the plain old lobster command with no process, I now get this same error.
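The ValueError above points at the `===` in the requirement string `Lobster===1.9-3dc59dd-clean`, and the traceback shows the parser coming from a py2-setuptools 2.1 area. The sketch below only illustrates how a parser that knows nothing about `===` would fail in this way. The pattern `OLD_VERSION_SPEC` and the function `parse_old_style` are assumptions written for this example; they are not the actual pkg_resources source.

```python
import re

# Hypothetical, simplified stand-in for an older requirement parser that only
# recognises the operators <, <=, >, >=, == and !=. This is an assumption for
# illustration, not code taken from pkg_resources.
OLD_VERSION_SPEC = re.compile(r"\s*(<=?|>=?|==|!=)\s*([\w.-]+)")


def parse_old_style(requirement, name):
    """Parse 'name<op>version'; raise ValueError when the operator is unknown."""
    spec = requirement[len(name):]
    match = OLD_VERSION_SPEC.match(spec)
    # The whole remainder must be consumed, otherwise the spec is malformed.
    if match is None or match.end() != len(spec):
        raise ValueError("Expected version spec in", requirement, "at", spec)
    return match.group(1), match.group(2)
```

With this stand-in, `Lobster==1.6` parses cleanly while `Lobster===1.9-3dc59dd-clean` raises a ValueError similar to the one in the traceback, since `==` is matched first and the leftover `=` is not a valid start of a version.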

geoff-smith commented 6 years ago

@klannon : I confirm that switching to the geoff-crash branch fixes the crash for me. I'll try to see if I can actually get some jobs running..

Andrew42 commented 6 years ago

I also managed to get lobster running with CMSSW 940. There were a couple of issues I ran into along the way, that I thought I would mention here.

  1. There appears to be a major difference in the python that's used by 810 vs. 940: they were compiled with different unicode encodings. So the CMSSW release you use when you do the lobster install steps determines which CMSSW release you can use going forward for all lobster jobs.

  2. On CMSSW 940, when running the lobster command I got the same error that @klannon and @geoff-smith were getting. Furthermore, to fix this issue I had to remove the entire lobster virtualenv and re-install from the new branch. It wasn't enough to simply run the pip install command with the new branch.

  3. After this I was able to get a lobster process to run, but it would immediately fail because a report.json file was missing. To fix this I edited the setup.py file in my lobster source to add the line core/data/report.json to the package_data object. I then did a pip install --upgrade and lobster was able to run successfully.
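As a rough illustration of item 3, a `package_data` entry of the kind described might look like the excerpt below. Only the `core/data/report.json` path comes from this thread; the `"lobster"` key and the surrounding layout are assumptions, and the real setup.py may be organised differently.

```python
# Hypothetical excerpt of a setup.py. Everything except the
# "core/data/report.json" entry is an assumption for illustration.
package_data = {
    "lobster": [
        "core/data/report.json",  # the file reported missing in item 3
    ],
}

# In a real setup.py this dictionary would be passed to setuptools.setup()
# through its package_data argument so that the file is copied on install.
```

Paths listed in `package_data` are interpreted relative to the package directory, which is why the entry has no leading `lobster/`.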

klannon commented 6 years ago

@Andrew42: Can you please push the fix for Item 3 above to this branch?

For items 1 and 2, these are known "features" of what we're doing, and I didn't do a good job of explaining that. Pretty much any time I've had to debug a problem like this (like earlier this week), I've ended up completely wiping out my virtualenv directory (and often also my .local) before finding a solution. Probably half the time I didn't really need to, but it's one of the first things I tend to do: wipe the slate clean and see what happens if I start fresh. I'm not surprised that you had to do it too.
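For item 1, the unicode build of a given Python 2 interpreter can be checked directly. This is a small sketch (the helper name `unicode_build` is made up); running it once under each CMSSW release would show whether the two interpreters differ.

```python
import sys


def unicode_build():
    """Describe how this interpreter stores unicode strings."""
    # Python 2 can be compiled "narrow" (UCS-2, sys.maxunicode == 0xFFFF) or
    # "wide" (UCS-4, sys.maxunicode == 0x10FFFF). Python 3.3 and later always
    # report 0x10FFFF.
    return "UCS-2 (narrow)" if sys.maxunicode == 0xFFFF else "UCS-4 (wide)"


print("%s, sys.maxunicode = %s" % (unicode_build(), hex(sys.maxunicode)))
```

Compiled extension modules built against one kind of Python 2 build generally cannot be imported by the other, which is consistent with a virtualenv being tied to the release it was created under.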

For a slightly longer time scale, @khurtado is going to work with @btovar (for the record, I'd misremembered and it was @khurtado who was already planning to put Lobster into the VC3 builder tool). This won't be a magic fix, but it will force us to write down and understand what Lobster actually depends on, and should give us a more reproducible way to deploy Lobster in a variety of settings.

@Andrew42: please let me know when you push that commit. If it works, I'll have another student here test it to make sure that a user starting fresh can now make it work. If that's right, we should merge this fix into master.

Andrew42 commented 6 years ago

@klannon, I pushed the fix and it can be found here https://github.com/matz-e/lobster/commit/4763d5302e857465bb0b0e9c0f38cbc71fc71d2c

klannon commented 6 years ago

Awesome! Thanks! Now join the Lobster Slack team so I can loop you in on some stuff there. I sent you an invite.