Open jameshilliard opened 4 years ago
I just realized the install/distribution size is a major issue, without this change the tar.gz compressed autobahn distribution package is roughly 1MB, uncompressed it is a massive 11.2MB(which is very large for resource constrained embedded systems like the one I use autobahn with). It seems the bundled ETH contracts are the main cause of this huge size increase as they are 8.8MB alone.
With #1369 autobahn is about 300KB when tar.gz compressed and 2.2MB uncompressed.
IMO this alone is a good enough reason to split the xbr modules out.
Packages should not install modules that are not relevant to the typical use case for the package without strong technical justification as this can significantly increase attack surface,
there are many aspects here .. one would be: in your app, you only need a strict subset of the code in autobahn. why not remove all code in autobahn that you don't need in your app?
you could even do that in the build step for your app, and it the app will have the minimal (internal) attack surface, as it only code that is strictly needed for the app is there anyways
Over time the demarcation point between the xbr modules and autobahn may become less clear which increases the security audit complexity
yes, we will evolve this together, as quite simply: XBR builds on top of WAMP which (can) build on top of WebSocket. All 3 are working as an integrated stack. hence we want close integration by design.
for example, we deliberately did not split into 2 packages for WebSocket and WAMP
if you don't need WAMP, but only WebSocket, you can have a custom build step downstream that removes all files you don't need
it is well known in the cryptocurrency community that the Ethereum project has a very poor track record in regards to security in general
is it? pls provide support for this claim (or its 2 parts). further: if you think so, don't use autobahn or any of my code then;) I am a big Ethereum fan, and it is a perfect complement to WAMP, as it takes the decentralization/decoupling to the multi-party level .. just my view. Security wise, there surely had been issues, and there is learning required how to write secure contracts etc. expected, as this is breaking new land ...
anyways: I agree to your main point: any code that is not required and not there cannot originate a security issue. so the goal is to minimize the code that does "the job" (=app). why not then go full circle and remove all code that your app doesn't need?
It seems the bundled ETH contracts are the main cause of this huge size increase as they are 8.8MB alone.
ok, yes, I can see how this sucks (if you don't need it).
mmh. thinking ... this is sth that we might be able to fix for itself without doing the big namespace thing:
we could split out the contracts bundle (the whole 8.8MB) into a separate (python) package (basically only containing the JSON files), but keep everything else untouched (besides then adding the new dependency on the abi package)
would that work for you? this would be much less intrusive and fits the current approach in autobahn (as with websocket and wamp, not split up. that is xbr py code stays in the one package, only the JSON files are in a 2nd package)
so an alternative to the full PEP420 approach would split out only the JSON files into a new xbr
package. this would solve the size issue.
security wise, my argument would be like this:
pip install autobahn
without the xbr
flavor will not install any of the "bad ethereum dependencies" (other packages)autobahn.xbr
and confined to this subdir https://github.com/crossbario/autobahn-python/tree/master/autobahn/xbrbtw, rgd the argument: for security, installed code should be absolutely minimized.
this will collide for example with users that required us to follow the (granted, best) practice of also including the tests in the package, eg https://github.com/crossbario/autobahn-python/tree/master/autobahn/twisted/test => is packaged up!
I think it was Debian which have a policy for tests to be included in a package.
now what? include? not include? =)
I am just trying to make a point here: if you wanna optimize for max security, then why not remove all code an app doesn't need. in fact, why not remove all drivers and OS parts not needed as well: http://rumpkernel.org/
there are many aspects here .. one would be: in your app, you only need a strict subset of the code in autobahn. why not remove all code in autobahn that you don't need in your app?
To a large degree I actually do this, I use buildroot which generates a heavily stripped down operating system image with only minimal dependencies, by doing that I can get my entire production OS image(bootloader+kernel+rootfs) including embedded web browser down to ~100MB compressed.
yes, we will evolve this together, as quite simply: XBR builds on top of WAMP which (can) build on top of WebSocket. All 3 are working as an integrated stack. hence we want close integration by design.
Yeah, that's why I took this approach since it should still allow for that development model, ie by cross pinning xbr and autobahn versions in setup.py so that you then don't have to worry about internal API breakage between the two.
if you don't need WAMP, but only WebSocket, you can have a custom build step downstream that removes all files you don't need
I use the WAMP functionality extensively.
further: if you think so, don't use autobahn or any of my code then;)
Heh, I haven't found any security flaws so far in autobahn, seeing a bunch of ethereum libraries kinda scares me though as I've seen the type of code written by Vitalik, I actually rewrote the transaction parsing code for a wallet project(joinmarket) that had unfortunately used vitalik buterin's pybitcointools which had regexes in the transaction parser, that was not an easy rewrite.
Security wise, there surely had been issues, and there is learning required how to write secure contracts etc. expected, as this is breaking new land ...
I'm quite skeptical writing a secure contract in with eth is possible for anything other than the most trivial one, especially when the company founded by the former ETH foundation CTO couldn't even write a multisig wallet contract that didn't get hacked, not once but twice.
so an alternative to the full PEP420 approach would split out only the JSON files into a new xbr package. this would solve the size issue.
At the start it would, but I assume the xbr module will grow significantly over time. If you're already splitting out the JSON files I don't really see why it would be much more difficult to also split the python parts as both packages would still need to be installed for xbr to function.
now what? include? not include? =)
Personally I would be on the not include side, but I mostly work with buildroot for OS development which makes an effort to remove tests from application builds.
I am just trying to make a point here: if you wanna optimize for max security, then why not remove all code an app doesn't need. in fact, why not remove all drivers and OS parts not needed as well: http://rumpkernel.org/
Heh, of course one must weigh the maintenance burden as well, I find buildroot generally does a pretty good job in regards to making good tradeoffs when it comes to that sort of thing. Part of the reason size is so critical for me is that to update my app/apps I essentially re-image the entire system every time, this way updates are effectively atomic and I don't have to worry about applications on the device getting incompatible versions, this sort of update method is only feasible if you strip everything down, I can even send the ~100mb compressed update images over 3g connections since they are small enough(which many of the devices I work with use for internet connectivity).
thanks a lot for your comments! interesting. also important for me to understand where you come from / how you look at this. because we need to strike a balance .. as the "include vs not include" example demonstrates, with different user groups come different priorities.
we have to strike a balance between the different priorities (our own as well .. and the resources we have).
of course one must weigh the maintenance burden as well,
yes, absolutely. applies to us as well;)
rgd the "size issue" specifically, pls have a look at this:
(cpy382_1) oberstet@intel-nuci7:~/scm/crossbario/autobahn-python$ du -hs autobahn
11M autobahn
(cpy382_1) oberstet@intel-nuci7:~/scm/crossbario/autobahn-python$ rm -rf autobahn/xbr/contracts
(cpy382_1) oberstet@intel-nuci7:~/scm/crossbario/autobahn-python$ du -hs autobahn
2,4M autobahn
(cpy382_1) oberstet@intel-nuci7:~/scm/crossbario/autobahn-python$ rm -rf autobahn/wamp/gen
(cpy382_1) oberstet@intel-nuci7:~/scm/crossbario/autobahn-python$ du -hs autobahn
2,0M autobahn
(cpy382_1) oberstet@intel-nuci7:~/scm/crossbario/autobahn-python$ rm -rf autobahn/nvx
(cpy382_1) oberstet@intel-nuci7:~/scm/crossbario/autobahn-python$ du -hs autobahn
1,9M autobahn
(cpy382_1) oberstet@intel-nuci7:~/scm/crossbario/autobahn-python$ find autobahn -type d -name test*
autobahn/test
autobahn/twisted/test
autobahn/twisted/testing
autobahn/asyncio/test
autobahn/websocket/test
autobahn/rawsocket/test
autobahn/wamp/test
autobahn/xbr/test
(cpy382_1) oberstet@intel-nuci7:~/scm/crossbario/autobahn-python$ find autobahn -type d -name test* -exec rm -rf {} \;
(cpy382_1) oberstet@intel-nuci7:~/scm/crossbario/autobahn-python$ find autobahn -type d -name test*
(cpy382_1) oberstet@intel-nuci7:~/scm/crossbario/autobahn-python$ du -hs autobahn
1,5M autobahn
(cpy382_1) oberstet@intel-nuci7:~/scm/crossbario/autobahn-python$ rm -rf autobahn/xbr/
(cpy382_1) oberstet@intel-nuci7:~/scm/crossbario/autobahn-python$ du -hs autobahn
1,3M autobahn
(cpy382_1) oberstet@intel-nuci7:~/scm/crossbario/autobahn-python$
so:
summary:
at this point of the discussion, I would be +1 on splitting out the ABI files into a separate package only. lemme know if you are cool with that.
if so, next question is: should we have to do the baking and publishing of that new package in this repo anyways?
maybe just publish the ABI file (only) to a py package directly from https://github.com/crossbario/xbr-protocol
after that, it'll just be a matter of changing a few import lines here to point to the new package ..
splitting out the whole XBR thing would additionally save only 200K
Well this is only at the moment, my assumption is that you will be significantly expanding the features for this over time resulting it causing significant size increases down the line.
at this point of the discussion, I would be +1 on splitting out the ABI files into a separate package only. lemme know if you are cool with that.
if so, next question is: should we have to do the baking and publishing of that new package in this repo anyways?
Well I'd prefer if the python files were also split out, maybe I'm missing something but I don't really see how splitting out the python files would be more difficult than just splitting out the json files, you effectively end up with having to maintain 2 separate pypi packages either way. If both projects are in the same development tree as in #1369 then you avoid the maintainence burden of having to keep 2 separate repos in sync for development.
The idea is that this way the development process for autobahn/xbr doesn't have to really change at all, just the pypi packaging process is changed effectively.
maybe I'm missing something but I don't really see how splitting out the python files would be more difficult than just splitting out the json files
we have a quite complex CI/CD pipeline (also include private repos), and I just think it will have less fallout onto this. it will require us to touch half a dozen repos in any case.
anways: we won't do the namespace thing. we would do the abi file thing - if you want. otherwise I would close this, as we don't care about size.
There's a setuptools example on how to exclude tests by the way. From my understanding it's best practices for python packages to have tests be part of the git tree but not the pypi packages.
we have a quite complex CI/CD pipeline (also include private repos), and I just think it will have less fallout onto this. it will require us to touch half a dozen repos in any case.
Ok, so if I'm understanding this correctly the issue is that they are building directly from master on this repository using pip and it would introduce extra complexity there during development due to having to install 2 packages?
If that's the case I think I have another option which would work better, it would be to just create 2 autobahn pypi packages, the origional autobahn
without xbr
and a separate autobahn-xbr
or something like that which includes all the xbr
stuff, with the xbr
variant being the default one that gets built when doing pip install from the git repo, that way there should be no changes to the development process. This would be just be a change in the build process essentially.
From my understanding it's best practices for python packages to have tests be part of the git tree but not the pypi packages.
maybe? I wouldn't care that much. I also actually don't care about Debian policies - which definitely state the opposite!
my prios are simple: make it practically work great and secure for users - while making progress on lots of other things on my plate;)
OT: I had a look at
mmmmh, granted, this looks a bit scary and obscure (non-pythonic) ..
If that's the case I think I have another option which would work better, it would be to just create 2 autobahn pypi packages ...
no, sorry, that's not what we want. you can fork the whole repo and remove anything you don't need
anyways, sorry, I have to work on other stuff now;)
no, sorry, that's not what we want. you can fork the whole repo and remove anything you don't need
I wasn't suggesting to create 2 repos, just to tweak the build process so that we generate 2 different pypi packages from the same codebase.
I wasn't suggesting to create 2 repos, just to tweak the build process so that we generate 2 different pypi packages from the same codebase.
the only change I would support is:
this solves the size issue practically and addresses a real and valid user concern (that you raised) in my eyes.
it will result in some (limited) work on our side, but that's the balance with user concerns we're hoping to strike;)
maybe the actual issue you are trying to address (just a question) is a security hardening requirement:
"Our shipped product must only contain the minimum, stripped, strictly-required code"?
if so, then IMO this needs to be solved systematically during product image baking in a respective "remove/strip files" step.
trying to solve that on the library level is the wrong place ..
split out the JSON files into a separate package created/published from xbr-protocol
It doesn't really help a whole lot by itself as it wouldn't really simplify the buildroot packaging process for autobahn at all. I'm mostly looking for an upstream solution to avoid having to patch autobahn so heavily in buildroot as stripping lines like that is fairly brittle.
1 repo and 1 package that includes the xbr code (as today, no change), but
split out the JSON files into a separate package created/published from xbr-protocol
One thing I can think of that may simplify the packaging while still staying within those parameters would be to add an environmental variable or something like that which can be passed to setup.py to tell it to exclude xbr modules from installation, that way the pypi repo would still contain xbr modules but there would be an option to strip the modules at install time essentially.
if so, then IMO this needs to be solved systematically during product image baking in a respective "remove/strip files" step.
The typical approach for having buildroot remove/strip files is to try and upstream flags/options to the respective upstream project that allow for excluding features/files, these flags would then be passed at build time to the package to tell it to not install the features/files, the reason this is the preferred approach is because removing/stripping files from applications is usually a process that is very specific to the application being stripped and is difficult to generalize, see my buildroot patch as en example for what it looks like when you don't have these flags/options.
It doesn't really help a whole lot by itself as it wouldn't really simplify the buildroot packaging process for autobahn at all.
I don't understand how that is, because if we split out the JSON files, adding only autobahn
(and not the then new xbr
) would reduce the size by 8MB.
size problem solved.
what other problem are you refering to?
I think I found a non-intrusive buildroot compatible way to strip xbr
modules in #1374 using an environmental variable. This technique could also be trivially modified to generate multiple flavors of autobahn pypi packages if desired.
what other problem are you refering to?
Currently to strip the xbr modules entirely one needs to hack up the setup.py file and purge directories as I did in my buildroot patch, this is a bit of a maintenance problem on the buildroot side.
I think I found a non-intrusive buildroot compatible way ..
yeah, awesome! I merged that. it indeed addresses all concerns=) happy we finally found one (well, you;) and thanks for your time+work!
we should also document what AUTOBAHN_STRIP_XBR does (and is promising to continue to do):
autobahn.xbr
moduleseg setting the env var while doing pip install autobahn[xbr]
will install the xbr dependencies ...
eg setting the env var while doing pip install autobahn[xbr] will install the xbr dependencies ...
Think it's a good idea to have AUTOBAHN_STRIP_XBR disable xbr dependencies entirely when set since it's not really a sensible configuration to install autobahn[xbr]
with a stripped autobahn
?
I wish there was a way to programmatically detect which flavor of autobahn was requested, I poked through setuptools/pip code a bit looking for one but couldn't figure out a way.
@oberstet OT: If you're interested in chatting more about crypto use cases for autobahn I have a number of ideas/possible synergies with what I'm working on, I'm usually around in the #autobahn
irc channel under the username Lightsword
.
I wish there was a way to programmatically detect which flavor of autobahn was requested ..
yeah, if that would be available, we should just raise an exception at install time when a user had set AUTOBAHN_STRIP_XBR but also provided the xbr
or all
install flavors. that would be best.
if it is not possible to get the install flavor at run-time (reflect the value in use) - which I'm also not aware of - then the deps will be there, but not the modules. so xbr won't be effectively usable, but still added attack surface due to installed deps. well, that's life;)
The current behavior for autobahn is to install a bunch of
xbr
modules even when the user does not request them to be installed.There are a number of reason this is a bad idea:
Packages should not install modules that are not relevant to the typical use case for the package without strong technical justification as this can significantly increase attack surface, especially for a package like
autobahn
which already has significant network attack surface(in one of my projects it's effectively the only package doing any remote network communications over public networks other than the usual DNS/DHCP/VPN client daemons).Over time the demarcation point between the
xbr
modules andautobahn
may become less clear which increases the security audit complexity, already this seems to be starting to happen as there arexbr
modules in multiple different autobahn paths. By splitting the modules into a separate package it becomes very obvious that they are not relevant to the typical use case and thus can be ignored when auditing security.In regards to the security of
xbr
specifically, it is well known in the cryptocurrency community that the Ethereum project has a very poor track record in regards to security in general, in fact I have personally discovered security vulnerabilities in the design of some ETH network protocols in the past. Being able to fully remove potentially risky Ethereum project related code makes it easier to audit the security of autobahn.The maintenance of the conditional loading of imports is more complex than using conditional installation of the
xbr
modules.The approach I took in #1369 to splitting out the
xbr
package from autobahn has a number of advantages:It cleanly separates out the
xbr
feature from the rest of autobahn so that one doesn't need to audit the security of thexbr
side to ensure it can't accidentally cause runtime side effects when using baseautobahn
functionality.It still allows for
xbr
andautobahn
to be developed in a tightly coupled way, for example by always having both packages require exact versions of each other in the setup.py(you can just script the package version bump to always bump and uploadautobahn
andxbr
at the same time with matching versions). They will still share and install to the sameautobahn
site-packages namespace as before and can be installed in the same way by the end user, the main difference is that the development tree and distribution packages have cleanly separated functionality and thatautobahn
can be installed withoutxbr
.It allows for simplification of the
xbr
parts of the codebase as one can entirely remove all the conditional dependency based import logic currently used throughout thexbr
components asxbr
would only ever be installed when all necessary dependencies are also installed.