Open jsf9k opened 6 years ago
This makes perfect sense. Though:
> This goes against the intent of the splitting of `requirements.txt` in #224.
Splitting the requirements that way was meant (for me, anyway) to make packaging Lambda functions lighter by eschewing local-only reqs, but not necessarily to make local dependencies lighter by eschewing Lambda-only reqs.
I don't think there's as much of a need to do the latter, since there usually aren't the same kinds of constraints on network/size/etc locally as in Lambda. But maybe I'm not thinking about broader use cases. Would it help your use case to split this out further?
I don't think we need to split the requirements files out further. Ideally I'd like it if you only needed to install `requirements.txt` and the scanners you want to use locally. Right now that doesn't work because of the imports at the top of the files in `scanners/`, but I suspect we could get there if we moved some imports around in the scanner classes that @tadhg-ohiggins is working on (like `scanners/trustymail.py` is doing, although for a different reason).
Since my scanning kicks off the Lambda processes from Docker, it's nice if that container doesn't have to install the actual scanning libraries. It's not a huge deal, but it does make the Docker images smaller, and it removes what are really unnecessary dependencies.
That makes sense, and is something I wasn't intuiting because I'm not currently using Docker in my environment.
The ideal is probably to have a standard way of conditionally importing dependencies.
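One possible shape for that, as a minimal sketch only: a small helper that returns the module when it is installed and `None` otherwise. The helper name `optional_import` is hypothetical (nothing like it exists in this repo as far as I know), and `json` is just a standard-library stand-in for a scanner dependency such as `pshtt` or `sslyze`. It is only meant for top-level module names.

```python
import importlib
import importlib.util


def optional_import(module_name):
    """Return the imported top-level module, or None if it is not installed.

    Hypothetical helper, not part of the existing code base.
    """
    # find_spec() returns None when a top-level module cannot be found,
    # so nothing is imported (and nothing fails) on a host without it.
    if importlib.util.find_spec(module_name) is None:
        return None
    return importlib.import_module(module_name)


# "json" stands in for a real scanner dependency here.
json_mod = optional_import("json")
missing_mod = optional_import("surely_not_installed_scanner_dep")
```

A scanner could then check for `None` and skip any local-only optimization when its library is absent, instead of failing at import time.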
One complication, though: for the `pshtt` scanner, `pshtt` is needed even locally anyway, because it uses `pshtt.load_suffix_list()` in the `init` method to manage caching the PSL locally. You're currently handling all of that in `scan`, using a locally packaged PSL, and don't bother with that optimization.
It seems very possible that other scanners may make use of third-party deps in their `init()` or `init_domain()` or `to_rows()` functions, not just their `scan()` functions. That may make it more complicated/annoying to do that kind of separation.
In #224 we separated `requirements.txt` into several different `requirements.txt` files, which I think is a good thing. In addition to one minor bug that I fixed in commit cc141091188b060a144a4dd9307262a96d06daf7 in #234, there is a larger issue with the scanners.

In `pshtt.py` and in `sslyze.py` we import `pshtt` and `sslyze`, respectively, at the top of the file. This means that when running `./scan`, even in Lambda, one must have `pshtt` and `sslyze` installed locally, on the host where you are running `./scan`. This goes against the intent of the splitting of `requirements.txt` in #224.

Note that
`trustymail.py` does not have this problem because we only import `trustymail` in the `scan()` function. Hence it is only imported inside the Lambda function and not by the host where `./scan` is being run.

I'm not sure it's worth fixing this now, since there is an obvious workaround, but I wanted to make sure this gets taken into account in the work @tadhg-ohiggins is doing in #232. If the scanner classes can work in such a way that the dependencies specific to that particular scanner only get imported inside the Lambda function (when running in Lambda) then we can keep the local dependencies to a minimum.
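To illustrate the pattern I mean, here is a rough sketch of a scanner module that defers its import into `scan()`. It is not the actual contents of `trustymail.py`; the function bodies are made up for illustration, and `json` is a standard-library stand-in for the scanner's real third-party library.

```python
def scan(domain):
    # Deferred import: this line only executes when scan() is called,
    # which in Lambda mode happens inside the Lambda function rather
    # than on the host that runs ./scan.
    import json  # stand-in for the scanner's third-party library

    return json.dumps({"domain": domain})


def to_rows(data):
    # Needs no third-party library, so the host can call it freely.
    return [[data]]
```

Importing this module on the host costs nothing beyond the standard library; only the environment that actually calls `scan()` needs the scanner's dependency installed.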
I hope all this makes sense. Please ask me for clarification if it doesn't. :smiley: