Open roscopecoltran opened 7 years ago
@roscopecoltran this sounds like an awesome plan!
Having a Dockerfile would be great indeed.
Some other notes, OTOH:
About Python 3: yes, the plan is to port to Python... and this should happen pretty soon after the release of v2.0.0. Help is definitely welcome. The plan was to eventually support only Python 3.6 and up, and the question is whether or not to continue supporting Python 2.
something that is likely to help you a lot is the upcoming plugin architecture by @yashdsaraf
Hey,
Thanks for the reply !
It would be awesome to have flow-based processing for starting concurrent tasks on the code, defined in some human-readable YAML files ^^.
Also, for any additional meta information about a project, I would recommend using searx, as it can be flexible about the scope of the metasearch used to gather public/web-related info, if you want to build a context around the code audit.
Have a good week-end :-)
Cheers, Richard
btw the ticket for Python 3 is #295 and also #442
Bonjour Philippe, :-)
Hope you are all well !
I am bundling scancode into a Docker Alpine container but some errors occurred.
I think that some tweaks are required while running the configure script. Mainly, it would be great to use fallbacks to find the library outside the scancode project, such as searching standard locations while keeping the musl order, e.g. paths = ['/lib', '/usr/local/lib', '/usr/lib'], or maybe to trigger a build if the pre-compiled lib fails to load.
In our case, I had this error message while trying to link the libmagic2.so shared library: the configure script fails with "__snprintf_chk: symbol not found" (probably due to musl-dev or glibc).
From my understanding, these functions are used to find and map the pre-compiled libmagic shared lib.
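The fallback idea above could be sketched roughly as follows. This is only an illustration, not ScanCode's actual loader: the function name `load_with_fallback` and its parameters are hypothetical, and the search order is simply the one suggested in this comment. It tries the bundled copy first and moves on to the standard locations when a candidate is missing or fails to load.

```python
import ctypes
import os

# Search order suggested above (assumed here to mirror musl's default order).
STANDARD_LIB_DIRS = ['/lib', '/usr/local/lib', '/usr/lib']


def load_with_fallback(filename, bundled_dir, fallback_dirs=STANDARD_LIB_DIRS,
                       loader=ctypes.CDLL):
    """
    Try to load `filename` from `bundled_dir`, then from each directory in
    `fallback_dirs`, in order. Return a (handle, errors) tuple where `handle`
    is the first library that loaded (or None) and `errors` lists the
    (path, message) pairs of the candidates that failed to load.
    """
    errors = []
    for directory in [bundled_dir] + list(fallback_dirs):
        path = os.path.join(directory, filename)
        if not os.path.isfile(path):
            continue
        try:
            return loader(path), errors
        except OSError as exc:
            # For example "__snprintf_chk: symbol not found" under musl.
            errors.append((path, str(exc)))
    return None, errors
```

The `loader` parameter exists only to make the sketch easy to test. Note that a system-provided library may use a different (for example versioned) file name than the bundled one, so a real implementation would need to account for that.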
The easier solution would be to use an Ubuntu-based container, but the size of the Ubuntu container is nonsense to me:
❯ docker images | awk '{print $1"\t"$2"\t"$7" "$8}'
REPOSITORY      TAG      SIZE
sample_alpine   latest   167.6 MB
sample_ubuntu   latest   447.8 MB
<none>          <none>   187.9 MB
ubuntu          latest   187.9 MB
redis           latest   109.3 MB
mongo           latest   261.6 MB
golang          latest   709.5 MB
alpine          latest   5.249 MB
Refs used:
Waiting for your input/point of view on that question. :-)
Cheers, Richard
@roscopecoltran this is effectively a todo: the prebuilt binaries need to be unbundled ... this is tracked in #469
Here's a simple Dockerfile based on Alpine Linux 3.8 and Python 2.7.15:
FROM python:2.7.15-alpine
MAINTAINER Ernst de Haan "ernst.dehaan@mindcurv.com"
ARG SCANCODE_VERSION
RUN apk add build-base libxml2-dev libxslt-dev linux-headers
RUN pip install scancode-toolkit==${SCANCODE_VERSION}
CMD [ "/usr/local/bin/scancode" ]
Source can also be found here:
And the container here:
Here's how to build it with a version tag:
docker build -t scancode-toolkit:latest -t scancode-toolkit:2.2.1 --build-arg SCANCODE_VERSION=2.2.1 .
Hmm, while a simple scancode --help works OK:
docker run -it mindcurv/scancode-toolkit:2.2.1 scancode --help
…a real scan doesn't work yet. I'm running into (apparently) the same issue as @roscopecoltran :
$ docker run -it mindcurv/scancode-toolkit:2.2.1 scancode --format json . scancode_result.json
Scanning files for: licenses, copyrights, packages with 1 process(es)...
Building license detection index...Traceback (most recent call last):
File "/usr/local/bin/scancode", line 11, in <module>
sys.exit(scancode())
File "/usr/local/lib/python2.7/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/scancode/utils.py", line 74, in main
standalone_mode=standalone_mode, **extra)
File "/usr/local/lib/python2.7/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python2.7/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python2.7/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/scancode/cli.py", line 490, in scancode
pre_scan_plugins=pre_scan_plugins)
File "/usr/local/lib/python2.7/site-packages/scancode/cli.py", line 572, in scan
get_index(False)
File "/usr/local/lib/python2.7/site-packages/licensedcode/cache.py", line 188, in get_index
_LICENSES_INDEX = get_or_build_index_through_cache()
File "/usr/local/lib/python2.7/site-packages/licensedcode/cache.py", line 108, in get_or_build_index_through_cache
from licensedcode.index import LicenseIndex
File "/usr/local/lib/python2.7/site-packages/licensedcode/index.py", line 47, in <module>
from licensedcode import match
File "/usr/local/lib/python2.7/site-packages/licensedcode/match.py", line 36, in <module>
from licensedcode import query
File "/usr/local/lib/python2.7/site-packages/licensedcode/query.py", line 32, in <module>
import typecode
File "/usr/local/lib/python2.7/site-packages/typecode/__init__.py", line 27, in <module>
from typecode.contenttype import get_type
File "/usr/local/lib/python2.7/site-packages/typecode/contenttype.py", line 47, in <module>
from typecode import magic2
File "/usr/local/lib/python2.7/site-packages/typecode/magic2.py", line 221, in <module>
libmagic = load_lib()
File "/usr/local/lib/python2.7/site-packages/typecode/magic2.py", line 214, in load_lib
lib = ctypes.CDLL(magic_so)
File "/usr/local/lib/python2.7/ctypes/__init__.py", line 366, in __init__
self._handle = _dlopen(self._name, mode)
OSError: Error relocating /usr/local/lib/python2.7/site-packages/typecode/bin/linux-64/lib/libmagic.so: __snprintf_chk: symbol not found
I will do some more research.
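For context on the error above: as far as I know, `__snprintf_chk` is one of the fortified wrappers that glibc exports for code compiled with `_FORTIFY_SOURCE`, and musl does not provide it, so a `libmagic.so` built against glibc cannot be relocated on Alpine. A minimal sketch (the helper name `missing_glibc_symbol` is made up for illustration) that picks such a symbol out of the loader error message:

```python
import re

# musl's dynamic loader reports unresolved symbols as
# "Error relocating <file>: <symbol>: symbol not found" (format taken from
# the traceback above). Names like __snprintf_chk follow glibc's "__*_chk"
# fortify naming.
_MISSING_CHK = re.compile(r'(__\w+_chk): symbol not found')


def missing_glibc_symbol(error_message):
    """
    Return the glibc fortify-style symbol named in a loader error message,
    or None if the message does not mention one.
    """
    match = _MISSING_CHK.search(error_message)
    return match.group(1) if match else None
```

A wrapper around `ctypes.CDLL` could use this to turn the raw `OSError` into a hint that the bundled binary was built for glibc.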
I've created a 2nd Dockerfile, this time based on Debian Stretch. That seems to work.
Stuff is over here:
@znerd Thank you for figuring all this out and sorry for some of the troubles: keep me posted as you progress!
At the moment, an Alpine container would require a bit of manual work as there are some bundled, pre-built binaries that would need to be rebuilt by hand for this to work with an Alpine-style static build context.
Things should work fine with Debian/Ubuntu/CentOS/Fedora/SUSE and similar.
Here are two other examples: https://github.com/clearlydefined/tool-images/blob/0392771a408dbfb2ab5fcd88f702c43f207aa4ce/scancode/Dockerfile
https://github.com/clearlydefined/crawler/blob/1567d78abb7c1ee00c4ef89129ae9f1c56c92df4/Dockerfile
I am not sure which version of ScanCode you used, but I would suggest using the latest 2.9.x pre-v3 releases.
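To get an idea of which bundled, pre-built binaries would need rebuilding for Alpine, one could scan an installed copy for ELF files. This is just a sketch under the assumption that the native files sit somewhere below the given directory; `find_elf_files` is a hypothetical helper, not part of ScanCode.

```python
import os

# Every ELF binary or shared object starts with these four bytes.
ELF_MAGIC = b'\x7fELF'


def find_elf_files(root_dir):
    """
    Return a sorted list of paths under `root_dir` whose content starts
    with the ELF magic number. Symlinks and unreadable files are skipped.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, 'rb') as handle:
                    if handle.read(4) == ELF_MAGIC:
                        found.append(path)
            except OSError:
                continue
    return sorted(found)
```

For instance, pointing it at the `typecode/bin` directory seen in the traceback earlier in this thread should list the bundled `libmagic.so`.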
I'm currently experimenting with a minimal Dockerfile that already has glibc installed:
FROM frolvlad/alpine-python2
RUN pip install scancode-toolkit
But building the image gives
Collecting extractcode-libarchive (from scancode-toolkit)
ERROR: Could not find a version that satisfies the requirement extractcode-libarchive (from scancode-toolkit) (from versions: none)
ERROR: No matching distribution found for extractcode-libarchive (from scancode-toolkit)
@pombredanne, any idea what the issue is?
@sschuberth sorry for the late reply...
ScanCode has a dep on extractcode-libarchive and this contains a prebuilt native binary that may not be happy with an Alpine static setup.
What is likely is that the glibc provided there may not match?
Could you try to use a release archive with this Alpine setup instead to eliminate some moving parts?
Just out of curiosity, is using Alpine really providing such big benefits for the pain it brings?
> What is likely is that the glibc provided there may not match?
What are the criteria by which glibc is matched?
> Could you try to use a release archive with this Alpine setup instead to eliminate some moving parts?
Will try.
> Just out of curiosity, is using Alpine really providing such big benefits for the pain it brings?
Well, actually ScanCode is the first project I've run into that resists running easily on Alpine. But yes, image-size-wise I believe it's worth the effort.
@sschuberth
> What are the criteria by which glibc is matched?
I have no idea :D
That said, making things work on Alpine could maybe be dealt with in the GSoC project of @aj4ayushjain?
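On the question of how glibc is matched: as far as I understand PEP 513, manylinux1 wheels are considered compatible when the running glibc has major version 2 and minor version 5 or later. A rough sketch of such a check, modeled on the example in the PEP rather than on pip's current code (the function names here are my own):

```python
import ctypes
import re


def glibc_version_string():
    """
    Return the running glibc version such as "2.28", or None when the
    process is not linked against glibc (for example with musl on Alpine).
    """
    try:
        process_namespace = ctypes.CDLL(None)
        gnu_get_libc_version = process_namespace.gnu_get_libc_version
    except (OSError, AttributeError, TypeError):
        # The symbol only exists in glibc.
        return None
    gnu_get_libc_version.restype = ctypes.c_char_p
    version = gnu_get_libc_version()
    if not isinstance(version, str):
        version = version.decode('ascii')
    return version


def glibc_at_least(version_string, required_major, required_minor):
    """
    Return True if `version_string` has exactly `required_major` as major
    version and at least `required_minor` as minor version.
    """
    if not version_string:
        return False
    match = re.match(r'(\d+)\.(\d+)', version_string)
    if not match:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return major == required_major and minor >= required_minor
```

With these, `glibc_at_least(glibc_version_string(), 2, 5)` would be False on a plain musl-based Alpine image, which matches pip refusing the prebuilt wheels there.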
@pombredanne, so when installing from source while building the Alpine-Docker-image I get
Collecting extractcode-libarchive (from scancode-toolkit===3.0.2.post620.415d0c892->-r /scancode-toolkit/etc/conf/base.txt (line 10))
Could not find a version that satisfies the requirement extractcode-libarchive (from scancode-toolkit===3.0.2.post620.415d0c892->-r /scancode-toolkit/etc/conf/base.txt (line 10)) (from versions: )
No matching distribution found for extractcode-libarchive (from scancode-toolkit===3.0.2.post620.415d0c892->-r /scancode-toolkit/etc/conf/base.txt (line 10))
* Installing components ...
Failed to execute command:
pip install --upgrade --no-index --no-cache-dir --find-links="/scancode-toolkit/thirdparty" -r "/scancode-toolkit/etc/conf/base.txt". Aborting...
That doesn't really tell much more about why exactly extractcode-libarchive could not be installed.
@sschuberth It means there are no binaries of the extractcode-libarchive plugin for Alpine. @pombredanne, can't we build a prebuilt native binary for alpine and fix this? If there is enough time available I will do it, but Alpine is a whole new world for me, so I need to study it first.
Ok, this Dockerfile gets me a bit further:
FROM frolvlad/alpine-python2
RUN apk add --no-cache py-icu
# Override PIP's glibc detection, see https://github.com/pypa/pip/issues/3969.
RUN echo "manylinux1_compatible = True" > /usr/lib/python2.7/_manylinux.py
RUN pip install --prefer-binary scancode-toolkit
But now it still wants to build intbitset using gcc:
Running setup.py install for intbitset: started
Running setup.py install for intbitset: finished with status 'error'
ERROR: Complete output from command /usr/bin/python -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-install-4aegy3/intbitset/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-rymTcl/install-record.txt --single-version-externally-managed --compile:
ERROR: running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
copying intbitset/intbitset_helper.py -> build/lib.linux-x86_64-2.7
copying intbitset/version.py -> build/lib.linux-x86_64-2.7
running egg_info
writing requirements to intbitset/intbitset.egg-info/requires.txt
writing intbitset/intbitset.egg-info/PKG-INFO
writing top-level names to intbitset/intbitset.egg-info/top_level.txt
writing dependency_links to intbitset/intbitset.egg-info/dependency_links.txt
reading manifest file 'intbitset/intbitset.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.css' under directory 'docs/_themes'
warning: no files found matching '*.css_t' under directory 'docs/_themes'
warning: no files found matching '*.conf' under directory 'docs/_themes'
warning: no files found matching '*.html' under directory 'docs/_themes'
warning: no files found matching 'COPYING' under directory 'docs/_themes'
warning: no files found matching 'README' under directory 'docs/_themes'
warning: no files found matching '*.html' under directory 'docs/_templates'
writing manifest file 'intbitset/intbitset.egg-info/SOURCES.txt'
running build_ext
building 'intbitset' extension
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/intbitset
gcc -fno-strict-aliasing -Os -fomit-frame-pointer -g -DNDEBUG -Os -fomit-frame-pointer -g -DTHREAD_STACK_SIZE=0x100000 -fPIC -I/usr/include/python2.7 -c intbitset/intbitset.c -o build/temp.linux-x86_64-2.7/intbitset/intbitset.o -O3 -march=core2 -mtune=native
unable to execute 'gcc': No such file or directory
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Command "/usr/bin/python -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-install-4aegy3/intbitset/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-rymTcl/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-install-4aegy3/intbitset/
WARNING: You are using pip version 19.1, however version 19.1.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
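On the `_manylinux.py` override used in the Dockerfile above: PEP 513 lets an installer consult an optional `_manylinux` module before falling back to its own glibc check. A simplified sketch of that precedence follows; it is not pip's actual code, and the `glibc_check` parameter is a stand-in introduced here for the real glibc test.

```python
import importlib


def is_manylinux1_compatible(glibc_check=lambda: False):
    """
    Return True if manylinux1 wheels should be treated as installable.

    An importable `_manylinux` module with a `manylinux1_compatible`
    attribute takes precedence; otherwise defer to `glibc_check`.
    """
    try:
        module = importlib.import_module('_manylinux')
        return bool(module.manylinux1_compatible)
    except (ImportError, AttributeError):
        # No override present: fall back to the glibc-based heuristic.
        return bool(glibc_check())
```

This is why writing `manylinux1_compatible = True` into `_manylinux.py` makes pip accept prebuilt wheels on Alpine. It does not make binaries built against glibc loadable there, and packages for which no matching wheel is found are still built from source, which needs a compiler such as gcc.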
> can't we build a prebuilt native binary for alpine
@aj4ayushjain, that's basically what I had been asking for in https://github.com/nexB/scancode-toolkit/issues/1262, but it turns out to be quite laborious, and as mentioned here, I now believe the better approach is to use an Alpine Docker image that has https://github.com/sgerrand/alpine-pkg-glibc installed, which is what I'm trying right now.
I tried installing alpine-pkg-glibc and it didn't work for me. Were you able to get this to work? I'm still getting this error:
Traceback (most recent call last):
File "/usr/bin/scancode", line 7, in
@earlyster sorry for that! @sschuberth if you have an Alpine Dockerfile that works, we can make this part of the repo here alright.
@earlyster also the work that @aj4ayushjain is doing on packaging in general (and Debian in particular) will likely help here too, as we would possibly have a clearer way to get things from Alpine packages.
Thanks @pombredanne for the update!
IIRC the last version of my Alpine-based Dockerfile has the same __snprintf_chk: symbol not found error despite glibc being installed. I currently have no time to look further into this.
@sschuberth ok so we are in the same boat .. not able to use scantool with alpine based docker image.
Trying to run on alpine but it's not working. Any updates on this?
I finally managed to create an Alpine-based Docker image with Python 3.6 that is able to run ScanCode. Feel free to give it a try: https://github.com/sschuberth/docker-files/blob/4ebd681dfef5a8142b92c1157edd4f5495f0706b/scancode/Dockerfile
@sschuberth you rock! Do you know if the base image (https://github.com/sschuberth/docker-files/blob/4ebd681dfef5a8142b92c1157edd4f5495f0706b/scancode/Dockerfile#L2) is something that is reliable? As a first gut reaction, in:
# See https://github.com/sgerrand/alpine-pkg-glibc/issues/111#issuecomment-466301535.
FROM frolvlad/alpine-miniconda3:python3.6
... frolvlad makes me cringe a bit to use as a trusted base image.
That depends on your definition of "trusted" :wink: I was also a bit reluctant at first as it's "just some random single user" (and not e.g. a company / foundation) maintaining these images. But @frolvlad seems to be very active in the Docker / Alpine community and quite a few people seem to be using his images. So I decided for myself to trust these base images.
It feels a tad engaged to me if I dig a little:
Just curious, how big a startup speed gain and size gain do you get with this?
I haven't bothered to measure this so far TBH...
FWIW, there is now support for Alpine musl in the Python wheel manylinux ecosystem, including support on PyPI. This could pave the way to supporting Alpine images.
Oh, wow, I am quite amused to find these types of topics and articles throughout the internet using my Alpine-baked images here and there (frolvlad is my handle on Docker Hub). Well, I would not trust a random guy on the internet to provide a base image for some mission-critical software, but you are free to just copy-paste the contents of the Dockerfile to your image based on plain alpine image if you wish 🤷
Hi guys,
Hope you are all well !
I found your great repo while I was working on a personal project that also aims to do some code analysis with a distributed bot, working as a virtual agent to build a semi-structured database of meta information about some open source projects (mainly SQL and graph based).
Context: The goal is to automatically enrich some of my starred repos with a virtual assistant/bot, by 'left joining' or applying some GraphQL queries on data like known-framework detection, GitHub stats or GitHub trends, in order to extract more metadata from those repositories.
In a nutshell, it is all about having a more convenient local search engine on my starred repos and creating a domain-specific topic modeling API. By detecting some patterns, I want to manage a dynamic tree of events that could be dedicated to sub-tasks, like generating Docker/Docker Compose files from a database of snippets if some dependencies are detected or matched.
examples of pattern detection to build dynamic dockerfiles:
1. Deploy local instances with Docker/Docker Compose
For some tasks, it seems clear that some content lexer, grep, or parser would provide faster responses and results (dependency scanning, licenses); so that's how I found your projects, and that's really cool :-)
So first of all, as a devops ^^, I was wondering if it would not be easier to bundle scancode into some docker/docker-compose files, Alpine-based of course, in order to keep the size of the containers reasonable.
2. Some suggestions/features
So far, I have started to give your projects a try, and it seems that I will create a fork with some personal ideas/features, close to the context mentioned above. I wanted to share them with you, so any feedback or experience sharing is tremendously welcome :-)
- Administration panel
- Topic modelling
- Code parsing/search
- Starred GitHub managers:
3. Questions related to your roadmap:
Last question: do you plan to migrate your stack of scripts to Python 3? Is it something that you have on the roadmap and for which you would like a community effort?
Ps. sorry for the long post, but I was really inspired by your work ^^, so thanks for reading it all :-)
Please have a great day !
Cheers, Richard