eliasgranderubio / dagda

a tool to perform static analysis of known vulnerabilities, trojans, viruses, malware & other malicious threats in docker images/containers and to monitor the docker daemon and running docker containers for detecting anomalous activities
Apache License 2.0

vuln first sync error #19

Closed xinity closed 7 years ago

xinity commented 7 years ago

Hello,

I'm trying to do my first sync of the vuln DB. It fails with this message:

dagda     |  <2017-04-28 12:38:07,069> <ERROR> <DagdaServer> <dagda_server> <Unexpected exception of type JoblibConnectionError occured: ('Multiprocessing exception:
...........................................................................
/opt/app/dagda.py in <module>()
    130         if r is not None:
    131             print(json.dumps(json.loads(r.content.decode(\'utf-8\')), sort_keys=True, indent=4))
    132
    133 
    134 if __name__ == "__main__":
--> 135     main(DagdaCLIParser())
    136 
    137 
    138 
    139 

...........................................................................
/opt/app/dagda.py in main(parsed_args=<cli.command.start_cli_parser.StartCLIParser object>)
     58                          mongodb_port=parsed_args.get_mongodb_port(),
     59                          mongodb_ssl=parsed_args.is_mongodb_ssl_enabled(),
     60                          mongodb_user=parsed_args.get_mongodb_user(),
     61                          mongodb_pass=parsed_args.get_mongodb_pass(),
     62                          falco_rules_filename=parsed_args.get_falco_rules_filename())
---> 63         ds.run()
        ds.run = <bound method DagdaServer.run of <api.dagda_server.DagdaServer object>>
     64 
     65     else:
     66         dagda_base_url = get_dagda_base_url()
     67         # -- Executes vuln sub-command

...........................................................................
/opt/app/api/dagda_server.py in run(self=<api.dagda_server.DagdaServer object>)
     69         if edn_pid == 0:
     70             try:
     71                 while True:
     72                     item = InternalServer.get_dagda_edn().get()
     73                     if item[\'msg\'] == \'init_db\':
---> 74                         self._init_or_update_db()
        self._init_or_update_db = <function DagdaServer._init_or_update_db>
     75                     elif item[\'msg\'] == \'check_image\':
     76                         self._check_docker_by_image_name(item)
     77                     elif item[\'msg\'] == \'check_container\':
     78                         self._check_docker_by_container_id(item)

...........................................................................
/opt/app/api/dagda_server.py in _init_or_update_db()
    121         try:
    122             InternalServer.get_mongodb_driver().insert_init_db_process_status(
    123                 {\'status\': \'Initializing\', \'timestamp\': datetime.datetime.now().timestamp()})
    124             # Init db
    125             db_composer = DBComposer()
--> 126             db_composer.compose_vuln_db()
        db_composer.compose_vuln_db = <bound method DBComposer.compose_vuln_db of <vulnDB.db_composer.DBComposer object>>
    127             InternalServer.get_mongodb_driver().insert_init_db_process_status(
    128                 {\'status\': \'Updated\', \'timestamp\': datetime.datetime.now().timestamp()})
    129         except Exception as ex:
    130             message = "Unexpected exception of type {0} occured: {1!r}".format(type(ex).__name__,  ex.args)

...........................................................................
/opt/app/vulnDB/db_composer.py in compose_vuln_db(self=<vulnDB.db_composer.DBComposer object>)
     92                 self.mongoDbDriver.bulk_insert_bids(bid_items_list)
     93                 bid_items_list.clear()
     94             # Set the new max bid
     95             max_bid = 94417
     96         # Updating BugTraqs from http://www.securityfocus.com/
---> 97         bid_items_array = get_bug_traqs_lists_from_online_mode(bid_downloader(first_bid=max_bid+1, last_bid=97200))
        bid_items_array = [[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], ...]
        max_bid = 94417
     98         for bid_items_list in bid_items_array:
     99             self.mongoDbDriver.bulk_insert_bids(bid_items_list)
    100             bid_items_list.clear()
    101 

...........................................................................
/opt/app/vulnDB/bid_downloader.py in bid_downloader(first_bid=94418, last_bid=97200)
     72             return json.dumps(prepare_output(title, bugtraq_id, vuln_products), sort_keys=True)
     73 
     74 
     75 # Executes the main function called get_bid in a parallel way
     76 def bid_downloader(first_bid, last_bid):
---> 77     output_list = Parallel(n_jobs=100)(delayed(get_bid)(i) for i in range(first_bid, last_bid + 1))
        output_list = undefined
        first_bid = 94418
        last_bid = 97200
     78     return [x for x in output_list if x is not None]
     79 
     80 
     81 

...........................................................................
/usr/local/lib/python3.4/site-packages/joblib/parallel.py in __call__(self=Parallel(n_jobs=100), iterable=<generator object <genexpr>>)
    763             if pre_dispatch == "all" or n_jobs == 1:
    764                 # The iterable was consumed all at once by the above for loop.
    765                 # No need to wait for async callbacks to trigger to
    766                 # consumption.
    767                 self._iterating = False
--> 768             self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=100)>
    769             # Make sure that we get a last message telling us we are done
    770             elapsed_time = time.time() - self._start_time
    771             self._print(\'Done %3i out of %3i | elapsed: %s finished\',
    772                         (len(self._output), len(self._output),

---------------------------------------------------------------------------
Sub-process traceback:
---------------------------------------------------------------------------
ConnectionError                                    Fri Apr 28 12:38:06 2017
PID: 43                                 Python 3.4.5: /usr/local/bin/python\n...........................................................................
/usr/local/lib/python3.4/site-packages/joblib/parallel.py in __call__(self=<joblib.parallel.BatchedCalls object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        self.items = [(<function get_bid>, (94999,), {})]
    132
    133     def __len__(self):
    134         return self._size
    135

...........................................................................
/usr/local/lib/python3.4/site-packages/joblib/parallel.py in <listcomp>(.0=<list_iterator object>)
    126     def __init__(self, iterator_slice):
    127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]\n        func = <function get_bid>
        args = (94999,)
        kwargs = {}
    132 
    133     def __len__(self):
    134         return self._size
    135 

...........................................................................
/opt/app/vulnDB/bid_downloader.py in get_bid(bugtraq_id=94999)
     57     return data
     58 
     59
     60 # Requests the bid, parses the HTML and prints the BugTraq info
     61 def get_bid(bugtraq_id):
---> 62     r = requests.get("http://www.securityfocus.com/bid/" + str(bugtraq_id))
        r = undefined\n        bugtraq_id = 94999
     63     if r.status_code == 200:
     64         try:
     65             body = r.content.decode("utf-8")
     66             body = body[body.index(\'<div id="vulnerability">\'):body.index(\'<span class="label">Not Vulnerable:</span>\')]

...........................................................................
/usr/local/lib/python3.4/site-packages/requests/api.py in get(url=\'http://www.securityfocus.com/bid/94999\', params=None, **kwargs={\'allow_redirects\': True})
     65     :return: :class:`Response <Response>` object
     66     :rtype: requests.Response
     67     """
     68 
     69     kwargs.setdefault(\'allow_redirects\', True)
---> 70     return request(\'get\', url, params=params, **kwargs)
        url = \'http://www.securityfocus.com/bid/94999\'
        params = None\n        kwargs = {\'allow_redirects\': True}
     71 
     72 
     73 def options(url, **kwargs):
     74     """Sends a OPTIONS request.

...........................................................................
/usr/local/lib/python3.4/site-packages/requests/api.py in request(method=\'get\', url=\'http://www.securityfocus.com/bid/94999\', **kwargs={\'allow_redirects\': True, \'params\': None})
     51 
     52     # By using the \'with\' statement we are sure the session is closed, thus we
     53     # avoid leaving sockets open which can trigger a ResourceWarning in some
     54     # cases, and look like a memory leak in others.
     55     with sessions.Session() as session:
---> 56         return session.request(method=method, url=url, **kwargs)
        session.request = <bound method Session.request of <requests.sessions.Session object>>
        method = \'get\'
        url = \'http://www.securityfocus.com/bid/94999\'
        kwargs = {\'allow_redirects\': True, \'params\': None}
     57 
     58 
     59 def get(url, params=None, **kwargs):
     60     """Sends a GET request.

...........................................................................
/usr/local/lib/python3.4/site-packages/requests/sessions.py in request(self=<requests.sessions.Session object>, method=\'get\', url=\'http://www.securityfocus.com/bid/94999\', params=None, data=None, headers=None, cookies=None, files=None, auth=None, timeout=None, allow_redirects=True, proxies={}, hooks=None, stream=None, verify=None, cert=None, json=None)
    470         send_kwargs = {
    471             \'timeout\': timeout,
    472             \'allow_redirects\': allow_redirects,
    473         }
    474         send_kwargs.update(settings)
--> 475         resp = self.send(prep, **send_kwargs)
        resp = undefined
        self.send = <bound method Session.send of <requests.sessions.Session object>>
        prep = <PreparedRequest [GET]>
        send_kwargs = {\'allow_redirects\': True, \'cert\': None, \'proxies\': OrderedDict(), \'stream\': False, \'timeout\': None, \'verify\': True}
    476
    477         return resp
    478
    479     def get(self, url, **kwargs):

...........................................................................
/usr/local/lib/python3.4/site-packages/requests/sessions.py in send(self=<requests.sessions.Session object>, request=<PreparedRequest [GET]>, **kwargs={\'cert\': None, \'proxies\': OrderedDict(), \'stream\': False, \'timeout\': None, \'verify\': True})
    591 
    592         # Start time (approximately) of the request
    593         start = datetime.utcnow()
    594 
    595         # Send the request
--> 596         r = adapter.send(request, **kwargs)
        r = undefined
        adapter.send = <bound method HTTPAdapter.send of <requests.adapters.HTTPAdapter object>>
        request = <PreparedRequest [GET]>
        kwargs = {\'cert\': None, \'proxies\': OrderedDict(), \'stream\': False, \'timeout\': None, \'verify\': True}
    597 
    598         # Total elapsed time of the request (approximately)
    599         r.elapsed = datetime.utcnow() - start
    600 

...........................................................................
/usr/local/lib/python3.4/site-packages/requests/adapters.py in send(self=<requests.adapters.HTTPAdapter object>, request=<PreparedRequest [GET]>, stream=False, timeout=<requests.packages.urllib3.util.timeout.Timeout object>, verify=True, cert=None, proxies=OrderedDict())
    468                     # Then, reraise so that we can handle the actual exception.
    469                     low_conn.close()
    470                     raise
    471 
    472         except (ProtocolError, socket.error) as err:
--> 473             raise ConnectionError(err, request=request)
        err = undefined
        request = <PreparedRequest [GET]>
    474 
    475         except MaxRetryError as e:
    476             if isinstance(e.reason, ConnectTimeoutError):
    477                 # TODO: Remove this in 3.0.0: see #2811

ConnectionError: (\'Connection aborted.\', ConnectionResetError(104, \'Connection reset by peer\'))',)>

What can I do to fix this?

Thanks for your help,

eliasgranderubio commented 7 years ago

Normally, when a BugTraq ID doesn't exist, the Security Focus web page returns the 404 HTTP status code. However, your exception suggests that the Security Focus web page wasn't working properly at that moment.

Anyway, I have added a try/except block to log this error if it occurs again.
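
As a rough illustration of that kind of guard (not the actual commit), the sketch below wraps the page download so that a reset connection is logged and skipped instead of aborting the whole sync. The names `fetch_page` and `get_bid_safely` are hypothetical, and it uses the standard library's `urllib` rather than `requests` to stay self-contained. With `requests`, the same `except OSError` clause should also apply, since its exceptions derive from `IOError`.

```python
import logging
import urllib.request

logger = logging.getLogger("bid_downloader_sketch")


def fetch_page(url, timeout=10):
    """Hypothetical helper: download a page and return its decoded body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def get_bid_safely(bugtraq_id, fetch=fetch_page):
    """Return the page body for a BugTraq ID, or None if the download fails."""
    url = "http://www.securityfocus.com/bid/" + str(bugtraq_id)
    try:
        return fetch(url)
    except OSError as ex:
        # ConnectionResetError, socket timeouts, and urllib's URLError/HTTPError
        # are all OSError subclasses, so one clause covers them.
        logger.warning("BugTraq %s skipped: %r", bugtraq_id, ex)
        return None
```

Returning `None` matches the way the `bid_downloader` function in the traceback already filters out `None` results, so a failed ID is simply left out of that run.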

Please try to initialize your database again and let me know if the error persists.

xinity commented 7 years ago

Sounds better, but initialization is still taking too much time :(

It also lacks a completion message that would tell us when initialization is complete.

eliasgranderubio commented 7 years ago

I agree with you. At the moment, I'm trying to reduce the initialization time and improve the information about BugTraqs in the reports, as I have already done with the CVEs and the exploits.

If you want to know whether initialization has completed, there is an endpoint for checking its status.
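
A minimal polling sketch, assuming the status check eventually reports `'Updated'` (the traceback above shows `'Initializing'` and `'Updated'` being stored by `_init_or_update_db`). The function `wait_for_init` and its `get_status` callable are hypothetical. The endpoint itself is not named here, so the caller has to supply whatever queries it.

```python
import time


def wait_for_init(get_status, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll get_status() until it returns 'Updated'.

    Returns True once the status is 'Updated', or False after max_polls
    attempts. get_status is a caller-supplied callable (an assumption of
    this sketch) that returns the current init status as a string.
    """
    for _ in range(max_polls):
        if get_status() == "Updated":
            return True
        sleep(poll_seconds)
    return False
```

With the defaults this gives up after roughly an hour; both numbers are arbitrary and only there to keep the loop bounded.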

xinity commented 7 years ago

> At the moment, I'm trying to reduce the initialization time and improve the information about BugTraqs in the reports, as I have already done with the CVEs and the exploits.

I've been working on Dagda this weekend. Init fails many times, perhaps because you scrape the BugTraq and CVE databases.

Perhaps a better solution would be to provide a pre-initialized MongoDB, which would reduce sync time for users. I can help on this topic around Docker integration if you like :)

eliasgranderubio commented 7 years ago

I'm still working on improving the data model and fixing the issues that occur when the BugTraqs are scraped from the Security Focus web pages.

I think that your idea could be fine but I have some doubts about:

Anyway, only the first init takes several minutes to populate MongoDB, because Dagda must create the whole database. On subsequent runs of the init method the population is incremental, so it should not take more than one or two minutes.

Has the init method failed again since the commit I made 6 days ago?

xinity commented 7 years ago

> Has the init method failed again since the commit I made 6 days ago?

It did, apparently because securityfocus was unavailable.

eliasgranderubio commented 7 years ago

I've just modified the data model to include the BugTraq ID details. That is why I have another GitHub repository as my personal PoC for Security Focus web scraping.

Anyway, I had no problems with the web scraping, and I uploaded the result to that GitHub repo.

Please remove your current Dagda data model from your MongoDB, upgrade to the latest version, and try again. I hope my latest changes fix your issue.

xinity commented 7 years ago

Will do as advised and let you know ASAP :)

eliasgranderubio commented 7 years ago

Could you give me some feedback on this issue? Otherwise, I'll assume it was fixed by the changes described in my last comment.