dirtyfilthy / freshonions-torscraper

Fresh Onions is an open source TOR spider / hidden service onion crawler hosted at zlal32teyptf4tvi.onion
GNU Affero General Public License v3.0
505 stars, 148 forks

TransportError 400 when trying to insert into Elasticsearch 5.6.10 #24

Open 0vert1m3 opened 6 years ago

0vert1m3 commented 6 years ago
    for x in result:
  File "/root/freshonions-torscraper/torscraper/middlewares.py", line 192, in <genexpr>
    return (_set_range(r) for r in result or ())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 533, in new_gen_func
    output = wrapped_interact(iterator)
  File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 520, in wrapped_interact
    rollback_and_reraise(sys.exc_info())
  File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 320, in rollback_and_reraise
    reraise(*exc_info)
  File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 510, in wrapped_interact
    output = interact(iterator, input, exc_info)
  File "/usr/local/lib/python2.7/dist-packages/pony/orm/core.py", line 484, in interact
    return next(iterator) if input is None else iterator.send(input)
  File "/root/freshonions-torscraper/torscraper/spiders/tor_scrapy.py", line 379, in parse
    pg.save()
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch_dsl/document.py", line 429, in save
    **doc_meta
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 73, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/__init__.py", line 300, in index
    _make_path(index, doc_type, id), params=params, body=body)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 312, in perform_request
    status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 129, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py", line 125, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
RequestError: TransportError(400, u'illegal_argument_exception', u"can't specify parent if no parent field has been configured")

L3houx commented 6 years ago

Hi! I'm not sure what causes this problem, but I suspect it is related either to how Elasticsearch was initialized or to your Elasticsearch version. Our fork of the project may also help: https://github.com/GoSecure/freshonions-torscraper. I updated the installation documentation there. If none of these hints solves the problem, let me know and I will investigate a bit more.
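The 400 above ("can't specify parent if no parent field has been configured") is what Elasticsearch 5.x reports when a document is indexed with a `parent` value but its type's mapping has no `_parent` field, which fits the initialization theory. Below is a minimal sketch, assuming the parent type is called `domain` and the child type `page`. These names are hypothetical, so check the project's elasticsearch_dsl DocType definitions for the real ones. The sketch builds a 5.x-style create-index body whose child type declares `_parent`, and checks a `GET /<index>/_mapping` response for that field before crawling. It uses plain dicts so it runs without a cluster.

```python
# Hypothetical type names for illustration; the real freshonions
# DocType names may differ.
PARENT_TYPE = "domain"
CHILD_TYPE = "page"


def build_index_body(parent_type, child_type):
    """Create-index body (Elasticsearch 5.x style) in which the child type
    declares a _parent mapping pointing at the parent type."""
    return {
        "mappings": {
            # Field mappings omitted; only the parent/child link matters here.
            parent_type: {"properties": {}},
            child_type: {
                # Without this entry, indexing a child with ?parent=... fails
                # with "can't specify parent if no parent field has been configured".
                "_parent": {"type": parent_type},
                "properties": {},
            },
        }
    }


def parent_configured(mapping_response, index, child_type):
    """Return True if the JSON returned by GET /<index>/_mapping shows a
    _parent field on child_type."""
    mappings = mapping_response.get(index, {}).get("mappings", {})
    return "_parent" in mappings.get(child_type, {})


if __name__ == "__main__":
    body = build_index_body(PARENT_TYPE, CHILD_TYPE)
    print(body["mappings"][CHILD_TYPE]["_parent"])
    # Trimmed 5.x mapping responses for an index named "crawl" (hypothetical).
    ok = {"crawl": {"mappings": {CHILD_TYPE: {"_parent": {"type": PARENT_TYPE}}}}}
    missing = {"crawl": {"mappings": {CHILD_TYPE: {"properties": {}}}}}
    print(parent_configured(ok, "crawl", CHILD_TYPE))       # True
    print(parent_configured(missing, "crawl", CHILD_TYPE))  # False
```

In 5.x the `_parent` link generally has to exist when the child type is created; it cannot be added to an existing type afterwards. If the check fails on an existing index, recreating the index with the correct mappings (after backing up any data) is likely needed, rather than updating the mapping in place.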

jeffwcollins commented 6 years ago

I was having the same issue. When I try to run the elasticsearch_migrate script, it fails with "ImportError: No module named elasticsearch_dsl.connections", even though the module is present in the venv lib folder, so I don't know whether something changed in the program. Eventually I had to install elasticsearch_dsl with apt-get, then run pip install -r requirements.txt from outside the virtualenv (which downgraded elasticsearch_dsl), and then run it again from inside the virtualenv before all of the services would run properly. It seemed a bit convoluted, but since I'm running this on Ubuntu 18.04, I figured some OS and third-party fixes are still needed for all of the applications to work.
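An `ImportError` for a package that is visibly present in the venv's lib folder often means the script is running under a different interpreter from the virtualenv's, for example when a shell script calls the system `python`. Below is a small stdlib-only diagnostic sketch, meant to be run with the same command the migrate script uses. The function names are hypothetical and nothing here is part of freshonions. It is written to run on both Python 2.7 and Python 3.

```python
from __future__ import print_function

import importlib
import sys


def in_virtualenv():
    """True if this interpreter runs inside a virtualenv/venv.
    Legacy virtualenv sets sys.real_prefix; venv (and virtualenv >= 20)
    make sys.base_prefix differ from sys.prefix."""
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base != sys.prefix


def module_available(name):
    """Try to import a (possibly dotted) module name; True on success."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside virtualenv:", in_virtualenv())
    print("elasticsearch_dsl.connections importable:",
          module_available("elasticsearch_dsl.connections"))
```

If `interpreter` does not point into the venv's `bin/` directory, invoking the script explicitly with the venv's Python may be a simpler fix than mixing apt-get and system-wide pip installs.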

L3houx commented 6 years ago

I ran into problems like this a few times while working on this project. The implementation, the versions, and the overall structure were pretty finicky. I know the older version 5.6.6 works well, but if you manage to make it work with version 5.6.10, please let me know; I'm interested.
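Behavior apparently differs between 5.6.6 and 5.6.10, so a startup check that reports the cluster version may save some debugging. The root endpoint (`GET /`) of an Elasticsearch node returns JSON that includes `version.number`. The sketch below compares that value against the version reported working in this thread. `TESTED_VERSION`, `parse_version`, and `check_es_version` are hypothetical names. The function deliberately labels other releases of the same major version as untested rather than broken.

```python
# Hypothetical: the version reported in this thread to work well.
TESTED_VERSION = (5, 6, 6)


def parse_version(number):
    """Turn '5.6.10' into (5, 6, 10); drop suffixes like '-beta1'."""
    core = number.split("-")[0]
    return tuple(int(part) for part in core.split("."))


def check_es_version(root_info, tested=TESTED_VERSION):
    """Classify the node version from a GET / response body.
    Returns 'tested', 'untested-same-major', or 'different-major'."""
    version = parse_version(root_info["version"]["number"])
    if version == tested:
        return "tested"
    if version[0] == tested[0]:
        return "untested-same-major"
    return "different-major"


if __name__ == "__main__":
    info = {"version": {"number": "5.6.10"}}  # trimmed GET / response
    print(check_es_version(info))  # untested-same-major
```

A different major version deserves more caution than a point release: 6.x replaced the `_parent` mechanism with the `join` field type, so a 5.x-style parent/child setup would not carry over unchanged.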

Thanks, @jeffwcollins!