rosskarchner closed 10 years ago
Pretty sweet, thank you! :heart_eyes_cat:
Hmmm, I'm getting the following error:
elasticsearch.exceptions.NotFoundError: TransportError(404, u'{"_index":"content","_type":"posts","_id":"spring-2014-rulemaking-agenda","found":false}')
In wordpress_view_processor.py I added the following code:
from sheer.processors.helpers import IndexHelper
index_helper = IndexHelper()
Then inside of the process_view function...
...
popular_posts = []
for slug in custom_fields['popular_posts'][:5]:
    popular_posts.append(index_helper.get_document('posts',slug))
post['popular_posts'] = popular_posts
...
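While the indexing order is being sorted out, one way to keep a missing post from aborting the whole `sheer index` run is to catch the not-found error per slug. This is only a sketch: `NotFoundError` and `FakeIndexHelper` below are hypothetical stand-ins for `elasticsearch.exceptions.NotFoundError` and sheer's `IndexHelper`, so the snippet runs without Elasticsearch, and `collect_popular_posts` is an illustrative name, not something from the PR.

```python
class NotFoundError(Exception):
    """Hypothetical stand-in for elasticsearch.exceptions.NotFoundError."""


class FakeIndexHelper(object):
    """Hypothetical stand-in for sheer's IndexHelper, backed by a dict."""

    def __init__(self, documents):
        # Maps (doctype, docid) -> document.
        self._documents = documents

    def get_document(self, doctype, docid):
        try:
            return self._documents[(doctype, docid)]
        except KeyError:
            raise NotFoundError("%s/%s not found" % (doctype, docid))


def collect_popular_posts(index_helper, slugs, limit=5):
    """Look up the first `limit` slugs; return (found, missing)."""
    found, missing = [], []
    for slug in slugs[:limit]:
        try:
            found.append(index_helper.get_document('posts', slug))
        except NotFoundError:
            # The post is not in the index (yet); record it and move on.
            missing.append(slug)
    return found, missing


helper = FakeIndexHelper({('posts', 'hello-world'): {'title': 'Hello'}})
found, missing = collect_popular_posts(
    helper, ['hello-world', 'spring-2014-rulemaking-agenda'])
```

In the real processor the `except` clause would name the exception class from the elasticsearch package, and the `missing` list could be logged so that silently dropped slugs are still visible.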
Are you sure you have the most up-to-date processors.json? (i.e., is "posts" listed before "views"?)
Yup:
{
  "posts" : {
    "url" : "$WORDPRESS/?json=1",
    "processor" : "wordpress_post_processor",
    "mappings" : "_defaults/posts_mappings.json"
  },
  "views" : {
    "url" : "$WORDPRESS/api/get_recent_posts/?post_type=view",
    "processor" : "wordpress_view_processor"
  }
}
Here's the full error:
$ sheer index
creating mapping for views (wordpress_view_processor)
No handlers could be found for logger "elasticsearch"
Traceback (most recent call last):
File "/Users/moricim/Projects/.virtualenvs/cfgov-refresh/bin/sheer", line 8, in <module>
execfile(__file__)
File "/Users/moricim/Projects/tools/sheer/sheer/scripts/sheer", line 60, in <module>
args.func(args, config)
File "/Users/moricim/Projects/tools/sheer/sheer/indexer.py", line 125, in index_location
for i, document in enumerate(processor.documents()):
File "/Users/moricim/Projects/himedlooff/cfw/cfgov-refresh/_lib/wordpress_view_processor.py", line 30, in documents
yield process_view(view)
File "/Users/moricim/Projects/himedlooff/cfw/cfgov-refresh/_lib/wordpress_view_processor.py", line 39, in process_view
popular_posts.append(index_helper.get_document('views','blog'))
File "/Users/moricim/Projects/tools/sheer/sheer/processors/helpers.py", line 21, in get_document
doc_type=doctype, id=docid)
File "/Users/moricim/Projects/.virtualenvs/cfgov-refresh/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 70, in _wrapped
return func(*args, params=params, **kwargs)
File "/Users/moricim/Projects/.virtualenvs/cfgov-refresh/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 228, in get
params=params)
File "/Users/moricim/Projects/.virtualenvs/cfgov-refresh/lib/python2.7/site-packages/elasticsearch/transport.py", line 223, in perform_request
status, raw_data = connection.perform_request(method, url, params, body, ignore=ignore)
File "/Users/moricim/Projects/.virtualenvs/cfgov-refresh/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 53, in perform_request
self._raise_error(response.status, raw_data)
File "/Users/moricim/Projects/.virtualenvs/cfgov-refresh/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 82, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.NotFoundError: TransportError(404, u'{"_index":"content","_type":"views","_id":"blog","found":false}')
What are you trying to accomplish with:
popular_posts.append(index_helper.get_document('views','blog'))
That will always fail within the view processor, because the blog view hasn't been saved to elasticsearch yet
So now that 'posts' load before 'views', you can't do this from the posts processor:
But I'm not doing this: popular_posts.append(index_helper.get_document('views','blog')).
I'm trying to get posts with get_document from wordpress_view_processor.py like this: popular_posts.append(index_helper.get_document('posts',slug)).
The traceback includes that line:
File "/Users/moricim/Projects/himedlooff/cfw/cfgov-refresh/_lib/wordpress_view_processor.py", line 39, in process_view
popular_posts.append(index_helper.get_document('views','blog'))
Are you sure it's not there? ;)
OK, I see in the post from a few hours ago that it was failing to look up a post:
https://github.com/cfpb/sheer/pull/28#issuecomment-47722090
It almost seems like, when you run sheer index, it isn't respecting the new order defined in processors.json: it's processing the view first, and thus get_document('posts','whatever') fails because there are no posts in elasticsearch yet.
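One possible explanation, which is an assumption and not something confirmed in this thread, is that the JSON file gets loaded into a plain dict. On Python 2.7 a plain dict does not preserve key order, so "views" could come out before "posts" regardless of the file. A minimal sketch of an order-preserving load using the standard library's `object_pairs_hook` (the inline `PROCESSORS_JSON` string is a trimmed stand-in for the real file):

```python
import json
from collections import OrderedDict

# Trimmed stand-in for processors.json; the real file has more keys.
PROCESSORS_JSON = '''
{
  "posts": {"processor": "wordpress_post_processor"},
  "views": {"processor": "wordpress_view_processor"}
}
'''

# object_pairs_hook hands the decoder's key/value pairs to OrderedDict
# in file order, so iteration follows the order written in the JSON.
processors = json.loads(PROCESSORS_JSON, object_pairs_hook=OrderedDict)
processing_order = list(processors.keys())
```

Whether sheer's indexer actually reads the file this way would need to be checked in indexer.py.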
Ok, my apologies. I did at one point try popular_posts.append(index_helper.get_document('views','blog')) after popular_posts.append(index_helper.get_document('posts',slug)) didn't work. I figured trying a view in the view processor should work. So that explains the discrepancy. Again, sorry!
Here's the error when running popular_posts.append(index_helper.get_document('posts',slug)):
$ sheer index
creating mapping for views (wordpress_view_processor)
No handlers could be found for logger "elasticsearch"
Traceback (most recent call last):
File "/Users/moricim/Projects/.virtualenvs/cfgov-refresh/bin/sheer", line 8, in <module>
execfile(__file__)
File "/Users/moricim/Projects/tools/sheer/sheer/scripts/sheer", line 60, in <module>
args.func(args, config)
File "/Users/moricim/Projects/tools/sheer/sheer/indexer.py", line 125, in index_location
for i, document in enumerate(processor.documents()):
File "/Users/moricim/Projects/himedlooff/cfw/cfgov-refresh/_lib/wordpress_view_processor.py", line 33, in documents
yield process_view(view)
File "/Users/moricim/Projects/himedlooff/cfw/cfgov-refresh/_lib/wordpress_view_processor.py", line 42, in process_view
popular_posts.append(index_helper.get_document('posts',slug))
File "/Users/moricim/Projects/tools/sheer/sheer/processors/helpers.py", line 21, in get_document
doc_type=doctype, id=docid)
File "/Users/moricim/Projects/.virtualenvs/cfgov-refresh/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 70, in _wrapped
return func(*args, params=params, **kwargs)
File "/Users/moricim/Projects/.virtualenvs/cfgov-refresh/lib/python2.7/site-packages/elasticsearch/client/__init__.py", line 228, in get
params=params)
File "/Users/moricim/Projects/.virtualenvs/cfgov-refresh/lib/python2.7/site-packages/elasticsearch/transport.py", line 223, in perform_request
status, raw_data = connection.perform_request(method, url, params, body, ignore=ignore)
File "/Users/moricim/Projects/.virtualenvs/cfgov-refresh/lib/python2.7/site-packages/elasticsearch/connection/http_urllib3.py", line 53, in perform_request
self._raise_error(response.status, raw_data)
File "/Users/moricim/Projects/.virtualenvs/cfgov-refresh/lib/python2.7/site-packages/elasticsearch/connection/base.py", line 82, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.NotFoundError: TransportError(404, u'{"_index":"content","_type":"posts","_id":"spring-2014-rulemaking-agenda","found":false}')
Is the problematic version of cfgov-refresh committed anywhere? I'd like to check out your code and see if I get the same problem.
I'll create a fork for you
https://github.com/himedlooff/cfgov-refresh/tree/getdoc
Here's the commit with the code I added: https://github.com/himedlooff/cfgov-refresh/commit/d7d18431b9a34a7acc3a4b5bf9c2e0182252f771
Awesome, I'll give it a shot.
And there's a new IndexHelper which lets you get_document from processing code.
Use it like:
... later...