Closed jnleec closed 9 years ago
I just tried that same example from the blog with the same version of HTTP Collector (2.0.2), and I also tried with the latest snapshot; both worked just fine with the code and configuration taken as is (with valid API keys specified). The blog example and the code that goes with it use version 2.2 of the Facebook Graph API (the latest at the time). Can you try with that version of the Graph API? If that also fails, can you share your config (with your secret values stripped)?
If you need to use the 2.4 version of the API and it does not work for you, you may have to update the code given in the blog accordingly (or contact Norconex professional services for help). The JSON format returned may have changed.
Thanks for your response. I tried v2.2 of the API, but the result is the same as with v2.4: only three fields are returned when I run the example code with "https://graph.facebook.com/v2.2/disney/posts" as the start URL.
Here is my config file, thank you!
<?xml version="1.0" encoding="UTF-8"?>
The code that comes with the blog is not meant to be an all-purpose Facebook crawler without modifications; you have to adapt it to your needs. You get that error because the FacebookDocumentSplitter class expects the "message" field to be retrieved. If you add it to your list of fields, it runs without errors when I try it.
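For example, the fix described above would mean adding "message" to the fields query parameter of the start URL. A sketch of what that might look like, assuming a start URL configured the way the blog post does it (the other field names are just the ones quoted later in this thread):

```xml
<!-- Hypothetical startURLs fragment: "message" is included in the
     requested fields so FacebookDocumentSplitter can find it. -->
<startURLs>
  <url>https://graph.facebook.com/v2.2/disney/posts?fields=message,from,picture,type,link,created_time,description</url>
</startURLs>
```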
Keep in mind that if you add Facebook fields not expected by the sample code, it won't do anything with them and they will be ignored. Also, document.reference
is a field added by the collector, not a Facebook field. Using the Facebook Graph API Explorer, you can find out what all the available Facebook fields are.
Thank you very much!
Hi, how many documents can you get when using the Facebook crawler sample code? I can only get a few, even though I have checked the "next" URL and it works fine.
INFO [AbstractCrawler] Facebook Posts: Maximum documents reached: 10
INFO [AbstractCrawler] Facebook Posts: Maximum documents reached: 10
INFO [AbstractCrawler] Facebook Posts: Deleting orphan references (if any)...
INFO [AbstractCrawler] Facebook Posts: Deleted 0 orphan URLs...
INFO [AbstractCrawler] Facebook Posts: Crawler finishing: committing documents.
INFO [AbstractCrawler] Facebook Posts: 52 reference(s) processed.
INFO [CrawlerEventManager] CRAWLER_FINISHED (Subject: com.norconex.collector.http.crawler.HttpCrawler@7296c1fc)
INFO [AbstractCrawler] Facebook Posts: Crawler completed.
INFO [AbstractCrawler] Facebook Posts: Crawler executed in 4 seconds.
I see the following limitations in your config:
<maxDepth>2</maxDepth>
<maxDocuments>10</maxDocuments>
Taking those off (or setting them to -1) should give you more.
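Concretely, the crawler section of the config would be changed along these lines (a sketch based only on the two settings quoted above; -1 means unlimited):

```xml
<!-- Lift the crawl limits by setting them to -1 (unlimited),
     or remove these two elements entirely. -->
<maxDepth>-1</maxDepth>
<maxDocuments>-1</maxDocuments>
```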
wow, thank you!
I use the example (http://www.norconex.com/how-to-crawl-facebook/) to crawl Facebook, but I get this error. I use norconex-collector-http 2.0.2, and the start URL is "https://graph.facebook.com/v2.4/disney/posts?fields=from,picture,type,link,created_time,description". If no fields are specified, we only get message, created_time, and id. The API version is v2.4.