Closed by zhangyongjiang 9 years ago
For those who have a similar issue, I fixed it by adding the dependencies below to the pom.xml file.
<dependency>
  <groupId>org.apache.mahout.commons</groupId>
  <artifactId>commons-cli</artifactId>
  <version>2.0-mahout</version>
</dependency>
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-math</artifactId>
  <version>0.9</version>
</dependency>
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-core</artifactId>
  <version>4.4.0</version>
</dependency>
I think you are missing some jars from your classpath.
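One way to confirm which jar (if any) actually provides the missing class is to scan the jars on your classpath for the corresponding .class entry. A minimal sketch (the jar paths are placeholders, not from this thread):

```python
import zipfile

def jars_containing(jar_paths, class_name):
    """Return the jars whose entries include the given fully qualified class."""
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for path in jar_paths:
        with zipfile.ZipFile(path) as jar:
            if entry in jar.namelist():
                hits.append(path)
    return hits

# Example: find which jars bundle the Lucene PriorityQueue.
# jars = glob.glob("target/dependency/*.jar")  # placeholder path
# print(jars_containing(jars, "org.apache.lucene.util.PriorityQueue"))
```

If no jar on the classpath contains org/apache/lucene/util/PriorityQueue.class, that matches the NoClassDefFoundError below.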
In any case this has been deprecated and moved into Mahout 1.0 as "spark-itemsimilarity", which has a much fuller CLI and does real cross-cooccurrence. It runs well over 10x (maybe 50x) faster than this Hadoop version.
Thanks for your quick response. I will check out the Mahout 1.0 version you mentioned.
I now have all the data generated under the out directory. Which files should I load into Solr?
Are you doing cross-cooccurrence?
If not, look in item-links-docs/part-00000. That should be a CSV with a header. Make sure the header says id,b_b_links. The first field is an item id; the second is a space-delimited list of item ids, the "indicators" (we used to call them links but the terminology has changed).
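As a rough illustration of the layout described above (the id,b_b_links header and space-delimited indicators are from this thread; the sample rows are made up), the part file can be read into a map of item id to indicator list:

```python
import csv
import io

def load_indicators(lines):
    """Parse the id,b_b_links CSV into {item_id: [indicator item ids]}."""
    reader = csv.DictReader(lines)
    return {row["id"]: row["b_b_links"].split() for row in reader}

# Example with two hypothetical rows in the described format:
sample = io.StringIO("id,b_b_links\nipad,iphone galaxy\niphone,ipad\n")
print(load_indicators(sample))  # {'ipad': ['iphone', 'galaxy'], 'iphone': ['ipad']}
```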
The indicators in the part file can be indexed directly by Solr. The query is the user's history of actions as a space-delimited string of item ids. The result will be an ordered list of item ids.
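Building that query is just joining the user's recent item ids with spaces and searching the indicator field. A hypothetical sketch (the b_b_links field name is from above; the Solr URL and parameter choices are illustrative placeholders):

```python
from urllib.parse import urlencode

def recs_query(user_history, field="b_b_links", rows=10):
    """Build Solr query params from a user's action history (a list of item ids)."""
    q = "{}:({})".format(field, " ".join(user_history))
    return urlencode({"q": q, "rows": rows})

# e.g. GET http://localhost:8983/solr/collection1/select?<params>  (placeholder URL)
print(recs_query(["ipad", "iphone"]))
```

The ranked documents Solr returns are the recommended item ids, most relevant first.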
BTW, cross-cooccurrence allows you to use any number of user actions to add relevance to recommendations. You can use location, month, day-of-week, search terms, category preferences, even thumbs-downs: whatever action you can record a user taking, even on different item sets than the one you want to recommend. This greatly increases the usable data and accounts for the effects of location or time period.
Hi,
First of all, thank you for writing this solr-recommender.
I'm trying to run the script but got a class-not-found error, shown below. Could you please let me know how I can fix it?
Thanks a lot, Kevin
................
15/01/22 14:30:35 INFO mapred.MapTask: Processing split: file:/Users/Zhang_Kevin/Documents/mine/big/projects/solr-recommender/tmp/tmp1/pairwiseSimilarity/part-r-00000:0+210
15/01/22 14:30:35 INFO mapred.MapTask: io.sort.mb = 100
15/01/22 14:30:35 INFO mapred.MapTask: data buffer = 79691776/99614720
15/01/22 14:30:35 INFO mapred.MapTask: record buffer = 262144/327680
15/01/22 14:30:35 INFO mapred.MapTask: Starting flush of map output
15/01/22 14:30:35 INFO mapred.LocalJobRunner: Map task executor complete.
15/01/22 14:30:35 WARN mapred.LocalJobRunner: job_local334070693_0007
java.lang.Exception: java.lang.NoClassDefFoundError: org/apache/lucene/util/PriorityQueue
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.NoClassDefFoundError: org/apache/lucene/util/PriorityQueue
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$UnsymmetrifyMapper.map(RowSimilarityJob.java:520)
    at org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$UnsymmetrifyMapper.map(RowSimilarityJob.java:504)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.util.PriorityQueue
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 22 more
15/01/22 14:30:36 INFO mapred.JobClient: map 0% reduce 0%
15/01/22 14:30:36 INFO mapred.JobClient: Job complete: job_local334070693_0007
15/01/22 14:30:36 INFO mapred.JobClient: Counters: 0
15/01/22 14:30:36 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-Zhang_Kevin/mapred/staging/Zhang_Kevin1346036484/.staging/job_local1346036484_0008
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/lucene/util/PriorityQueue
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
    at org.apache.hadoop.mapreduce.lib.input.MultipleInputs.getMapperTypeMap(MultipleInputs.java:141)
    at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:60)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1071)
    at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:249)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at finderbots.recommenders.hadoop.RecommenderUpdateJob.run(RecommenderUpdateJob.java:129)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at finderbots.recommenders.hadoop.RecommenderUpdateJob.main(RecommenderUpdateJob.java:275)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.util.PriorityQueue
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 26 more