datasalt / pangool

Tuple MapReduce for Hadoop: Hadoop API made easy
http://datasalt.github.io/pangool/
Apache License 2.0

TestSolrOutputFormat test can't run successfully #23

Closed dongpf closed 11 years ago

dongpf commented 11 years ago

Hi @pereferrera, the Solr index files were written to the /src/test/resources/solr-es/tmp/hadoop-$user/mapred/local/solr_attempt_local_0001_m_000000_0.1/ folder instead of the expected out-com.datasalt.pangool.solr.TestSolrOutputFormat/part-00000. As a result, every assertTrue(new File(OUTPUT + "/part-00000/data/index").exists()); assertion fails.

pereferrera commented 11 years ago

Thanks Dong, let me take a look at this.

dongpf commented 11 years ago

My dev environment is Windows 7 + Cygwin + Eclipse. Do I have to configure Solr?

Thanks!

pereferrera commented 11 years ago

You don't need to have Solr installed for the test to pass... I guess the issue could be related to Cygwin, but let me confirm.

pereferrera commented 11 years ago

Dong, if you comment out this line of the test (last one):

// trash(OUTPUT);

Do you see the folder "out-com.datasalt.pangool.solr.TestSolrOutputFormat" in core/? What is inside this folder?

Otherwise, can you paste the full log of the test?

dongpf commented 11 years ago

Pere, the index data folder: pangool_test1

The out-com.datasalt.pangool.solr.TestSolrOutputFormat folder: pangool_test2

And the full log of the test is:

13/01/22 17:49:42 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
13/01/22 17:49:42 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/01/22 17:49:42 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/01/22 17:49:42 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
13/01/22 17:49:42 INFO input.FileInputFormat: Total input paths to process : 1
13/01/22 17:49:43 INFO mapred.JobClient: Running job: job_local_0001
13/01/22 17:49:43 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
13/01/22 17:49:43 INFO input.FileInputFormat: Total input paths to process : 1
13/01/22 17:49:43 INFO mapred.MapTask: io.sort.mb = 100
13/01/22 17:49:43 INFO mapred.MapTask: data buffer = 79691776/99614720
13/01/22 17:49:43 INFO mapred.MapTask: record buffer = 262144/327680
13/01/22 17:49:43 INFO input.DelegatingMapper: [profile] Got input split. Going to look at DC.
13/01/22 17:49:43 INFO input.DelegatingMapper: [profile] Finished. Calling run() on delegate.
13/01/22 17:49:43 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
13/01/22 17:49:43 WARN solr.SolrRecordWriter: logger class:org.apache.commons.logging.impl.Log4JLogger
13/01/22 17:49:43 WARN solr.SolrRecordWriter: logger class:org.apache.commons.logging.impl.Log4JLogger
13/01/22 17:49:43 WARN solr.SolrRecordWriter: logger class:org.apache.commons.logging.impl.Log4JLogger
13/01/22 17:49:43 INFO solr.HeartBeater: Heart beat reporting class is org.apache.hadoop.mapreduce.TaskAttemptContext
13/01/22 17:49:43 INFO solr.HeartBeater: HeartBeat thread running
13/01/22 17:49:43 INFO solr.HeartBeater: Issuing heart beat for 1 threads
13/01/22 17:49:43 INFO solr.SolrRecordWriter: SolrHome: /D:/workspace/pangool/core/src/test/resources/solr-es
13/01/22 17:49:43 INFO solr.SolrRecordWriter: Constructed instance information solr.home D:/workspace/pangool/core/src/test/resources/solr-es (/D:/workspace/pangool/core/src/test/resources/solr-es), instance dir D:/workspace/pangool/core/src/test/resources/solr-es\, conf dir D:/workspace/pangool/core/src/test/resources/solr-es\conf/, writing index to temporary directory \tmp\hadoop-feiqiong.dpf\mapred\local\solr_attempt_local_0001_m_000000_0.1\data, with permdir file:/D:/workspace/pangool/core/out-com.datasalt.pangool.solr.TestSolrOutputFormat/_temporary/_attempt_local_0001_m_000000_0/ES/part-00000
13/01/22 17:49:43 WARN core.SolrConfig: and configuration sections are deprecated and will fail for luceneMatchVersion=LUCENE_40 and later. Please use instead.
13/01/22 17:49:43 WARN schema.IndexSchema: no uniqueKey specified in schema.
13/01/22 17:49:43 WARN core.SolrCore: New index directory detected: old=null new=D:/workspace/pangool/core/src/test/resources/solr-es\tmp\hadoop-feiqiong.dpf\mapred\local\solr_attempt_local_0001_m_000000_0.1\data\index/
13/01/22 17:49:44 WARN handler.UpdateRequestHandler: Using deprecated class: XmlUpdateRequestHandler -- replace with UpdateRequestHandler
13/01/22 17:49:44 WARN handler.UpdateRequestHandler: Using deprecated class: BinaryUpdateRequestHandler -- replace with UpdateRequestHandler
13/01/22 17:49:44 INFO mapred.JobClient: map 0% reduce 0%
13/01/22 17:49:44 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
13/01/22 17:49:44 WARN solr.SolrRecordWriter: logger class:org.apache.commons.logging.impl.Log4JLogger
13/01/22 17:49:44 WARN solr.SolrRecordWriter: logger class:org.apache.commons.logging.impl.Log4JLogger
13/01/22 17:49:44 WARN solr.SolrRecordWriter: logger class:org.apache.commons.logging.impl.Log4JLogger
13/01/22 17:49:44 INFO solr.HeartBeater: Heart beat reporting class is org.apache.hadoop.mapreduce.TaskAttemptContext
13/01/22 17:49:44 INFO solr.SolrRecordWriter: SolrHome: /D:/workspace/pangool/core/src/test/resources/solr-fr
13/01/22 17:49:44 INFO solr.HeartBeater: HeartBeat thread running
13/01/22 17:49:44 INFO solr.HeartBeater: Issuing heart beat for 1 threads
13/01/22 17:49:44 INFO solr.SolrRecordWriter: Constructed instance information solr.home D:/workspace/pangool/core/src/test/resources/solr-fr (/D:/workspace/pangool/core/src/test/resources/solr-fr), instance dir D:/workspace/pangool/core/src/test/resources/solr-fr\, conf dir D:/workspace/pangool/core/src/test/resources/solr-fr\conf/, writing index to temporary directory \tmp\hadoop-feiqiong.dpf\mapred\local\solr_attempt_local_0001_m_000000_0.2\data, with permdir file:/D:/workspace/pangool/core/out-com.datasalt.pangool.solr.TestSolrOutputFormat/_temporary/_attempt_local_0001_m_000000_0/FR/part-00000
13/01/22 17:49:44 WARN core.SolrConfig: and configuration sections are deprecated and will fail for luceneMatchVersion=LUCENE_40 and later. Please use instead.
13/01/22 17:49:44 WARN schema.IndexSchema: no uniqueKey specified in schema.
13/01/22 17:49:44 WARN core.SolrCore: New index directory detected: old=null new=D:/workspace/pangool/core/src/test/resources/solr-fr\tmp\hadoop-feiqiong.dpf\mapred\local\solr_attempt_local_0001_m_000000_0.2\data\index/
13/01/22 17:49:44 WARN handler.UpdateRequestHandler: Using deprecated class: XmlUpdateRequestHandler -- replace with UpdateRequestHandler
13/01/22 17:49:44 WARN handler.UpdateRequestHandler: Using deprecated class: BinaryUpdateRequestHandler -- replace with UpdateRequestHandler
13/01/22 17:49:44 INFO solr.BatchWriter: Waiting for 0 items and 0 threads to finish executing
13/01/22 17:49:44 INFO solr.BatchWriter: Waiting for 0 items and 0 threads to finish executing
13/01/22 17:49:45 INFO mapred.MapTask: Starting flush of map output
13/01/22 17:49:45 INFO mapred.MapTask: Finished spill 0
13/01/22 17:49:45 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/01/22 17:49:45 INFO mapred.LocalJobRunner:
13/01/22 17:49:45 INFO mapred.TaskRunner: Task attempt_local_0001_m_000000_0 is allowed to commit now
13/01/22 17:49:45 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_m_000000_0' to out-com.datasalt.pangool.solr.TestSolrOutputFormat
13/01/22 17:49:45 INFO mapred.LocalJobRunner:
13/01/22 17:49:45 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
13/01/22 17:49:45 INFO mapred.LocalJobRunner:
13/01/22 17:49:45 INFO mapred.Merger: Merging 1 sorted segments
13/01/22 17:49:45 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 92 bytes
13/01/22 17:49:45 INFO mapred.LocalJobRunner:
13/01/22 17:49:45 WARN solr.SolrRecordWriter: logger class:org.apache.commons.logging.impl.Log4JLogger
13/01/22 17:49:45 WARN solr.SolrRecordWriter: logger class:org.apache.commons.logging.impl.Log4JLogger
13/01/22 17:49:45 WARN solr.SolrRecordWriter: logger class:org.apache.commons.logging.impl.Log4JLogger
13/01/22 17:49:45 INFO solr.HeartBeater: Heart beat reporting class is org.apache.hadoop.mapreduce.TaskAttemptContext
13/01/22 17:49:45 INFO solr.HeartBeater: HeartBeat thread running
13/01/22 17:49:45 INFO solr.HeartBeater: Issuing heart beat for 1 threads
13/01/22 17:49:45 INFO solr.SolrRecordWriter: SolrHome: /D:/workspace/pangool/core/src/test/resources/solr-en
13/01/22 17:49:45 INFO solr.SolrRecordWriter: Constructed instance information solr.home D:/workspace/pangool/core/src/test/resources/solr-en (/D:/workspace/pangool/core/src/test/resources/solr-en), instance dir D:/workspace/pangool/core/src/test/resources/solr-en\, conf dir D:/workspace/pangool/core/src/test/resources/solr-en\conf/, writing index to temporary directory \tmp\hadoop-feiqiong.dpf\mapred\local\solr_attempt_local_0001_r_000000_0.3\data, with permdir out-com.datasalt.pangool.solr.TestSolrOutputFormat/part-00000
13/01/22 17:49:45 WARN core.SolrConfig: and configuration sections are deprecated and will fail for luceneMatchVersion=LUCENE_40 and later. Please use instead.
13/01/22 17:49:45 INFO mapred.JobClient: map 100% reduce 0%
13/01/22 17:49:45 ERROR core.CoreContainer: CoreContainer was not shutdown prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!! instance=16311605
13/01/22 17:49:45 WARN schema.IndexSchema: no uniqueKey specified in schema.
13/01/22 17:49:45 WARN core.SolrCore: New index directory detected: old=null new=D:/workspace/pangool/core/src/test/resources/solr-en\tmp\hadoop-feiqiong.dpf\mapred\local\solr_attempt_local_0001_r_000000_0.3\data\index/
13/01/22 17:49:45 WARN handler.UpdateRequestHandler: Using deprecated class: XmlUpdateRequestHandler -- replace with UpdateRequestHandler
13/01/22 17:49:45 WARN handler.UpdateRequestHandler: Using deprecated class: BinaryUpdateRequestHandler -- replace with UpdateRequestHandler
13/01/22 17:49:45 INFO solr.BatchWriter: Waiting for 0 items and 0 threads to finish executing
13/01/22 17:49:45 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
13/01/22 17:49:45 INFO mapred.LocalJobRunner:
13/01/22 17:49:45 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
13/01/22 17:49:45 INFO mapred.LocalJobRunner: Done > reduce
13/01/22 17:49:45 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
13/01/22 17:49:46 INFO mapred.JobClient: map 100% reduce 100%
13/01/22 17:49:46 INFO mapred.JobClient: Job complete: job_local_0001
13/01/22 17:49:46 INFO mapred.JobClient: Counters: 12
13/01/22 17:49:46 INFO mapred.JobClient: FileSystemCounters
13/01/22 17:49:46 INFO mapred.JobClient: FILE_BYTES_READ=483521
13/01/22 17:49:46 INFO mapred.JobClient: FILE_BYTES_WRITTEN=519032
13/01/22 17:49:46 INFO mapred.JobClient: Map-Reduce Framework
13/01/22 17:49:46 INFO mapred.JobClient: Reduce input groups=2
13/01/22 17:49:46 INFO mapred.JobClient: Combine output records=0
13/01/22 17:49:46 INFO mapred.JobClient: Map input records=4
13/01/22 17:49:46 INFO mapred.JobClient: Reduce shuffle bytes=0
13/01/22 17:49:46 INFO mapred.JobClient: Reduce output records=2
13/01/22 17:49:46 INFO mapred.JobClient: Spilled Records=4
13/01/22 17:49:46 INFO mapred.JobClient: Map output bytes=86
13/01/22 17:49:46 INFO mapred.JobClient: Combine input records=0
13/01/22 17:49:46 INFO mapred.JobClient: Map output records=2
13/01/22 17:49:46 INFO mapred.JobClient: Reduce input records=2

Thanks!

pereferrera commented 11 years ago

Hello,

I have committed something which may fix this issue... it has to do with relative / absolute file paths. Can you git pull and try now?
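For anyone reading later: the commit itself is not shown in this thread, so the following is only a rough sketch of the general idea rather than the actual Pangool change. In the log above, the temporary directory is reported as \tmp\hadoop-feiqiong.dpf\..., yet Solr then creates the index under D:/workspace/pangool/core/src/test/resources/solr-es\tmp\..., which suggests the path was treated as relative to the Solr instance dir. One way to avoid that kind of ambiguity is to make a directory absolute before handing it to another component. The class and method names below (AbsolutePathSketch, toAbsolute) are hypothetical and do not come from the Pangool code base.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration only; not the code from the actual fix.
class AbsolutePathSketch {

    // Resolves a possibly relative directory against the JVM's current
    // working directory and removes "." / ".." segments. An already
    // absolute path is returned as-is (apart from normalization).
    static Path toAbsolute(String dir) {
        return Paths.get(dir).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        String relative = "out-com.datasalt.pangool.solr.TestSolrOutputFormat/part-00000";
        Path fixed = toAbsolute(relative);

        System.out.println(Paths.get(relative).isAbsolute()); // false
        System.out.println(fixed.isAbsolute());               // true
        System.out.println(fixed);                            // depends on the working directory
    }
}
```

With an absolute path, the consumer no longer has to choose a base directory, so the result does not depend on whether it resolves relative paths against its own home folder or against the working directory.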

dongpf commented 11 years ago

I pulled the patch and reran the test case; it passes now.

Thanks!