BayanGroup / nutch-custom-search


Using fragment on xml documents #34

Open rohith004 opened 8 years ago

rohith004 commented 8 years ago

I am trying to parse http://exporter.nih.gov/XMLData/final/RePORTER_PRJABS_X_FY2016_024.zip, which contains one XML file with "PROJECTS" as the root element. I want to index multiple documents from this XML file, with /PROJECTS/row as the new root for each document and /PROJECTS/row/APPLICATION_ID as one of the fields. My extractor.xml file looks like this: `

` Indexing always fails, leaving the following logs.

Parser

2016-03-17 21:49:43,545 INFO parse.ParseSegment - Parsed (5ms):9095546 9024405 ..and so on
2016-03-17 21:49:43,546 INFO parse.ParseSegment - Parsed (1ms):file:/home/ubuntu/nih/RePORTER_PRJ_X_FY2016_024.xml

Indexer

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/nih_core: ERROR: [doc=9095546 9024405..] multiple values encountered for non multiValued field doc_ref: [some constant, some constant...]
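To make the intended outcome concrete, here is a minimal Python sketch of the difference between evaluating the field paths against the whole file and evaluating them once per /PROJECTS/row. It is not how nutch-custom-search works internally, and it is only an assumption that the error above comes from the rows being merged into one document. The sample XML is a tiny stand-in for the real file (the two IDs are taken from the parser log), the `id` key is a hypothetical field name, and `DOC_REF` is a placeholder for the constant doc_ref value.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the RePORTER file; the two IDs come from the parser log.
SAMPLE = """<PROJECTS>
  <row><APPLICATION_ID>9095546</APPLICATION_ID></row>
  <row><APPLICATION_ID>9024405</APPLICATION_ID></row>
</PROJECTS>"""

# Placeholder for the constant doc_ref value shown in the indexer error.
DOC_REF = "some constant"


def whole_file_as_one_doc(xml_text):
    """Evaluate field paths against the whole file: one document, many values per field."""
    root = ET.fromstring(xml_text)
    rows = root.findall("row")
    return {
        "id": [row.findtext("APPLICATION_ID") for row in rows],
        "doc_ref": [DOC_REF for _ in rows],
    }


def one_doc_per_row(xml_text):
    """Treat each /PROJECTS/row as its own root: one document per row, one value per field."""
    root = ET.fromstring(xml_text)
    return [
        {"id": row.findtext("APPLICATION_ID"), "doc_ref": DOC_REF}
        for row in root.findall("row")
    ]


merged = whole_file_as_one_doc(SAMPLE)
split = one_doc_per_row(SAMPLE)
print(merged)  # a single document whose fields each hold a list of values
print(split)   # a list of documents, each with single-valued fields
```

The first shape matches what the indexer log suggests is being sent to Solr (one document carrying every row's values), while the second is what the fragment setup is meant to produce.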