simulsys opened this issue 8 years ago
Please can you give more information? To diagnose your problem we would need to know:
I don't see any scholarly.html, fulltext.xml or cproject contents.
Commands:
getpapers --query 'pollination honey bees canola' --outdir test3
Result output:
info: Searching using eupmc API
info: Found 33 open access results
Retrieving results [==============================] 100% (eta 0.0s)
info: Done collecting results
info: Saving result metadata
info: Full EUPMC result metadata written to eupmc_results.json
info: Individual EUPMC result metadata records written
info: Extracting fulltext HTML URL list (may not be available for all articles)
info: Fulltext HTML URL list written to eupmc_fulltext_html_urls.txt
andrew@andrew-Dimension-5000 ~ $ cmine test3
0 [main] DEBUG org.xmlcml.ami2.plugins.CommandProcessor - running NORMA -i fulltext.xml -o scholarly.html --transform nlm2html --project test3
!.!!!!!!!!!!.!!!!!!!!!!.!!
running: word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}] WS: test3
1752 [main] DEBUG org.xmlcml.ami2.wordutil.WordSetWrapper - symbol expands to: /org/xmlcml/ami2/wordutil/pmcstop.txt
1754 [main] DEBUG org.xmlcml.ami2.wordutil.WordSetWrapper - symbol expands to: /org/xmlcml/ami2/wordutil/stopwords.txt
1819 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC1892840 !
1820 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !.
1822 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC2718223 !
1823 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1824 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC2841636 !
1824 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1826 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC2994710 !
1826 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1828 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3155332 !
1828 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1829 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3250423 !
1829 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1831 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3338325 !
1831 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1833 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3338563 !
1833 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1835 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3384620 !
1835 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1838 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3478041 !
1838 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1850 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3628874 !
1850 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !.
1852 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3655217 !
1853 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1854 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3806756 !
1855 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1856 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3817108 !
1856 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1860 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3869053 !
1860 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1863 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC3958374 !
1866 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1869 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4046413 !
1870 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1875 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4053381 !
1875 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1876 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4217196 !
1877 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1878 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4284392 !
1878 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1880 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4284396 !
1880 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !.
1881 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4312970 !
1881 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1883 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4339550 !
1883 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1885 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4364903 !
1885 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1886 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4370578 !
1887 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1888 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4426341 !
1889 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1890 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4436261 !
1891 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1893 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4552548 !
1893 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1895 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4553434 !
1896 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1898 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4689365 !
1898 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1900 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4736462 !
1900 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !.
1901 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4796003 !
1902 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
1903 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - No scholarlyHtml or PDFTXT: test3/PMC4814072 !
1903 [main] WARN org.xmlcml.ami2.plugins.word.WordCollectionFactory - no words found to extract !
filter: word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}] frequenciesfrequencies....
summary: word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}] C: frequencies....
running: sequence([dnaprimer])[] ....
filter: sequence([dnaprimer])[] dnaprimerdnaprimer....
summary: sequence([dnaprimer])[] C: dnaprimer....
running: gene([human])[] ....
filter: gene([human])[] humanhuman....
summary: gene([human])[] C: human....
running: species([genus])[] SP: test3....
filter: species([genus])[] genusgenus....
summary: species([genus])[] C: genus....
running: species([binomial])[] SP: test3....
filter: species([binomial])[] binomialbinomial....
summary: species([binomial])[] C: binomial....
15967 [main] WARN org.xmlcml.ami2.plugins.ResultsAnalysis - Null pluginOption
15968 [main] WARN org.xmlcml.ami2.plugins.ResultsAnalysis - Null pluginOption
15969 [main] WARN org.xmlcml.ami2.plugins.ResultsAnalysis - Null pluginOption
15969 [main] WARN org.xmlcml.ami2.plugins.ResultsAnalysis - Null pluginOption
15971 [main] WARN org.xmlcml.ami2.plugins.ResultsAnalysis - Null pluginOption
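Read at face value, the option string word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}] asks for word counts with stopwords removed, keeping only words seen more than 20 times. The Python sketch below imitates that reading; it is not ami's implementation, and both the tokeniser (runs of letters) and the small hypothetical stopword set standing in for pmcstop.txt and stopwords.txt are assumptions.

```python
import re
from collections import Counter

# Hypothetical stand-in for pmcstop.txt / stopwords.txt (not their real contents).
STOPWORDS = {"the", "and", "of", "a", "in", "to"}


def frequent_words(text, stopwords=STOPWORDS, min_count=20):
    """Count lowercase word tokens, drop stopwords, keep counts above min_count.

    Tokens are runs of ASCII letters; this rule is an assumption, not ami's.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    # Mirrors the {xpath:@count>20} filter: strictly greater than min_count.
    return {word: n for word, n in counts.items() if n > min_count}
```

With no scholarly.html to read, the input text is empty and so is the result, which is consistent with the "no words found to extract" warnings in the log.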
Thanks for the help! I do not have any input files. What are they?
You need to tell getpapers to retrieve fulltext with either -x (XML) or -p (PDF). Without one of these, it creates CTrees but no fulltext.* files.
I am not sure whether this is a bug or a feature. I think it's reasonable to create CTrees with just the eupmc_result.json in them.
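To see which CTrees in a project received only metadata, a short directory check can help. The Python sketch below is not part of getpapers or cmine; it assumes that a CTree is any immediate subfolder of the project directory (e.g. test3/PMC4814072) and that the inputs worth looking for are fulltext.xml, fulltext.pdf and scholarly.html, the names mentioned in this thread. PDFTXT output is not checked.

```python
from pathlib import Path

# Input names assumed from this thread (getpapers -x / -p output and the
# NORMA step "-i fulltext.xml -o scholarly.html"); PDFTXT is not checked.
INPUT_NAMES = ("fulltext.xml", "fulltext.pdf", "scholarly.html")


def missing_fulltext(project_dir):
    """Return the names of CTree folders that contain none of INPUT_NAMES.

    Every immediate subdirectory of project_dir is treated as a CTree.
    """
    missing = []
    for ctree in sorted(Path(project_dir).iterdir()):
        if not ctree.is_dir():
            continue  # skip project-level files such as eupmc_results.json
        if not any((ctree / name).is_file() for name in INPUT_NAMES):
            missing.append(ctree.name)
    return missing
```

Called as missing_fulltext("test3") after a getpapers run without -x or -p, it would be expected to list every PMC folder, matching the "No scholarlyHtml or PDFTXT" warnings in the log above.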
Hi there, I am making progress. I narrowed my search down to 33 folders, but I do not seem to get any analyzed results. Here is a snapshot of my directory:
PMC4814072  14/05/16 08:26:44
File: commonest.dataTables.html  1 KB  14/05/16 08:26:44
File: count.dataTables.html  1 KB  14/05/16 08:26:44
File: entries.dataTables.html  1 KB  14/05/16 08:26:44
File: eupmc_fulltext_html_urls.txt  2 KB  14/05/16 08:26:04
File: eupmc_results.json  405 KB  14/05/16 08:26:04
File: full.dataTables.html  1 KB  14/05/16 08:26:44
File: gene.human.count.xml  1 KB  14/05/16 08:26:35
File: gene.human.documents.xml  1 KB  14/05/16 08:26:35
File: gene.human.snippets.xml  1 KB  14/05/16 08:26:35
File: sequence.dnaprimer.count.xml  1 KB  14/05/16 08:26:33
File: sequence.dnaprimer.documents.xml  1 KB  14/05/16 08:26:33
File: sequence.dnaprimer.snippets.xml  1 KB  14/05/16 08:26:33
File: species.binomial.count.xml  1 KB  14/05/16 08:26:44
File: species.binomial.documents.xml  1 KB  14/05/16 08:26:44
File: species.binomial.snippets.xml  1 KB  14/05/16 08:26:44
File: species.genus.count.xml  1 KB  14/05/16 08:26:40
File: species.genus.documents.xml  1 KB  14/05/16 08:26:40
File: species.genus.snippets.xml  1 KB  14/05/16 08:26:40
File: word.frequencies.count.xml  1 KB  14/05/16 08:26:32
File: word.frequencies.documents.xml  1 KB  14/05/16 08:26:32
File: word.frequencies.snippets.xml  1 KB  14/05/16 08:26:32
The 1 KB files are all empty. What should they contain, please? Also, where is scholarly.txt? I cannot find it.