ContentMine / cproject

ArgProcessor and files for basic CMDirectories. Often subclassed. Needs to be separate from euclid and norma
Apache License 2.0

structure of results.xml #13

Open tarrow opened 8 years ago

tarrow commented 8 years ago

Just trying to get an idea of what should actually be in results.xml. Currently we turn out snippets like this:

Word Frequency

<?xml version="1.0" encoding="UTF-8"?>
<results title="frequencies">
 <result title="frequency" word="malaria" count="72"/>
</results>

Binomial Species

<?xml version="1.0" encoding="UTF-8"?>
<results title="binomial">
 <result pre=" species. In Madagascar, bimonthly treatment with the anthelmintic levamisole had no effect on " exact="Plasmodium falciparum" xpath="/html[1]/body[1]/div[2]/div[2]/div[3]/div[3]/p[1]" match="Plasmodium falciparum" post=" parasite density among children aged &amp;amp;lt;5 years but, among children aged ≥15 years, resulted " name="binomial"/>
</results>
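
To compare the two shapes side by side, here is a minimal sketch of reading both snippets with Python's standard `xml.etree.ElementTree`. The helper name `read_results` is hypothetical and not part of cproject, and the `pre`/`post` attribute values are abbreviated from the binomial example above. It only illustrates that both files share a `results`/`result` skeleton while the attribute sets differ.

```python
import xml.etree.ElementTree as ET

# Word-frequency snippet, as shown above.
FREQUENCY_XML = """<results title="frequencies">
 <result title="frequency" word="malaria" count="72"/>
</results>"""

# Binomial snippet with pre/post abbreviated and xpath omitted for brevity.
BINOMIAL_XML = """<results title="binomial">
 <result pre="had no effect on " exact="Plasmodium falciparum" match="Plasmodium falciparum" post=" parasite density" name="binomial"/>
</results>"""


def read_results(xml_text):
    """Hypothetical helper: return (title, list of attribute dicts)."""
    root = ET.fromstring(xml_text)
    rows = [dict(result.attrib) for result in root.findall("result")]
    return root.get("title"), rows


freq_title, freq_rows = read_results(FREQUENCY_XML)
bin_title, bin_rows = read_results(BINOMIAL_XML)

# Same skeleton, different attribute names per plugin.
print(freq_title, sorted(freq_rows[0]))
print(bin_title, sorted(bin_rows[0]))
```

Because each plugin chooses its own attribute names, a consumer currently has to know the plugin before it can interpret a `result` element.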

petermr commented 8 years ago

All files should contain audit/log metadata. This could be something like:

<metadata rundate="2016-04-29" query="species binomial" program="ami_0.3.1" os="macosx.10.2" ... etc inputStream="..." stemming="true" caseSensitive="no" />
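
As a rough illustration of the suggestion above, the following sketch attaches a `metadata` element to an existing results document using Python's standard library. The function name `add_metadata` is hypothetical, and placing `metadata` as the first child of `results` is an assumption: the comment does not say where the element should live. Only attributes named in the example above are used.

```python
import xml.etree.ElementTree as ET


def add_metadata(results_xml, **attrs):
    """Hypothetical helper: insert a <metadata> element into <results>.

    Assumption: metadata is placed as the first child of the root.
    """
    root = ET.fromstring(results_xml)
    meta = ET.Element("metadata", {key: str(value) for key, value in attrs.items()})
    root.insert(0, meta)
    return ET.tostring(root, encoding="unicode")


xml_in = (
    '<results title="frequencies">'
    '<result title="frequency" word="malaria" count="72"/>'
    "</results>"
)

xml_out = add_metadata(
    xml_in,
    rundate="2016-04-29",
    query="species binomial",
    program="ami_0.3.1",
    stemming="true",
)
print(xml_out)
```

Keeping the audit data in a single child element leaves the existing `result` elements untouched, so current consumers that only look for `result` would be unaffected.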

On Fri, Apr 29, 2016 at 8:00 AM, tarrow notifications@github.com wrote: Just trying to get an idea of what should actually be in results.xml.

Peter Murray-Rust Reader in Molecular Informatics Unilever Centre, Dep. Of Chemistry University of Cambridge CB2 1EW, UK +44-1223-763069