ContentMine / ami

Apache License 2.0
ami2-regex ERROR org.xmlcml.ami2.plugins.MatcherResult #21

Open chreman opened 9 years ago

chreman commented 9 years ago

My regex.xml looks like this (modelled on the example here):

<compoundRegex title="dinosaurfood">
<regex weight="1.0" fields="food pre word post">((.{1,50})([Ff]ood)(.{1,50}))</regex>
<regex weight="1.0" fields="sustentation pre word post">((.{1,50})([Ss]ustentation)(.{1,50}))</regex>
</compoundRegex>

The errors look like this (groupList has length 6, fieldList has length 4):

548519 [main] ERROR org.xmlcml.ami2.plugins.MatcherResult  - groupList (6; [a significant episode in the evolution of terrestrial biotas (125–80 Ma) in which the taxonomic div, ersification of angiosperms and the resulting new food resources spurred co-evolutionary radiations of, ersification of angiosperms and the resulting new , food,  resources spurred co-evolutionary radiations of, insects and some terrestrial vertebrates (e.g., herbivorous dinosaurs; Lloyd et al. 2008). Wilson e]) does not match fieldList (4;[food, pre, word, post])
556129 [main] ERROR org.xmlcml.ami2.plugins.MatcherResult  - groupList (6; [medium-sized and large individuals, indicates important niche partitioning between these carnivorou, s dinosaurs. The top predators at the acme of the food chain were represented by three large theropods,, s dinosaurs. The top predators at the acme of the , food,  chain were represented by three large theropods,, Lourinhanosaurus, Ceratosaurus and Allosaurus, and a very large form, Torvosaurus, functionally and]) does not match fieldList (4;[food, pre, word, post])
572041 [main] ERROR org.xmlcml.ami2.plugins.MatcherResult  - groupList (6; [With a minimum length of 612 mm, the maxilla of Torvosaurus gurneyi pertains to a very large indivi, dual positioned at the apex of the food chain in the Late Jurassic ecosystem of Iberia., dual positioned at the apex of the , food,  chain in the Late Jurassic ecosystem of Iberia., The maxilla occupies 52% ( Allosaurus) to 61% ( Yangchuanosaurus) of the skull length in the larges]) does not match fieldList (4;[food, pre, word, post])
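
One way to narrow down a mismatch like this is to count the capturing groups in the pattern and compare that with the number of names in the fields attribute. The sketch below uses Python's re module rather than ami's Java code, and the helper name count_groups_and_fields is hypothetical; it only illustrates the counting, not how MatcherResult actually builds its groupList.

```python
import re

def count_groups_and_fields(pattern, fields):
    """Return (number of capturing groups in pattern, number of field names)."""
    return re.compile(pattern).groups, len(fields.split())

# Pattern and fields copied from the first <regex> element above.
pattern = r"((.{1,50})([Ff]ood)(.{1,50}))"
fields = "food pre word post"

groups, names = count_groups_and_fields(pattern, fields)
print(groups, names)
```

The pattern on its own has four capturing groups, the same as the number of field names. The first and last entries of each logged groupList look like extra text before and after the match, so the two additional entries may be context that the tool adds itself. That is an inference from the log output, not something confirmed from the ami source.
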

chreman commented 9 years ago

Found the error: I had to remove pre and post, together with their capture groups, from regex.xml, leaving one capture group per field:

<compoundRegex title="dinosaurfood">
<regex weight="1.0" fields="food">([Ff]ood)</regex>
<regex weight="1.0" fields="sustentation">([Ss]ustentation)</regex>
</compoundRegex>

and request the surrounding context on the command line with --context 50 50 instead.
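
The idea behind the fix can be sketched as follows: the pattern captures only the word, and the surrounding context is sliced from the text separately. This is a Python illustration under stated assumptions, not ami's implementation; how --context 50 50 is handled internally is not shown in this thread, and find_with_context is a hypothetical helper.

```python
import re

def find_with_context(text, pattern, pre=50, post=50):
    """Yield (pre_context, matched_word, post_context) for each match.

    The pattern is assumed to have exactly one capturing group (the word);
    the context windows are taken from the text, not from the regex.
    """
    for m in re.finditer(pattern, text):
        start, end = m.span(1)
        yield text[max(0, start - pre):start], m.group(1), text[end:end + post]

# Sentence fragment taken from the log output above; short windows for readability.
text = "positioned at the apex of the food chain in the Late Jurassic ecosystem of Iberia."
hits = list(find_with_context(text, r"([Ff]ood)", pre=10, post=10))
for before, word, after in hits:
    print(repr(before), repr(word), repr(after))
```

Keeping the context out of the pattern means the number of capture groups depends only on the fields being extracted, so it stays in step with the fields attribute.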

petermr commented 8 years ago

Another aspect of testing/cleaning regex