ContentMine / phylotree

A repository for ami-phylotree development

Separate characters and trees in pixel analysis #29

Open petermr opened 9 years ago

petermr commented 9 years ago

In

PhyloTreeArgProcessor.createNexmlAndTreeFromPixels(File inputImageFile) t
            phyloTreePixelAnalyzer = createAndConfigurePixelAnalyzer(image);
            diagramTree = phyloTreePixelAnalyzer.processImageIntoGraphsAndTree();
            LOG.debug("processImageIntoGraphsAndTree finished");

the processing spends a lot of time identifying and trying to process the characters. Tesseract provides bounding boxes, which could be used to remove the characters before processing the tree. Alternatively, we could try to identify small pixel islands and skip them during analysis.
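A minimal sketch of both ideas, assuming the binarized image is available as a rectangular `boolean[][]` (true = foreground pixel) and that the character boxes have already been obtained from the OCR step. `PixelIslandFilter`, `clearBoxes`, `removeSmallIslands` and the `minSize` threshold are hypothetical names for illustration and are not part of the existing phylotree code; how the boxes are read from Tesseract is left out.

```java
import java.awt.Rectangle;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper, not part of the phylotree code base. */
final class PixelIslandFilter {

    /** Clears every pixel inside the given character bounding boxes (e.g. from OCR). */
    static void clearBoxes(boolean[][] pixels, List<Rectangle> boxes) {
        int height = pixels.length;
        int width = height == 0 ? 0 : pixels[0].length;
        for (Rectangle box : boxes) {
            // Clip each box to the image so out-of-range boxes are harmless.
            int yEnd = Math.min(height, box.y + box.height);
            int xEnd = Math.min(width, box.x + box.width);
            for (int y = Math.max(0, box.y); y < yEnd; y++) {
                for (int x = Math.max(0, box.x); x < xEnd; x++) {
                    pixels[y][x] = false;
                }
            }
        }
    }

    /**
     * Clears every 8-connected island with fewer than minSize pixels.
     * Returns the number of islands removed.
     */
    static int removeSmallIslands(boolean[][] pixels, int minSize) {
        int height = pixels.length;
        int width = height == 0 ? 0 : pixels[0].length;
        boolean[][] visited = new boolean[height][width];
        int removed = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                if (!pixels[y][x] || visited[y][x]) {
                    continue;
                }
                List<int[]> island = collectIsland(pixels, visited, x, y);
                if (island.size() < minSize) {
                    for (int[] p : island) {
                        pixels[p[1]][p[0]] = false;
                    }
                    removed++;
                }
            }
        }
        return removed;
    }

    /** Breadth-first flood fill; returns the {x, y} coordinates of one island. */
    private static List<int[]> collectIsland(boolean[][] pixels, boolean[][] visited,
                                             int startX, int startY) {
        int height = pixels.length;
        int width = pixels[0].length;
        List<int[]> island = new ArrayList<>();
        Deque<int[]> queue = new ArrayDeque<>();
        visited[startY][startX] = true;
        queue.add(new int[] {startX, startY});
        while (!queue.isEmpty()) {
            int[] p = queue.poll();
            island.add(p);
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    int nx = p[0] + dx;
                    int ny = p[1] + dy;
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) {
                        continue;
                    }
                    if (!pixels[ny][nx] || visited[ny][nx]) {
                        continue;
                    }
                    visited[ny][nx] = true;
                    queue.add(new int[] {nx, ny});
                }
            }
        }
        return island;
    }
}
```

Either method could be run on the pixel array before it reaches the tree analysis. A suitable `minSize` would have to be tuned per image, since a threshold that is too large might also delete short branch fragments.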