Capitains / docker-capitains-nemo-nautilus

Debian Docker Image
GNU General Public License v3.0

Some files from GreekLit, LatinLit repositories don't pass and are not displayed #8

Open · nevenjovanovic opened this issue 8 years ago

nevenjovanovic commented 8 years ago

When I bring the CapiTainS Docker image up, I get a large number of messages like these:

app-gunicorn stderr | [2016-07-04 10:34:07,030] {/usr/lib/python3.5/site-packages/capitains_nautilus/inventory/local.py:157} ERROR - /opt/data/canonical-greekLit-master/data/tlg0540/tlg014/tlg0540.tlg014.perseus-grc1.xml does not accept parsing at some level (most probably citation)

app-gunicorn stderr | [2016-07-04 10:34:07,032] {/usr/lib/python3.5/site-packages/capitains_nautilus/inventory/local.py:157} ERROR - /opt/data/canonical-greekLit-master/data/tlg0540/tlg014/tlg0540.tlg014.perseus-grc1.xml does not accept parsing at some level (most probably citation)
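To pull the list of affected texts out of the container log, a small Python sketch like the one below could help. It assumes the error lines have exactly the shape shown above, and that the CTS namespace can be read from the repository folder name (`canonical-greekLit-master` gives `greekLit`). `failing_files` and `path_to_urn` are hypothetical helpers, not part of Nautilus.

```python
import re

# Matches the Nautilus error line shown above; captures the file path up to ".xml".
ERROR_RE = re.compile(r"ERROR - (?P<path>\S+\.xml) does not accept parsing")


def failing_files(log_lines):
    """Return the unique failing file paths, in first-seen order."""
    seen = []
    for line in log_lines:
        match = ERROR_RE.search(line)
        if match and match.group("path") not in seen:
            seen.append(match.group("path"))
    return seen


def path_to_urn(path):
    """Derive a CTS URN from a path such as
    .../canonical-greekLit-master/data/tlg0540/tlg014/tlg0540.tlg014.perseus-grc1.xml

    Assumption: the namespace is the part of the repository folder name between
    'canonical-' and '-master', and the file name (minus '.xml') is the
    textgroup.work.version identifier.
    """
    repo = re.search(r"canonical-(\w+)-master", path)
    namespace = repo.group(1) if repo else "unknown"
    stem = path.rsplit("/", 1)[-1][: -len(".xml")]
    return "urn:cts:{}:{}".format(namespace, stem)


if __name__ == "__main__":
    sample = [
        "app-gunicorn stderr | [2016-07-04 10:34:07,030] "
        "{/usr/lib/python3.5/site-packages/capitains_nautilus/inventory/local.py:157} "
        "ERROR - /opt/data/canonical-greekLit-master/data/tlg0540/tlg014/"
        "tlg0540.tlg014.perseus-grc1.xml does not accept parsing at some level "
        "(most probably citation)",
    ]
    for path in failing_files(sample):
        print(path_to_urn(path))  # urn:cts:greekLit:tlg0540.tlg014.perseus-grc1
```

In practice you would feed it the lines of the captured container output instead of the hard-coded sample; duplicates (the same file is reported several times) are collapsed.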

When I look for some of the files that did not parse (e.g. urn:cts:greekLit:tlg0086.tlg010.perseus-grc1, the Nicomachean Ethics), they are not listed in Nemo (e.g. at http://localhost:8080/read/greekLit/tlg0086). This is, of course, normal and expected behaviour: files that could not be parsed cannot be displayed.

I'd like to know, however, what is wrong with these files: how do they differ from the ones that parsed successfully? (I suspect this has to do with the HookTest issue on URN information and naming conventions.)

PonteIneptique commented 8 years ago

Much, much worse: they have not been converted yet. Have a look at this report (http://ci.perseids.org/repo/PerseusDL/canonical-greekLit/824e6f87-24e7-4234-ada0-559af002ff84) if you want; it tells you exactly what is wrong with each text (sometimes, everything is wrong ;) ). Generally, if the identifier ends in -grc1, -eng1 or -lat1, the text is most likely not CapiTainS-compliant.
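The rule of thumb above can be turned into a quick filter over URNs. This is only a heuristic sketch (the function name is hypothetical); the Perseids CI report remains the authoritative source on which texts fail and why.

```python
import re

# Rule of thumb from the thread: version identifiers ending in -grc1, -eng1
# or -lat1 are most likely not (yet) CapiTainS-compliant.
LEGACY_SUFFIX = re.compile(r"-(grc|eng|lat)1$")


def probably_unconverted(urn):
    """Heuristic only: True if the URN's version part ends in -grc1, -eng1 or -lat1."""
    work_part = urn.rsplit(":", 1)[-1]      # e.g. "tlg0086.tlg010.perseus-grc1"
    version = work_part.rsplit(".", 1)[-1]  # e.g. "perseus-grc1"
    return bool(LEGACY_SUFFIX.search(version))


if __name__ == "__main__":
    print(probably_unconverted("urn:cts:greekLit:tlg0086.tlg010.perseus-grc1"))  # True
```

A match only means "check the CI report for this text"; it does not say what the actual validation error is.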

balmas commented 8 years ago

@nevenjovanovic You might also want to take a look at the Perseus capitains docker environment (https://github.com/PerseusDL/capitains-environment) which uses a script (https://github.com/PerseusDL/capitains-environment/blob/master/hookclean.py) to weed out failing texts based upon the Hook CI results.
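The general idea of weeding out failing texts before Nautilus loads them might look roughly like the sketch below. It is not hookclean.py and does not reproduce how that script reads the HookTest results: it assumes you already have a list of failing files relative to a repository's data directory (for the image above, something like /opt/data/canonical-greekLit-master/data), and the function name and quarantine folder are hypothetical.

```python
import shutil
from pathlib import Path


def quarantine_failing(data_dir, failing, quarantine_dir):
    """Move each failing text out of data_dir, keeping its relative path.

    `failing` is an iterable of paths relative to data_dir (hypothetical
    input format; obtaining it from the CI results is not shown here).
    Returns the destination paths of the files actually moved.
    """
    data_dir, quarantine_dir = Path(data_dir), Path(quarantine_dir)
    moved = []
    for rel in failing:
        src = data_dir / rel
        if not src.is_file():
            continue  # already removed, or never present in this checkout
        dest = quarantine_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

Moving the files aside rather than deleting them keeps the texts recoverable, so they can be put back once they become compliant.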

PonteIneptique commented 8 years ago

@balmas GO BACK TO YOUR HOLIDAYS, BOSS !

balmas commented 8 years ago

I'm going, I'm going...