Capitains / Nautilus

Implementation of a local CTS5 endpoint for MyCapytains
http://capitains-nautilus.readthedocs.io/en/latest/
Mozilla Public License 2.0

Implement a resolver with a second backend for collections #57

Open PonteIneptique opened 7 years ago

PonteIneptique commented 7 years ago

Currently, the only resolver we have uses a backend that reads directly from XML or from the cache.

This new resolver should:

sonofmun commented 7 years ago

Some Notes:

sonofmun commented 7 years ago

More information on the current process:

The setup:

  1. The resolver, along with the resources that need parsing, is declared in https://github.com/OpenGreekAndLatin/leipzig_cts/blob/master/modules/capitains/templates/app.py.erb#L75-L79
  2. The inventory is actually built from this information in https://github.com/OpenGreekAndLatin/leipzig_cts/blob/master/modules/capitains/templates/update_capitains_repos.rb.erb#L68-L71 : every time our corpora change, we rebuild part of the cache by running a parse that puts the inventory in the cache.
  3. parse is called by the manager and goes into every XML file (text or metadata) to build some of the information needed: https://github.com/Capitains/Nautilus/blob/master/capitains_nautilus/cts/resolver.py#L161-L258
  4. The app just calls the resolver in most of its queries. It's basically the core of the app.

Workflow when running

  1. Anytime we need to access metadata (name of a text, citation scheme, text itself) we hit the inventory.
  2. The inventory is defined here: https://github.com/Capitains/Nautilus/blob/master/capitains_nautilus/cts/resolver.py#L87-L91
  3. If the inventory has dropped from the cache, it asks to reparse the whole thing. This is most likely the reason for the 502 errors, because the reparse can take a really long time for a normal process (i.e. it should not happen there).
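
To make the failure mode above concrete, here is a minimal sketch of the pattern, not the actual Nautilus code. `CachedInventory`, `warm`, `get`, `slow_parse`, and the plain `dict` used as a cache are all hypothetical names chosen for illustration, and the URN is just an example entry. It assumes the inventory lives under a single cache key and that a miss falls back to a full parse.

```python
import time
from typing import Callable, Dict, Optional


class CachedInventory:
    """Hypothetical stand-in for the resolver's inventory lookup (not the Nautilus API)."""

    def __init__(
        self,
        parse: Callable[[], Dict[str, str]],
        cache: Optional[dict] = None,
        key: str = "inventory",
    ) -> None:
        self._parse = parse  # expensive: stands in for walking every XML file
        self._cache = cache if cache is not None else {}
        self._key = key
        self.parse_calls = 0  # counts how often the full parse ran

    def warm(self) -> None:
        """Run the full parse and store the result (what the update job does)."""
        self.parse_calls += 1
        self._cache[self._key] = self._parse()

    def get(self) -> Dict[str, str]:
        """Return the inventory, reparsing everything if the cache entry is gone."""
        inventory = self._cache.get(self._key)
        if inventory is None:
            # Cache miss during a request: the slow path that can cause timeouts.
            self.warm()
            inventory = self._cache[self._key]
        return inventory


def slow_parse() -> Dict[str, str]:
    """Pretend to parse the corpus; the sleep stands in for reading many files."""
    time.sleep(0.01)
    return {"urn:cts:latinLit:phi1294.phi002.perseus-lat2": "Epigrammata"}


cache: dict = {}
inventory = CachedInventory(slow_parse, cache)

inventory.warm()   # done once by the update job
inventory.get()    # served from the cache, no parse
cache.clear()      # simulate the entry being dropped from the cache
inventory.get()    # miss: the whole corpus is parsed again inside the request
print(inventory.parse_calls)
```

In this sketch the second full parse happens inside `get()`, i.e. inside whatever request first notices the missing entry. A second backend that keeps collection metadata outside the volatile cache would avoid that path.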

sonofmun commented 7 years ago

All these new functions should be unit tested.

PonteIneptique commented 6 years ago

Partially implemented in #68