digicademy / xtriples

A generic webservice to extract RDF statements from XML resources
http://xtriples.lod.academy
MIT License

Caching for remote files during extraction #2

Open · awagner-mainz opened this issue 8 years ago

awagner-mainz commented 8 years ago

When resources are fetched from remote servers, download them once and use the cached version instead of re-downloading them every time.
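Below is a minimal Python sketch of the download-once idea. It is an illustration only, not the project's code. It assumes a hypothetical `fetch` callable that performs the actual HTTP request, and it keeps cached copies in memory for the lifetime of one cache object. A URL is downloaded on its first request, and every later request for the same URL is served from the stored copy.

```python
from typing import Callable, Dict


class RemoteResourceCache:
    """Download each remote resource once and reuse the cached copy afterwards."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        # `fetch` is a hypothetical downloader (e.g. a wrapper around urllib.request.urlopen)
        self._fetch = fetch
        self._store: Dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        # Only contact the remote server the first time this URL is requested
        if url not in self._store:
            self._store[url] = self._fetch(url)
        return self._store[url]
```

Passing the downloader in as a parameter keeps the caching logic separate from networking. You can then check it with a fake `fetch` that counts its calls: two `get()` calls for the same URL should trigger only one download.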

(The code works on my machine (TM); move it to a git branch and test it!)

Think about:

metacontext commented 4 years ago

To keep cache pruning simple, I think it would suffice to add a slim cleanup function that is called as the last step before the main query returns. All resources fetched for a specific extraction could be saved to a subfolder in /temp named with a temporary identifier. The cleanup function could then delete the whole cache folder with an xmldb:remove call on return.
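The following Python sketch shows the same lifecycle on a local filesystem. It does not use the eXist-db database. A local directory stands in for the /temp collection, and `shutil.rmtree` plays the role of `xmldb:remove`. The function name `run_extraction_with_cache` and the `extract` callback are hypothetical. The callback represents the main query and receives the per-extraction cache folder.

```python
import shutil
import uuid
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def run_extraction_with_cache(base_dir: Path, extract: Callable[[Path], T]) -> T:
    """Run one extraction with its own cache folder, then delete that folder."""
    # One subfolder per extraction, named with a temporary identifier
    cache_dir = base_dir / f"extraction_{uuid.uuid4().hex}"
    cache_dir.mkdir(parents=True)
    try:
        # The extraction stores every fetched resource inside cache_dir
        return extract(cache_dir)
    finally:
        # Slim cleanup: remove the whole folder on return, even if extraction fails
        shutil.rmtree(cache_dir, ignore_errors=True)
```

The `try`/`finally` ensures the folder is removed even when the extraction raises an exception, so a failed run leaves no stale cache behind.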

metacontext commented 4 years ago

To keep all the thoughts we have had about this feature in one place, here is a related snippet from my TODO.txt:

Only load external resources once => retrieve and cache them for all following statements, maybe by first checking the configuration for all resource attributes, loading them and temporarily putting them in the DB with a URI <=> hash-name map file: cache/extraction_1235612rauafd.xml

<cachedResources>
<resource key="http://my.gnd.resource/12345" value="temp/13717dhgqf.xml" />
</cachedResources>
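One way to build such a map file is sketched below in Python. This is an illustrative assumption, not the project's implementation. The cached file name is derived from a SHA-1 hash of the URI, the resources are assumed to live in a `temp` folder, and the helper names `cache_path_for`, `build_cache_map` and `cache_map_to_xml` are hypothetical. URIs collected from the configuration are deduplicated first, so each resource appears, and is loaded, only once.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Dict, Iterable


def cache_path_for(uri: str, folder: str = "temp") -> str:
    # Assumption: a SHA-1 hex digest of the URI gives a stable, filesystem-safe name
    digest = hashlib.sha1(uri.encode("utf-8")).hexdigest()
    return f"{folder}/{digest}.xml"


def build_cache_map(uris: Iterable[str]) -> Dict[str, str]:
    # dict.fromkeys removes duplicate URIs while keeping their first-seen order
    return {uri: cache_path_for(uri) for uri in dict.fromkeys(uris)}


def cache_map_to_xml(mapping: Dict[str, str]) -> str:
    # Serialize the URI <=> hash-name map in the <cachedResources> format above
    root = ET.Element("cachedResources")
    for key, value in mapping.items():
        ET.SubElement(root, "resource", key=key, value=value)
    return ET.tostring(root, encoding="unicode")
```

Hashing the URI means the same resource always maps to the same file name, so the map can be rebuilt without storing any extra counters.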