clarin-eric / VLO

Virtual Language Observatory
GNU General Public License v3.0
14 stars 6 forks source link

Cache vocabulary map in PostProcessorsWithVocabularyMap #68

Closed twagoo closed 7 years ago

twagoo commented 7 years ago

The eu.clarin.cmdi.vlo.importer.PostProcessorsWithVocabularyMap postprocessor implementation reads uniform mapping files for various "vocabularies" during import. There is no caching even though these files don't change (and should not during an import), so caching base on the map's URL would be helpful and sensible here.

wowasa commented 7 years ago

since I don't know the amount of memory available I'm wondering whether it's wiser to create a local file-cache or whether we can cache the mapping f.e. in a singleton

twagoo commented 7 years ago

it should be safe to cache this in memory, there are not that many maps

wowasa commented 7 years ago

just added modification of eu.clarin.cmdi.vlo.importer.PostProcessorsWithVocabularyMap to development branch. Class is now cashing the mappings in a static Hashtable with Url as key

twagoo commented 7 years ago

Added missing constructor in aa6bcd3df7c42fab7dceca4106b1d5f93a673c67

twagoo commented 7 years ago

I will merge this into the feature branch for #29, and will probably make an adaptation to ensure thread safety - I think switching from Hashtable to ConcurrentHashMap should do the trick. The former class has the deprecated status BTW.

twagoo commented 7 years ago

Merged into issue29 branch: 5b65d4a3718d406788e39ef25bda793a4a539dee Adapted for concurrency: a62d9f1012a5b88286e6551776786d85bbdd1071

wowasa commented 7 years ago

just for curiosity: is it necessary to consider thread safety since the key/value pair is only loaded once into the Hashtable and never modified?

twagoo commented 7 years ago

Two threads may get to the point of loading at the same time. Except that the method from which this is called is synchronised, so in this case maybe it's not necessary. But getMappingFromFile is protected, so extending classes could call it as well.