clarin-eric / VLO

Virtual Language Observatory
GNU General Public License v3.0

CMDIDataProcessor.process(...) not thread-safe #367

Closed: wowasa closed this issue 1 year ago

wowasa commented 1 year ago

In a multi-threaded environment, the method CMDIDataProcessor.process(...) returns a CMDIEntity object that differs from the one returned in a single-threaded environment, at least in the Map returned by CMDIdata.getDocument().

Hence, if the process method is meant to be used in a multi-threaded environment, this is a bug. The issue is related to the curation dashboard issue https://github.com/clarin-eric/curation-dashboard/issues/165
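A minimal sketch of how the discrepancy described above could be checked: run the same inputs once sequentially and once on a thread pool, then compare the resulting maps. The helper `sameResults` and the `Function<I, Map<K, V>>` it takes are hypothetical; the function stands in for a call to `CMDIDataProcessor.process(...)` followed by `getDocument()`, and no actual VLO signatures are assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ConcurrencyCheck {

    // Hypothetical helper: `process` stands in for process(...) + getDocument().
    // Returns true if every input yields an equal map in both runs.
    static <I, K, V> boolean sameResults(List<I> inputs,
                                         Function<I, Map<K, V>> process,
                                         int threads) throws Exception {
        // Reference results, computed on the calling thread only
        List<Map<K, V>> sequential = new ArrayList<>();
        for (I input : inputs) {
            sequential.add(process.apply(input));
        }

        // The same inputs, submitted concurrently to a fixed-size pool
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<K, V>>> futures = new ArrayList<>();
            for (I input : inputs) {
                futures.add(pool.submit(() -> process.apply(input)));
            }
            for (int i = 0; i < inputs.size(); i++) {
                // Map.equals compares entries, regardless of the Map implementation
                if (!Objects.equals(sequential.get(i), futures.get(i).get())) {
                    return false;
                }
            }
            return true;
        } finally {
            pool.shutdown();
        }
    }
}
```

If a processor that is deterministic on a single thread yields `false` here, some mutable state on its call path is likely being shared between threads.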

twagoo commented 1 year ago

We have not run into such concurrency problems with the VLO itself, which also processes files in parallel. Therefore, I would first like to rule out that the curation-specific objects and service implementations passed to the constructor of CMDIParserVTDXML are the cause of the concurrency issues.
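One way to test this hypothesis, assuming the collaborators handed to the parser's constructor keep mutable state between calls, is to give each worker thread its own instance instead of sharing one. The sketch uses `ThreadLocal.withInitial`; `StatefulParser` and `PerThreadParsing` are hypothetical stand-ins, not classes from the VLO or curation code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical collaborator with a reusable scratch buffer: unsafe to share across threads.
class StatefulParser {
    private final StringBuilder buffer = new StringBuilder();

    String parse(String input) {
        buffer.setLength(0); // reset the scratch buffer before each call
        buffer.append(input.trim().toUpperCase());
        return buffer.toString();
    }
}

class PerThreadParsing {
    // Each worker thread lazily creates its own StatefulParser, so no buffer is shared
    private static final ThreadLocal<StatefulParser> PARSER =
            ThreadLocal.withInitial(StatefulParser::new);

    static List<String> parseAll(List<String> inputs, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String input : inputs) {
                futures.add(pool.submit(() -> PARSER.get().parse(input)));
            }
            // Collect results in input order
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

If the discrepancy disappears with per-thread instances, the shared objects are the likely culprit; if it persists, the cause lies elsewhere, for example in the parsing library itself.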

twagoo commented 1 year ago

Regarding the apparent fix for https://github.com/clarin-eric/curation-dashboard/issues/165: a hypothesis to consider is that synchronising the call in the curation logic creates a bottleneck that 'fixes' (that is, obscures) a concurrency problem elsewhere.
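To illustrate this hypothesis, the hypothetical `SerializationProbe` below records how many threads are inside a call at the same time. When every call is wrapped in one `synchronized` block, the peak is always 1, so a race confined to code reached through that call can no longer show up, even though it has not been removed. This is a sketch under those assumptions, not the curation dashboard's actual code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class SerializationProbe {
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    // Hypothetical stand-in for the processing call; tracks concurrent callers.
    String work(String input) {
        int now = active.incrementAndGet();
        peak.accumulateAndGet(now, Math::max); // remember the highest overlap seen
        try {
            Thread.sleep(5); // simulate parsing time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            active.decrementAndGet();
        }
        return input;
    }

    // Runs `tasks` calls on a pool, optionally behind one lock, and returns the peak overlap.
    static int peakConcurrency(boolean synchronize, int tasks, int threads)
            throws InterruptedException {
        SerializationProbe probe = new SerializationProbe();
        Object lock = new Object();
        Function<String, String> call = synchronize
                ? s -> { synchronized (lock) { return probe.work(s); } }
                : probe::work;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            String input = "doc-" + i;
            pool.execute(() -> call.apply(input));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return probe.peak.get();
    }
}
```

The lock makes the results look correct, but only because the calls no longer overlap; throughput drops to that of a single thread.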

twagoo commented 1 year ago

The difference in concurrent behaviour between the VLO and the curation dashboard is explained by the version of VTD-XML used. Won't fix in VLO 4.x.

twagoo commented 1 year ago

Issue specific to the VTD-XML upgrade: see #369