Closed twagoo closed 6 years ago
Another record with this problem: http://hdl.handle.net/11234/1-1508@format=cmdi (https://vlo.clarin.eu/record?docId=http_58__47__47_hdl.handle.net_47_11234_47_1-1508_64_format_61_cmdi)
Note that this was reported by @stranak, and automatically turned into a ticket in the CLARIN-D support system. If we solve this or relevant information becomes apparent, we should report back there.
A subsequent import at vlo.clarin.eu fixed the problematic records, so this mapping mistake/omission seems to have been incidental. I have no suggestions for further investigation, but we should keep an eye on this.
The problematic import was started on 2018-02-20 at 01:19 CET.
The import that fixed the state was started on 2018-02-21 at 13:48 CET.
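However, other records now have a missing value for name! See search results (https://vlo.clarin.eu/search?q=-name:*&fqType=collection:or&fq=collection:Leipzig+Corpora+Collection). This leads me to think it could be a concurrency issue.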
OK, good to know it is not on our side.
Pavel
On 21 Feb 2018, at 14:27, Twan Goosen notifications@github.com wrote:
However, other records have a missing value for name now! See search results https://vlo.clarin.eu/search?q=-name:*&fqType=collection:or&fq=collection:Leipzig+Corpora+Collection. This leads me to think it could be a concurrency issue.
Configuring the importer to run only a single processing thread:
<fileProcessingThreads>1</fileProcessingThreads>
makes the issue go away. This strongly suggests a concurrency issue. Next step: check whether this can be reproduced with older versions of the VLO.
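To illustrate why a single processing thread can hide this kind of problem, here is a minimal sketch of one way a field value could get lost when workers share mutable state. This is purely hypothetical: `SharedStateMapper`, `interleaved_import` and `map_record` are made-up names, this is not the VLO importer's code, and the thread does not say what the actual cause was. The interleaving of two workers is replayed by hand so that the outcome is deterministic.

```python
# Hypothetical illustration only; NOT the VLO importer's actual code.

class SharedStateMapper:
    """Keeps per-record state on the instance, which is unsafe when shared."""

    def __init__(self):
        self.fields = {}

    def begin(self, record_id):
        # Resets the shared buffer, wiping whatever another worker stored.
        self.fields = {"id": record_id}

    def set_name(self, name):
        self.fields["name"] = name

    def finish(self):
        return dict(self.fields)


def interleaved_import(mapper):
    """Replay one possible interleaving of two workers sharing `mapper`."""
    mapper.begin("record-A")      # worker 1 starts record A
    mapper.set_name("Corpus A")   # worker 1 maps the name
    mapper.begin("record-B")      # worker 2 starts before worker 1 finishes
    doc_a = mapper.finish()       # worker 1 emits its document: name is gone
    mapper.set_name("Corpus B")   # worker 2 maps the name
    doc_b = mapper.finish()       # worker 2 emits its document
    return doc_a, doc_b


def map_record(record_id, name):
    """Safer variant: all state is local to the call, nothing is shared."""
    fields = {"id": record_id}
    fields["name"] = name
    return fields


doc_a, doc_b = interleaved_import(SharedStateMapper())
print("name" in doc_a, "name" in doc_b)
```

With one processing thread the `begin` call of a second record can never land in the middle of the first, which would be consistent with the issue disappearing under `<fileProcessingThreads>1</fileProcessingThreads>`.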
An import on alpha-vlo.clarin.eu confirms that d7a43d75311a70e12e4d03175239d22f2579a833 fixes the issue. Will include this in a hotfix release which will be VLO 4.3.6 (beta deployment asap).
Note: beta currently has ~145k records without a title in its index. Reporting back after first import with vlo-4.3.6-beta1.
As of this morning the number of results for -name:* is 62452 on beta. This confirms the fix.
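For repeating this check after later imports, a small sketch that builds the search URL for records without a name. The query parameters (`q`, `fqType`, `fq`) are taken from the search link posted earlier in this thread. The helper name `missing_name_search_url` is made up, and the base URL is passed in because the beta host name is not stated here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def missing_name_search_url(base_url, collection=None):
    """Build a VLO search URL listing records that have no value for name."""
    params = [("q", "-name:*")]
    if collection is not None:
        # Same facet parameters as in the search link from this thread.
        params.append(("fqType", "collection:or"))
        params.append(("fq", f"collection:{collection}"))
    return f"{base_url}/search?{urlencode(params)}"


url = missing_name_search_url("https://vlo.clarin.eu", "Leipzig Corpora Collection")
print(url)

# Decode the query string again to check that the parameters round-trip.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"], decoded["fq"])
```

Note that `urlencode` percent-encodes `:` and `*`, so the printed URL looks different from the hand-written one but decodes to the same parameters.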
Compare "Portuguese newspaper subcorpus from 2013 (por_news_2013_1M)" and "Unnamed record", both from the Leipzig Corpora Collection and based on the same profile (LCC_CorpusProfile).
The latter of these two appears without a name even though it has a value in the LCC_Corpus/Name element, just like the former.
Relates to Trac #1045.