INL / corpus-frontend

BlackLab Frontend, a feature-rich corpus search interface for BlackLab.

Select type on metadata properties doesn't always work #231

Closed jvdzwaan closed 5 years ago

jvdzwaan commented 5 years ago

I want dropdown menus for basically all metadata fields of my corpus-frontend instance. But even though I specify uiType: select, sometimes the field is displayed as a combobox.

As you can see, the short title field is a combobox, and author aka a select (as it should be):

[Screenshot: interface]

The metadata specification is the same:

[Screenshot: yaml]

The full yaml file can be found here: https://github.com/arabic-digital-humanities/corpus-blacklab-metadata-config/blob/master/fiqh.yaml

(The AuthorAKA field is the author field and the BookTITLE_SHORT field is the title field.)

But the data returned by blacklab server shows a difference:

<metadataField name="BookTITLE_SHORT">
    <fieldName>BookTITLE_SHORT</fieldName>
    <isAnnotatedField>false</isAnnotatedField>
    <displayName>Short Title</displayName>
    <description>Short title of the document</description>
    <uiType>select</uiType>
    <type>UNTOKENIZED</type>
    <analyzer>default</analyzer>
    <unknownCondition>NEVER</unknownCondition>
    <unknownValue>unknown</unknownValue><displayValues/><fieldValues/>
    <valueListComplete>false</valueListComplete>
</metadataField>
...
<metadataField name="AuthorAKA">
    <fieldName>AuthorAKA</fieldName>
    <isAnnotatedField>false</isAnnotatedField>
    <displayName>Author AKA</displayName><description/>
    <uiType>select</uiType>
    <type>UNTOKENIZED</type>
    <analyzer>default</analyzer>
    <unknownCondition>NEVER</unknownCondition>
    <unknownValue>unknown</unknownValue><displayValues/>
    <fieldValues>
        <value text="أبو يوسف">2</value>
        <value text="إبن بابويه">2</value>
        <value text="مالك">2</value>
    </fieldValues>
    <valueListComplete>true</valueListComplete>
</metadataField>

Any idea why this happens and how to fix this?

KCMertens commented 5 years ago

It's the `<valueListComplete>false</valueListComplete>` that's causing this. BlackLab encountered too many unique values and didn't store them all in the index metadata, so the complete list of values isn't available. In that case we fall back to autocomplete fields so that you can still search for all values (see https://github.com/INL/BlackLab/issues/85). It's probably a good idea to print a warning when we do this, though...
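A minimal sketch of the fallback described above, written in Python for illustration only. The `MetadataField` class and `effective_widget` function are hypothetical names, not the frontend's actual code, and the fallback widget is called "combobox" here because that is how the reporter describes what gets rendered.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MetadataField:
    """Hypothetical stand-in for one <metadataField> entry from BlackLab."""
    name: str
    ui_type: str
    value_list_complete: bool
    values: List[str] = field(default_factory=list)


def effective_widget(f: MetadataField) -> str:
    """Return the widget that would be rendered under the assumed rule.

    Assumption: a 'select' is only usable when the stored value list is
    complete; otherwise the field falls back to a widget that looks values
    up at runtime.
    """
    if f.ui_type == "select" and not f.value_list_complete:
        return "combobox"
    return f.ui_type


title = MetadataField("BookTITLE_SHORT", "select", False)
author = MetadataField("AuthorAKA", "select", True, ["مالك"])
print(effective_widget(title), effective_widget(author))
```

Under this assumed rule, the two fields from the report end up with different widgets even though both are configured as `select`.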

It also seems there were no <fieldValues> captured at all for BookTITLE_SHORT. That might point to an error in your index config, as it hasn't indexed anything for that field (unless you just removed that part from the snippet). Still a little strange that it sets <valueListComplete> to false in that case, though. Paging @jan-niestadt: is that correct?


As a side note - do the dropdowns work adequately with rtl text? I've rewritten them due to some quirks with the plugin we used for them, so I'm interested if they do the job for exotic contents.

jvdzwaan commented 5 years ago

Thanks for your reply!

I indexed only three files for this example, so there are three titles and three authors. So it is really a mystery to me why BlackLab decides the author can be a dropdown, but the title is a combobox.

Can we manually set <valueListComplete>true</valueListComplete> in the indexing configuration? That would probably solve this problem.


About the dropdowns, they work with Arabic text, but I'm not sure we are using the refactored version. When were the changes introduced?

jvdzwaan commented 5 years ago

I don't understand your remark about an error in the config, because there were no fieldValues captured for BookTITLE_SHORT. The config for both AuthorAKA and BookTITLE_SHORT is the same. Or am I missing something?

KCMertens commented 5 years ago

> I don't understand your remark about an error in the config, because there were no fieldValues captured for BookTITLE_SHORT. The config for both AuthorAKA and BookTITLE_SHORT is the same. Or am I missing something?

Looking at the config, you're indexing meta fields that look something like <meta name="BookTITLE">title here</meta>, correct? It seems like no field with name="BookTITLE_SHORT" is found in your documents, so no values are known. My guess is that BlackLab leaves <valueListComplete> false in this case (though I'm not 100% sure about this), so no dropdown can be generated.

You can absolutely go in and change <valueListComplete> by hand, but there are still no known values, so you'll just end up with an empty dropdown unless you also manually add the <value/> list for that field.

> About the dropdowns, they work with Arabic text, but I'm not sure we are using the refactored version. When were the changes introduced?

You might not be seeing the latest version yet, then. I think the newer dropdowns are around a month old and haven't landed in a release yet.

jvdzwaan commented 5 years ago

Thanks, you were correct, there were no <meta name="BookTITLE_SHORT">title here</meta> in the files I indexed. My mistake. So, this now works.

But there is still something strange going on. There is another field, AuthorName, which in the configuration yaml has exactly the same specifications as BookTITLE_SHORT, but is not rendered as a dropdown. If I look at the XML returned by Blacklab, it says <valueListComplete>false</valueListComplete>.
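For checking this across all fields at once, a small Python sketch along these lines could list every `select` field whose value list BlackLab reports as incomplete. The element names are taken from the XML quoted earlier in this thread; the `<metadataFields>` wrapper, the `SAMPLE` string, and the function name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample modeled on the response quoted earlier in this thread.
# The <metadataFields> wrapper element is an assumption for this example.
SAMPLE = """<metadataFields>
  <metadataField name="BookTITLE_SHORT">
    <uiType>select</uiType>
    <fieldValues/>
    <valueListComplete>false</valueListComplete>
  </metadataField>
  <metadataField name="AuthorAKA">
    <uiType>select</uiType>
    <fieldValues><value text="مالك">2</value></fieldValues>
    <valueListComplete>true</valueListComplete>
  </metadataField>
</metadataFields>"""


def incomplete_select_fields(xml_text):
    """Return (field name, number of stored values) for each select field
    whose <valueListComplete> is not 'true'."""
    root = ET.fromstring(xml_text)
    result = []
    for f in root.iter("metadataField"):
        is_select = f.findtext("uiType") == "select"
        is_complete = f.findtext("valueListComplete") == "true"
        if is_select and not is_complete:
            stored = len(f.findall("./fieldValues/value"))
            result.append((f.get("name"), stored))
    return result


print(incomplete_select_fields(SAMPLE))
```

A field that shows up here with zero stored values is worth checking against the source documents first, as with BookTITLE_SHORT above.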

So how does BlackLab decide whether the value list is complete or not? Can I somehow specify that? Because now the value list is complete for AuthorAKA, and for AuthorName it is not. They have exactly the same number of entries. The only difference I see is that an AuthorName generally consists of longer strings (let's say of about 10 words), while AuthorAKA consists of strings of 1 to 3 words.

KCMertens commented 5 years ago

Does the AuthorName field hit the (default) limit of 50 unique values? At that point valueListComplete is set to false and new values are omitted from the list.

If that happens you can either raise the limit (maxMetadataValuesToStore in the BlackLab config) until you don't hit it or set the field's uiType to combobox so it fetches values at runtime instead of trying to rely on a static list.

EDIT: set the uiType to combobox, not autocomplete.
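The unique-value limit described above can be modeled roughly as follows. This is an illustrative Python sketch, not BlackLab's implementation: `collect_values` is a hypothetical name, and it assumes the list is marked incomplete only once a value beyond the limit is actually dropped.

```python
def collect_values(values, max_values=50):
    """Count occurrences of each unique value, storing at most max_values
    distinct values.

    Returns (counts, complete). 'complete' becomes False as soon as a new
    unique value has to be omitted because the limit was reached
    (the exact boundary is an assumption of this sketch).
    """
    counts = {}
    complete = True
    for v in values:
        if v in counts:
            counts[v] += 1
        elif len(counts) < max_values:
            counts[v] = 1
        else:
            complete = False  # value dropped, list is no longer complete
    return counts, complete


authors = [f"author {i}" for i in range(54)]  # 54 unique values
print(collect_values(authors, max_values=50)[1])
print(collect_values(authors, max_values=100)[1])
```

With 54 unique values, the first call reports an incomplete list and the second a complete one, which is why raising the limit before indexing matters.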

KCMertens commented 5 years ago

Sorry to report combobox is broken and may have been for a little while, fixing now.

jvdzwaan commented 5 years ago

Both AuthorName and AuthorAKA have 54 unique values. We have raised the number of values to display in a select field. It works for AuthorAKA, but not for AuthorName, and I was wondering why.

But if you found something is wrong with the comboboxes, that may be exactly the problem I was running into...

KCMertens commented 5 years ago

Fixed the combobox in https://github.com/INL/corpus-frontend/commit/9ea86e665ddb0b11372bc88251979bfeb2c934a6

It's still strange that no dropdown is being rendered, though; that's a separate issue from the combobox one. Any chance you could attach the output of JSON.stringify(vuexStore.state) in your index?

jvdzwaan commented 5 years ago

Here is the output (converted to text): test.txt

KCMertens commented 5 years ago

I forgot we purge incomplete value lists from the data in the frontend... I'm afraid that output doesn't tell me that much, except that the list was indeed purged and no dropdown is being rendered.

Just a sanity check: you raised the limit before indexing the documents (and indexed into a new corpus, since otherwise BlackLab would preserve the incomplete flag)? If so, I can't think of a sane reason why this would happen. I'll try to index some test data when I get some time.

jvdzwaan commented 5 years ago

Yes, I'm sure the limit was raised before indexing. I counted the number of entries of some of the dropdowns and they are above 50 (as expected).

So, my hunch is that it might have to do with the number of words in/length of the metadata value. We have 2x2 matching metadata fields: full title/short title and full author name/short author name, and in both instances the short values are displayed in a dropdown and the long values aren't.

For the record: since our users prefer to use the short values anyway, my immediate problem is solved.

jan-niestadt commented 5 years ago

If you send me the XML data, I can have a look at the difference in valueListComplete between the two fields. Value length shouldn't affect that, BTW.

jvdzwaan commented 5 years ago

I'm sending the data via wetransfer.

jan-niestadt commented 5 years ago

Thanks. I've had a look and it turns out I was wrong and you were right: if a value longer than 100 characters is encountered, BlackLab decides not to store it, and sets valueListComplete to false. A bit surprising behaviour (even to me...), but the reason it's done is to prevent the indexmetadata file from growing too large. I will increase the maximum value length to 256 for now; I feel like that should be enough for most cases, but if it's really necessary I could add a configuration option. I will also log a warning when the limit is exceeded.
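To find out in advance which metadata values would trip this length limit, a quick check like the Python sketch below may help. `values_over_limit` is a hypothetical helper, and it assumes the limit is measured in characters and applies to values strictly longer than the limit, as the wording above suggests.

```python
def values_over_limit(values, max_length=100):
    """Return the distinct values longer than max_length characters.

    Assumption: any such value would not be stored, which in turn would
    cause the field's value list to be flagged as incomplete.
    """
    return sorted({v for v in values if len(v) > max_length})


names = ["short name", "x" * 101, "y" * 100]
print(len(values_over_limit(names, max_length=100)))
print(len(values_over_limit(names, max_length=256)))
```

Running this over the AuthorName values before indexing would show whether any of them exceed the old limit of 100 or the new limit of 256.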

jan-niestadt commented 5 years ago

See https://github.com/INL/BlackLab/commit/35bc69376d2d99cb1fdf9721bb82819d3e83eabe

jvdzwaan commented 5 years ago

Thanks! It makes sense to set a maximum length, and having a warning would also be very helpful.

I think this issue can be closed.