stcoats closed this issue 1 year ago.
After a few initial hiccups, I've been having good success setting up a corpus using the instructions. Thank you for the good documentation!
I want to link text sentences to an audio file, like what has been done in the OpenSonar corpus. Say I have an audio file called my_wav.wav, stored at https://mycdn.com/my_wav.wav. My xml has this structure (simplified):
My .yaml file contains this:
This does not seem to be working, probably because I haven't quite understood what is going on. How can I achieve this?
You're trying to use the linked documents feature to link audio files, but this won't work. This feature is useful when you have some external XML or CSV file with metadata like title, author, etc. that you want to apply to a document while indexing.
If you just want a play button in the full document view (generated from XML using XSLT) that plays an external audio file, you should be able to do that just by crafting the XSLT to determine the correct URL for the audio file for a sentence, paragraph, etc.
If you want to show a play button when you click on a hit in the results view, you will need to index any required information (audio file name/number, start/end timecodes) as annotations for each word, so that your custom.js can determine how to play the correct audio for any hit. (I take it you've already seen https://github.com/INL/corpus-frontend#custom-js ?)
Here's the relevant part of our OpenSonar format config file (.blf.yaml):
annotatedFields:
  contents:
    wordPath: .//folia:w

    # If specified, a mapping from this id to token position will be saved, so we
    # can refer back to it for standoff annotations later.
    tokenPositionIdPath: "@xml:id"

    annotations:
    - name: word
      valuePath: folia:t

    # Store part of the xml:id attribute so we can find the corresponding audio file
    # (xml id contains the document and sentence ids, which identifies the audio file)
    - name: _xmlid
      valuePath: "@xml:id"           # NOTE: xml:id of w tag
      isInternal: true
      process:
      - action: replace
        find: "^[^\\.]*\\.(.*)$"     # find first .
        replace: "$1"                # keep everything after that

    # Separate standoff annotations give the begin and end time for each word.
    # We refer back to the tokenPositionIdPath captured above so they are indexed
    # at the correct position.
    standoffAnnotations:
    - path: //timesegment                    # Element containing the values to index
      refTokenPositionIdPath: wref/@xml:id   # What token position(s) to index these values at
      annotations:                           # Annotation(s) to index there
      - name: begintime
        valuePath: ../@begintime
        isInternal: true
      - name: endtime
        valuePath: ../@endtime
        isInternal: true
And here's a small excerpt from the XML data:
<s speaker="BACKGROUND" xml:id="fn007233.1">
  <w xml:id="fn007233.1.1">
    <t>achtergrondmuziek.</t>
    <pos class="SPEC(comment)" head="SPEC">
      <feat class="comment" subset="spectype"/>
    </pos>
    <lemma class="_"/>
  </w>
  <timing>
    <timesegment begintime="00:00:00.000" endtime="00:05:29.048">
      <wref id="fn007233.1.1" t="achtergrondmuziek."/>
    </timesegment>
  </timing>
</s>
Thank you, once again, for such a speedy response! The https://github.com/INL/corpus-frontend#custom-js page seems to be exactly what I need. It will take some time for me to figure out how this works, so I will follow up once I have something working.
Big reply, but you've picked probably the most complex thing you can do in the frontend, so strap in! (I'm more than aware this isn't ideal.)
There are two parts to this. First, index the relevant info about the audio file in BlackLab.
It seems your audio is one file per document, so we'll store the audio file name in the document metadata.
The example from our OpenSonar config stores it per word (as annotations begintime and endtime), as we have more precise info.
metadata:
- containerPath: document
  fields:
  - name: audiofile
    isInternal: true # this doesn't do anything in BlackLab, but prevents the corpus-frontend from showing a filter for this field
    valuePath: ./externalMetadata/@id
Second, configure the frontend: we'll need to add a custom snippet of javascript. Create a custom.js in a static directory for your corpus, under the corporaInterfaceDataDir set in your corpus-frontend.properties (see the layout below).
static is a special dir that the corpus-frontend can serve files from, so you can store various clientside config files and assets here. They're available under /corpus-frontend/${corpus_id}/static/...
corporaInterfaceDataDir/
  my_corpus_name/
    search.xml
    article.xsl
    static/
      custom.js
In search.xml, add the following line, importing your newly created custom.js file on the search page:
<CustomJs page="search">${request:corpusPath}/static/custom.js</CustomJs>
Now edit your custom.js and configure the plugin that renders the audio button. There's a little more documentation in the Readme; if you need it, search for "audio player". But for your example, this should work:
/* The context object contains the following information:
{
  corpus: string,
  docId: string, // the document id
  snippet: BLTypes.BLHitSnippet, // the raw hit info as returned by blacklab
  document: BLTypes.BLDocInfo, // the document metadata (just a key-value map of all metadata, values contained in arrays!)
  documentUrl: string, // url to view the document in the corpus-frontend
  wordAnnotationId: string, // configured annotation to display for words (aka vuexModules.ui.results.hits.wordAnnotationId)
  dir: 'ltr'|'rtl',
  citation: {
    left: string;
    hit: string;
    right: string;
  }
}

The returned object should have the following shape:
{
  name: string; // unique name for the widget you're rendering, can be anything
  component?: string; // (optional) name of the Vue component to render, component MUST be globally installed using vue.component(...)
  element?: string; // when not using a vue component, the name of the html element to render, defaults to 'div'
  props?: any; // attributes on the html element (such as 'class', 'tabindex', 'style' etc.), or props on the vue component
  content?: string; // html content of the element, or content of the default slot when using a vue component
  listeners?: any; // event listeners, passed to v-on, so 'click', 'hover', etc.
}
*/
vuexModules.ui.getState().results.hits.addons.push(function(context) {
  return {
    component: 'AudioPlayer', // don't change this!
    name: 'audio-player', // this may be whatever
    props: {
      docId: context.docId, // for caching
      startTime: 0,
      endTime: Number.MAX_SAFE_INTEGER, // since we don't have a defined endtime, just set a high number
      url: `${your_cdn}/${context.document.audiofile[0]}`
    },
  }
})
Now edit your newly created article.xsl.
Explaining XSLT is a little out of scope for this, but luckily BlackLab can generate a basic setup. Make sure the .blf.yaml file you used to index your corpus is loaded in blacklab-server, or the request will 404; you might have to edit BlackLab's config file to do that. Then go to http://localhost:8080/blacklab-server/input-formats/${my_format_name}/xslt and save the result as article.xsl.
You can then add the snippet that renders the play button. We use the following setup for OpenSonar (see the snippet Jan posted above for the corresponding XML); you'll have to edit it to match your document structure.
<xsl:variable name="audiofile" select=".//externalMetadata/@id"/>
<xsl:variable name="begintime" select="'0'"/>
<xsl:variable name="endtime" select="'999999'"/>
<button type="button" class="btn btn-sm btn-default audio-button">
<xsl:attribute name="data-audio-start"><xsl:value-of select="$begintime"/></xsl:attribute>
<xsl:attribute name="data-audio-end"><xsl:value-of select="$endtime"/></xsl:attribute>
<xsl:attribute name="data-audio-file"><xsl:value-of select="$audiofile"/><xsl:attribute>
<span class="fa fa-play"></span>
</button>
Then we have a corresponding javascript file that brings it to life; I've attached it: article_enable_audio.js.txt
You'll have to edit it, since we share one audio file across many play buttons (so there's some caching involved in the script) and we have timing information. But it should be enough to get you started.
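For reference, here's a minimal sketch of what such a script might look like. This is not the attached file, just a hedged reconstruction: it assumes the data-audio-* attributes produced by the XSLT above, and that start/end are plain seconds rather than hh:mm:ss timecodes (which would need extra parsing).

// Wire up every .audio-button rendered by the article.xsl snippet above.
document.addEventListener('click', function (event) {
  var button = event.target.closest('.audio-button');
  if (!button) return;

  var url = button.getAttribute('data-audio-file');
  var start = parseFloat(button.getAttribute('data-audio-start')) || 0;
  var end = parseFloat(button.getAttribute('data-audio-end')) || Infinity;

  // No caching in this sketch: a fresh Audio object per click.
  var audio = new Audio(url);
  audio.currentTime = start;
  audio.addEventListener('timeupdate', function () {
    if (audio.currentTime >= end) audio.pause(); // stop at the segment's end
  });
  audio.play();
});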
Anyway, stick that js file in the static dir and import it on the article view page, again in search.xml:
<CustomJs page="article">${request:corpusPath}/static/article_enable_audio.js</CustomJs>
Wow, thank you for this detailed reply! It will take me some time to go through this and try things out.
A question: I already have javascript code that can fetch something from a cdn, render an audio player, and play the file on an html page. Is there a way to put this into the html-rendered search.xml page, so as not to have to deal with XSLT stylesheets?
Or, even better, could I render the search interface directly in HTML, use scripts to show the hits and context, etc., and forego XML and XSLT altogether?
I'm not sure I completely understand what you want.
But sure, you can just add whatever javascript you want on any page by adding the <CustomJs> tag in search.xml. No further config needed.
But how will you know which audio file to play for which hit? Also, the search page is very dynamic: the html changes constantly whenever you perform a new search or load another page of results. So if you insert an audio player somewhere in the table, it will be thrown away when new results are shown (in the best case). You will have to take care of disposing of old audio players, creating new ones, etc.
When viewing a single document it'll probably work, though. That page is a lot less complex: you just get some html, and it won't change afterwards.
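If you go that route, something along these lines might do it. This is only a sketch, under the assumption that your article.xsl copies each sentence's audio link into a data-audio attribute (the attribute name is hypothetical), e.g. <span class="sentence" data-audio="https://mycdn.com/...">:

// On the article page, append a native audio player after each element
// that carries a data-audio attribute.
document.querySelectorAll('[data-audio]').forEach(function (el) {
  var player = document.createElement('audio');
  player.controls = true;
  player.src = el.getAttribute('data-audio');
  el.insertAdjacentElement('afterend', player);
});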
My structure is basically
<document>
  <metadata>
    ...
  </metadata>
  <s link="link1">
    <w xml:id="w.1" pos="PRP" lemma="I">i</w>
    <w xml:id="w.2" pos="VBD" lemma="feel">felt</w>
    <w xml:id="w.3" pos="JJ" lemma="good">good</w>
  </s>
  <s link="link2">
    <w xml:id="w.4" pos="PRP" lemma="you">you</w>
    <w xml:id="w.5" pos="VBD" lemma="feel">felt</w>
    <w xml:id="w.6" pos="JJ" lemma="bad">good</w>
  </s>
</document>
I am hoping that for each hit on a word/lemma/pos, I can insert an audio player below the hit, using the link in the <s> element for the sentence that word is in. I will experiment with doing it on the hits page and the document page.
Thank you once again for your quick responses! You and Jan are super helpful. 😊
I can't get a custom search.xml page working for a corpus I created called test. I copied the search.xml from /opt/tomcat/apache-tomcat-9.0.78/webapps/corpus-frontend-3.1.0/WEB-INF/classes/interface-default, then made a few changes to it, and put it in
corporaInterfaceDataDir/
  test/
    search.xml
    static/
I then changed the corpus-frontend-3.1.0.properties file by commenting out corporaInterfaceDataDir=/etc/blacklab/projectconfigs/ and adding corporaInterfaceDataDir=/etc/blacklab/corporaInterfaceDataDir/test/.
When I restart the frontend, it still uses the default search.xml. What am I doing wrong?
adding corporaInterfaceDataDir=/etc/blacklab/corporaInterfaceDataDir/test/.
Remove the trailing test/ (leaving corporaInterfaceDataDir=/etc/blacklab/corporaInterfaceDataDir/) and it should work for you :) The directory named after the corpus is appended automatically.
Thanks, that works! I'm using customJS to make some minor changes as per the instructions at https://github.com/INL/corpus-frontend#custom-js. Most javascript seems to work, including example vue.js functions provided on the page. I can't get "Customize the display of document titles in the results table" to work.
Here's my metadata structure:
<?xml version="1.0" ?>
<root>
  <document>
    <metadata id="id1">
      <meta name="video_title">test_video1</meta>
      (other metadata fields)
    </metadata>
    ...
I tried putting the default
vuexModules.ui.getState().results.shared.getDocumentSummary = function(metadata, specialFields) {
  return 'The document is: ' + metadata[specialFields.titleField][0];
}
and
vuexModules.ui.getState().results.shared.getDocumentSummary = function(metadata, specialFields) {
  return 'The document is: ' + metadata[specialFields.video_title][0];
}
in custom.js, but on the page it displays
The document is: /path/to/indexed/xml/files/test_xml_1.xml
above each snippet.
Getting it directly from the metadata object should work: 'The document is: ' + metadata.video_title[0].
The second snippet probably crashes, and that's why you don't see anything happen; check the javascript console and it'll probably show an exception. specialFields only contains the names of fields in metadata, not the actual metadata itself: check it out.
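In other words, the corrected function would look like this (a sketch based on the line above; metadata values are arrays, hence the [0]):

vuexModules.ui.getState().results.shared.getDocumentSummary = function(metadata, specialFields) {
  // video_title is a regular metadata field, so read it from `metadata` directly;
  // `specialFields` only holds field *names*, e.g. specialFields.titleField.
  return 'The document is: ' + metadata.video_title[0];
}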
Sorry to keep asking questions! I can't get these examples at https://github.com/INL/corpus-frontend#custom-js to work: "A table with whatever data you wish to show", "A pie chart displaying the frequency of an annotation's values", and "A graph showing growth of annotations in the document".
I copy-pasted the three code blocks into custom.js, but nothing happens. The console says
custom.js?_1378044326:39 Uncaught TypeError: vuexModules.root.actions.distributionAnnotation is not a function
at custom.js?_1378044326:39:26
Just including one of the code blocks also fails, with a similar error message in the console.
I am probably overlooking something obvious!
You're probably running the same script on both the search and the docs page, right?
The pages don't have the same customization options, so that function will be undefined on one of the two pages. You'll have to add checks, or split up your custom.js into two scripts and only include each on its specific page.
Here's how to do that: split your code into custom.search.js (for the /search page) and custom.article.js (for the /docs/... page), putting the snippets for the document page in custom.article.js. Then import each on its own page in search.xml:
<CustomJs page="search">${request:corpusPath}/static/js/custom.search.js</CustomJs>
<CustomJs page="article">${request:corpusPath}/static/js/custom.article.js</CustomJs>
If it still crashes, let me know; in that case I'll need to investigate a little deeper.
Thanks, that worked.
Regarding the attempt to play an audio file in the results table for each hit: I discarded the xml structure I first proposed, on the basis of Jan's response:
you will need to index any required information (audio file name/number, start/end timecodes) as annotations for each word
The longer reply you wrote then explains how one might set things up with a link in the externalMetadata field to one audio file per document. However, I have many audio files per document, so I re-wrote my file converter to restructure the xml files. In the structure below, each <s> tag now corresponds to one short audio file stored at a cdn, which contains all of the words in that sentence. So, for example, id1 would be the identifier for a .wav of the speaker saying "I felt good", id2 for a clip of the speaker saying "You felt good", and so on.
<document>
  <metadata>
    ...
  </metadata>
  <text>
    <s id="id1">
      <w xml:id="w.1" pos="PRP" lemma="I">i</w>
      <w xml:id="w.2" pos="VBD" lemma="feel">felt</w>
      <w xml:id="w.3" pos="JJ" lemma="good">good</w>
    </s>
    <s id="id2">
      <w xml:id="w.4" pos="PRP" lemma="you">you</w>
      <w xml:id="w.5" pos="VBD" lemma="feel">felt</w>
      <w xml:id="w.6" pos="JJ" lemma="bad">good</w>
    </s>
  </text>
</document>
Is it possible to adapt the strategy you suggested for a structure like this? I don't need start and end times for the audio clips, at least for now, because I would like to play the audio for the entire sentence for a hit on any word in that sentence. I would like https://mycdn.com/path/id1.wav to be retrieved for the words with xml:id "w.1", "w.2", or "w.3", and https://mycdn.com/path/id2.wav for the ids "w.4", "w.5", "w.6", etc.
I know almost nothing about XSLT, but could the line in your example, <xsl:variable name="audiofile" select=".//externalMetadata/@id"/>, be changed to <xsl:variable name="audiofile" select="../s/@id"/> or something along those lines, i.e. grab the parent element of the word hit and get the id attribute from that?
As far as I understood, Jan's suggestion was to re-do the xml files to have the annotation for the audio file included for each word. For my data, that would look something like this:
<s id="id1">
  <w xml:id="id1.w.1" pos="PRP" lemma="I">i</w>
  <w xml:id="id1.w.2" pos="VBD" lemma="feel">felt</w>
  <w xml:id="id1.w.3" pos="JJ" lemma="good">good</w>
</s>
<s id="id2">
  <w xml:id="id2.w.4" pos="PRP" lemma="you">you</w>
  <w xml:id="id2.w.5" pos="VBD" lemma="feel">felt</w>
  <w xml:id="id2.w.6" pos="JJ" lemma="bad">good</w>
</s>
A possible problem with this is that the identifying annotation codes for the audio files are quite long (e.g. f15-GX8-qszPE_0003301000290812_127), and each sentence can contain many words. Would this not make the size of the index (and of the entire installation) significantly greater? If not, because Lucene can handle that easily, then perhaps that is the best way to go?
Thanks once again for your willingness to help a neophyte with no development experience!
Putting the sentence id in every word id would work, but isn't necessary. Your suggestion to "grab" the sentence id from the <s/> tag while indexing is the right approach, I think, and at first glance the XPath expression ../s/@id seems like it should work. Good luck!
Hello again.
I can't get an audio file to play if I use this javascript in custom.search.js:
vuexModules.ui.getState().results.hits.addons.push(function(context) {
  return {
    component: 'AudioPlayer', // don't change this!
    name: 'audio-player', // this may be whatever
    props: {
      docId: context.docId, // for caching
      startTime: 0,
      endTime: Number.MAX_SAFE_INTEGER, // since we don't have a defined endtime, just set a high number
      url: `https://mycdn.com/${context.document.audiofile[0]}`
    },
  }
})
If I change url: `https://mycdn.com/${context.document.audiofile[0]}` to a fixed url (like url: `https://mycdn.com/file1.mp3`), it plays. I've tried ${context.document.audiofile}, ${context.text.audiofile[0]}, ${context.text.audiofile}, ${context.audiofile}, ${context.document.text.audiofile}, etc.
What should the variable be?
In my blf.yaml I have
annotatedFields:
  contents:
    containerPath: text
    wordPath: .//w
    annotations:
    - name: word
      valuePath: .
      sensitivity: sensitive_insensitive
    - name: lemma
      valuePath: "@lemma"
      sensitivity: sensitive_insensitive
    - name: pos
      valuePath: "@pos"
    - name: audiofile
      valuePath: "../@id"
    inlineTags:
    - path: .//s
The code seems to be fetching the correct value into the DOM, but I don't know what the correct variable is to get it out of there.
Instead of ${context.document.audiofile[0]}, try ${context.snippet.match.audiofile[0]}.
Looking at your blf.yaml and your last screenshot, the audio file name is actually stored per-word, not in the document metadata. In that console log, anything in docInfos is document metadata; anything in hits is per-word information.
Note that the context object in javascript looks slightly different:
type context = {
  corpus: string,
  docId: string, // the document id
  snippet: BLTypes.BLHitSnippet, // the raw hit info as returned by blacklab
  document: BLTypes.BLDocInfo, // the document metadata (just a key-value map of all metadata, values contained in arrays!)
  documentUrl: string, // url to view the document in the corpus-frontend
  wordAnnotationId: string, // configured annotation to display for words (aka vuexModules.ui.results.hits.wordAnnotationId)
  dir: 'ltr'|'rtl',
  citation: {
    left: string;
    hit: string;
    right: string;
  }
}
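If you're ever unsure what's available, you can log the context from a throwaway addon and inspect it in the console (a debugging sketch; it renders an empty span and can be removed afterwards):

vuexModules.ui.getState().results.hits.addons.push(function(context) {
  console.log('document metadata:', context.document);          // key-value map, values are arrays
  console.log('per-word annotations:', context.snippet.match);  // e.g. audiofile, begintime, endtime
  return { name: 'context-inspector', element: 'span' };
});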
Thanks for the detailed questions by the way, very helpful!
Awesome, thank you, also for the structure of the context object; I am learning a lot! With the change in the javascript function you suggested, the button now plays the desired clip in the results table.
However, if I play a hit, then move down in the table and click the play button for a different hit, it plays the same clip. If I refresh the page and go directly to the second hit, it plays the correct clip.
Is there a way to automatically refresh this, so that when staying on the same results page, one can play different clips?
Uh, whoops, that's a bug! It seems we're caching audio players by their docId, which makes no sense; they should just be cached based on the url. I'll get that fixed, but for now you can work around it by passing the url in props.docId.
Is this what you mean?
vuexModules.ui.getState().results.hits.addons.push(function(context) {
  return {
    component: 'AudioPlayer', // don't change this!
    name: 'audio-player', // this may be whatever
    props: {
      docId: `https://mycdn.com/${context.snippet.match.audiofile[0]}`,
      startTime: 0,
      endTime: 999999999
    },
  }
})
If I do this, clicking on the play button doesn't play the audio.
Right, sorry: keep the url prop as well; the docId is only used as the cache key. Like this:
vuexModules.ui.getState().results.hits.addons.push(function(context) {
  return {
    component: 'AudioPlayer', // don't change this!
    name: 'audio-player', // this may be whatever
    props: {
      docId: `https://mycdn.com/${context.snippet.match.audiofile[0]}`,
      url: `https://mycdn.com/${context.snippet.match.audiofile[0]}`,
      startTime: 0,
      endTime: 999999999
    },
  }
})
Perfect, thank you very much! I am still in the process of testing things out, but with the help of you and Jan I have established the basic functionality I need. I'll be back with more questions later. 😎