EmmaSchwarz / computational-dostoevsky


Word count Goliadkin #25

Open blueciren opened 4 years ago

blueciren commented 4 years ago

I need help calculating the word count. It seems to work fine when I experiment with a single element, such as metadata or novel. But when I select more specific nodes, such as p[1] or speech[speaker='gol'], it gives a completely wrong number. For the latter especially (as you can find in the attached file in the development folder), it says that an empty sequence is not allowed as the value of the variable $words, which I defined for a tokenized speech.

djbpitt commented 4 years ago

@blueciren I'm not sure whether I'll be able to look at this tonight (04-18), but if not, I'll follow up in the morning.

djbpitt commented 4 years ago

@blueciren I’m looking at the code now. I’m guessing that the file in question is Goliadkin_internal_external_dialogue_word-count.xsl, but in the future please specify the file when you post an issue.

djbpitt commented 4 years ago

@blueciren You’ve made the same XPath mistake in the following three lines:

<xsl:variable name="speeches"  select="//speech[speaker='gol']"/>
<xsl:apply-templates select="//speech[speaker='gol']"/>
<xsl:template match="//speech[speaker='gol']">

The error is that "speaker" is an attribute, and located on the attribute axis, which means that when you refer to it in the predicate, you need to write:

speech[@speaker='gol']

Note the @ before the name of the attribute. Without that, it looks for a child <speaker> element, and there aren’t any such elements in the document.
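To see the difference between the two axes in isolation, here is a minimal sketch. The sample markup inside $sample is hypothetical (the real document in the repo may be structured differently), and the stylesheet assumes an XSLT 3.0 processor started at the initial template, so it needs no input file.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical sample data; the real markup in the repo may differ -->
  <xsl:variable name="sample">
    <novel>
      <p><speech speaker="gol">First speech.</speech></p>
      <p><speech speaker="doc">Second speech.</speech></p>
    </novel>
  </xsl:variable>

  <xsl:template name="xsl:initial-template">
    <!-- Child axis: looks for a <speaker> child element, so nothing matches -->
    <xsl:value-of select="count($sample//speech[speaker='gol'])"/>
    <xsl:text> </xsl:text>
    <!-- Attribute axis: tests the speaker attribute, so one <speech> matches -->
    <xsl:value-of select="count($sample//speech[@speaker='gol'])"/>
  </xsl:template>
</xsl:stylesheet>
```

The first count is 0 and the second is 1, so the output is "0 1".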

You also have an additional mistake in the last of the three examples above, where you set the value of a @match attribute to an XPath expression that begins with a double slash. What you want instead is an XPath pattern, that is:

<xsl:template match="speech[@speaker='gol']">

You need the double slash as the value of the @select attribute on <xsl:variable> and <xsl:apply-templates> because the value of @select here is an XPath expression, which is to say that it has to go track down and find the nodes it’s looking for. But the value of @match, as an XPath pattern, doesn’t have to find anything; it just has to identify, concisely, the nodes it matches. Your inclusion of the double slash at the beginning of the value of the @match attribute doesn’t affect the results, so it isn’t an error in the sense that it doesn’t cause the transformation to halt and it doesn’t give you the wrong result. But it’s nonetheless a mistake because the double-slash doesn’t contribute any meaning to the pattern, and therefore shouldn’t be there.
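As a sketch of how the two fit together, the stylesheet below uses an expression in @select and a pattern in @match. It assumes an input document that contains <speech> elements with a @speaker attribute; printing each speech on its own line is only for illustration and is not what your stylesheet does.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- @select holds an expression: // goes out and finds every matching <speech> -->
    <xsl:apply-templates select="//speech[@speaker='gol']"/>
  </xsl:template>

  <!-- @match holds a pattern: it only describes the nodes this template handles -->
  <xsl:template match="speech[@speaker='gol']">
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Adding a leading // to the pattern would match exactly the same nodes, which is why it is noise there.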

blueciren commented 4 years ago

It is still not valid because of <xsl:apply-templates select="count($words)"/>. According to the message, count cannot take variables? How can I pass the whole sequence to count() without a variable?

djbpitt commented 4 years ago

@blueciren You were still looking for speech[speaker='gol'] (a child element called <speaker>) instead of speech[@speaker='gol'] (note the @, specifying that "speaker" is an attribute, that is, on the attribute axis, and not the child axis). I changed that and it now returns a count of the speeches.

I also added an @as attribute to the variable declaration. This wasn’t the source of your problem, but it’s good practice to type your variables.

blueciren commented 4 years ago

Thank you. I pulled your file and it works! I had tried it on my local computer without pushing, but with the @ fixed it was still not valid because of $words. It seems that you have not made any changes except those indicated above. I have no idea why such an error came up. Anyway, thank you so much.

djbpitt commented 4 years ago

@blueciren What isn’t valid because of $words? The XSLT? The output of the transformation? You wrote that the transformation was erroring out because $words was empty, and that’s now fixed, but it sounds from what you write above as if you had been expecting me to change something else, as well. What other changes had you wanted me to make?

blueciren commented 4 years ago

I am sorry that my question was not concrete. What you fixed was perfect. Yes, the transformation showed an error message because $words was empty, and that is now fixed. But I wonder how it was fixed. You did not make changes to the sequences related to $words. I am asking because, after I corrected the mistake with the @ on the attribute, $words was still not valid. You did not make changes here: <xsl:variable name="words" as="xs:string+" select="tokenize($speech_text, ' ')"/>, but now the XSLT is valid.

djbpitt commented 4 years ago

@blueciren The value of $words depends on the value of $speech_text, and the value of $speech_text depends on the value of $speeches. This means that if there is an error in the way $speeches is defined, that error is inherited by variables that depend on $speeches. Where the error gets reported depends on where you have specified type checking using the @as attribute. Your error was early, but it was being reported only later. Here’s why:

  1. $speeches was not retrieving any <speech> elements (because it was looking for a child <speaker> element instead of a @speaker attribute), which meant that it was an empty sequence. Since you weren’t doing type checking for the value of $speeches, no error was reported when you had a zero-length sequence (zero elements) as that value.
  2. When you then processed $speeches to create $speech_text, that became a null string, that is, a string that contained zero characters. Unfortunately, the null string counts as a string, so no error would be reported there even with type checking.
  3. When you then tried to tokenize that to create $words, you raised an error because you had, correctly, used the @as attribute to warn you that the value of $words had to be one or more strings, and tokenizing a null string returns a null sequence. This is documented in the spec: “If $input is the empty sequence, or if $input is the zero-length string, the function returns the empty sequence.” (https://www.w3.org/TR/xpath-functions-31/#func-tokenize).
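The behavior in step 3 can be checked on its own with a minimal sketch. The variable name $tokens is made up for the example, and the stylesheet assumes an XSLT 3.0 processor started at the initial template, so it needs no input file.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="text"/>

  <xsl:template name="xsl:initial-template">
    <!-- Typed as xs:string* (zero or more), so an empty result is allowed here -->
    <xsl:variable name="tokens" as="xs:string*" select="tokenize('', ' ')"/>
    <!-- Tokenizing the zero-length string yields the empty sequence -->
    <xsl:value-of select="count($tokens)"/>
  </xsl:template>
</xsl:stylesheet>
```

The output is 0. Changing the type to xs:string+ (one or more) turns that same empty result into a type error, which is the error you were seeing on $words.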

You weren’t being notified of the error where it actually first occurred, then, in the definition of $speeches, because you hadn’t used an @as attribute when you defined $speeches. Had you set that as:

as="element(speech)+"

you would have been notified of the error then. I’ve added that attribute to the code.
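Here is a sketch of the whole chain with every variable typed. Only the $words declaration is copied from your file. The definition of $speech_text shown here is an assumption made for illustration, and your actual definition may differ.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">
  <xsl:output method="text"/>

  <!-- element(speech)+ requires at least one <speech>, so a wrong
       predicate is reported here, where the mistake actually is -->
  <xsl:variable name="speeches" as="element(speech)+"
      select="//speech[@speaker='gol']"/>

  <!-- Assumed definition: join the speeches and normalize whitespace -->
  <xsl:variable name="speech_text" as="xs:string"
      select="normalize-space(string-join($speeches, ' '))"/>

  <!-- As in the original: one or more strings -->
  <xsl:variable name="words" as="xs:string+"
      select="tokenize($speech_text, ' ')"/>

  <xsl:template match="/">
    <xsl:value-of select="count($words)"/>
  </xsl:template>
</xsl:stylesheet>
```

With the predicate written as speech[speaker='gol'], this version stops at $speeches instead of at $words, because the empty sequence violates the element(speech)+ type.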

blueciren commented 4 years ago

Thanks!