amberpeddicord / amber-s_repo

A repository for my personal work and projects!

XQuery 1 Question #2

Open amberpeddicord opened 4 years ago

amberpeddicord commented 4 years ago

@ebeshero

I'm up to question 6 on XQuery Exercise 1, which is asking me to find the titles of the plays that contain over 58 distinct speakers. It says in the assignment that I should only be finding 3, but everything I'm trying is giving me a list of 10. I have this right now:

xquery version "3.1";
declare default element namespace "http://www.tei-c.org/ns/1.0";

let $coll := collection('/db/apps/shakespeare/data')
let $title := $coll//TEI//titleStmt/title
let $speaker := $coll//TEI//sp//@who
let $distinct-values := distinct-values($speaker)
let $count := count($distinct-values)
where $count gt 58
return $title

Is there something that I'm doing wrong?

amberpeddicord commented 4 years ago

@ebeshero These are the results I'm getting:

<title xmlns="http://www.tei-c.org/ns/1.0">Love's Labour's Lost</title>
<title xmlns="http://www.tei-c.org/ns/1.0">Macbeth</title>
<title xmlns="http://www.tei-c.org/ns/1.0">A Lover's Complaint</title>
<title xmlns="http://www.tei-c.org/ns/1.0">Pericles, Prince of Tyre</title>
<title xmlns="http://www.tei-c.org/ns/1.0">Cymbeline</title>
<title xmlns="http://www.tei-c.org/ns/1.0">Romeo and Juliet</title>
<title xmlns="http://www.tei-c.org/ns/1.0">All's Well That Ends Well</title>
<title xmlns="http://www.tei-c.org/ns/1.0">The Merchant of Venice</title>
<title xmlns="http://www.tei-c.org/ns/1.0">Coriolanus</title>
<title xmlns="http://www.tei-c.org/ns/1.0">The Second Part of King Henry the Sixth</title>

ebeshero commented 4 years ago

@amberpeddicord I think the issue is that you are still returning all the titles, not just the titles of plays that have more than 58 distinct speakers. You need to define a new variable for that...

amberpeddicord commented 4 years ago

@ebeshero I think I'm still confused. I thought that putting where $count gt 58 would give me only titles of plays that have more than 58 distinct speakers.

ebeshero commented 4 years ago

Ah! Thanks for this rephrase of the issue, because, yes--I see the problem now. It really has to do with what you're counting! Look at your $speaker variable. You are returning all of the values of @who for the entire collection! (Try backing up and returning just that, and you'll see that.)

Here's the issue: The count will always be more than 58 for the entire collection. What you need is a new variable that causes you to stop on each play, one at a time, and as you do that, take the count. This is where the F in FLWOR becomes active: you need a "for loop" to visit each play one at a time. You may want to define a variable that just returns all of the plays first of all (the whole TEI element), and call it $plays. Then set up your for loop:

for $p in $plays

And then you'll be able to step through each one individually! Sorry I was too foggy to notice that earlier!
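
A minimal sketch of the difference between counting once over everything and counting inside a for loop. The two `<play>` elements here are invented stand-ins so the query runs without the Shakespeare collection; only the `sp` and `@who` names come from the thread.

```xquery
xquery version "3.1";

(: Invented sample data: two tiny "plays" with speaker references. :)
let $plays := (
  <play><title>A</title><sp who="#x"/><sp who="#y"/><sp who="#x"/></play>,
  <play><title>B</title><sp who="#z"/></play>
)
(: Counted once, over every play together. :)
let $allCount := count(distinct-values($plays//sp/@who))
return (
  "whole collection: " || $allCount,
  (: Counted again for each play in turn. :)
  for $p in $plays
  let $pCount := count(distinct-values($p//sp/@who))
  return $p/title/string() || ": " || $pCount
)
```

With this sample data the collection-wide count is 3, while the per-play counts are 2 for A and 1 for B. A test against the collection-wide number gives the same answer for every play, which is why the original query returned every title.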

amberpeddicord commented 4 years ago

@ebeshero Thank you! And one more question (I hope): Do I have to define the $p variable? I'm still a little shaky on using for loops after working through the introduction, so I'm not sure what to define that as.

ebeshero commented 4 years ago

@amberpeddicord Good question, and the answer is both "yes" and "no". What I mean is, you define the variable by saying for $p in $plays. When you do that, you've made what we call a "range variable" that stops on each individual member of a sequence in turn, so you don't need a separate let statement for it.
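
As a small illustration of a range variable, assuming a plain sequence of strings in place of the real `$plays` nodes:

```xquery
xquery version "3.1";

(: Stand-in sequence; in the exercise $plays would hold TEI elements. :)
let $plays := ("Macbeth", "Cymbeline", "Coriolanus")
(: The for clause itself binds $p; no let is needed for it. :)
for $p in $plays
return "visiting: " || $p
```

The return clause runs three times here, once per member of `$plays`, with `$p` bound to a different member each time.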

amberpeddicord commented 4 years ago

@ebeshero Sorry, I have another question! I have the following code so far:

let $coll := collection('/db/apps/shakespeare/data')
let $speaker := $coll//TEI//sp//@who
let $title := $coll//TEI//titleStmt/title
let $plays := $coll//TEI
let $distinct-values := distinct-values($speaker)
let $count := count($distinct-values)
for $p in $plays
where $count gt 58
return $title

and I'm getting the same results that I was before. I tried using return $p/$title and I got 0 results. I'm still having trouble getting this bit of the XQuery assignment!

ebeshero commented 4 years ago

@amberpeddicord Here's the problem now: Your where statement is still evaluating based on a count of distinct values for the entire collection. It needs to be reconsidered in terms of each $p.

You made a variable that returns all the speakers in all the plays ($speaker). And you took distinct-values of all those speakers in $distinct-values. That's great, but doesn't help when you need to evaluate each play one at a time.

So what you need to do is redo those variables so they are inside your for loop and are getting the distinct values of speakers inside each $p.

ebeshero commented 4 years ago

@amberpeddicord ...something like:

for $p in $plays
let $pSpeakers := $p//sp/@who

and go on from there to get distinct values and take the count and evaluate the counts...
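
One way the pieces could fit together, offered as a sketch and not as the assignment's official answer. The collection path and namespace are the ones quoted earlier in this thread; the variable names after `$pSpeakers` are illustrative.

```xquery
xquery version "3.1";
declare default element namespace "http://www.tei-c.org/ns/1.0";

let $plays := collection('/db/apps/shakespeare/data')//TEI
for $p in $plays
(: Everything from here down is evaluated again for each play. :)
let $pSpeakers := $p//sp/@who
let $count := count(distinct-values($pSpeakers))
where $count gt 58
(: Reach the title from $p so it belongs to the current play. :)
return $p//titleStmt/title
```

Because the title is reached by stepping down from `$p`, each result belongs to a play that passed the where test. A `$title` variable defined outside the loop would still hold every title in the collection.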

amberpeddicord commented 4 years ago

@ebeshero I finally got it! Thank you so much!

ebeshero commented 4 years ago

@amberpeddicord Yay! For-loops are one of the hardest concepts of XQuery, so getting through this first one is a major milestone!

amberpeddicord commented 4 years ago

@ebeshero I'm onto XQuery 2 and I have a small question! How do I use tokenize to get the last item in a series? I'm tokenizing the filepaths on the '/' and I have this: collection('/db/pokemonMap/pokemon')//tokenize(base-uri(pokemon), '/') but I'm not sure where to put the last() function. Everything I try has given me an error.

ebeshero commented 4 years ago

@amberpeddicord So, remember that the result of tokenize() is a sequence or series of results. You can return something in a particular position in that sequence of results by invoking it by number. For example, if you wanted the second result in the tokenized sequence, you'd use a predicate like this [2]. If you wanted the first, you'd use this predicate: [1]. And if you wanted the last one, without knowing its number, there's a function for that that you probably remember...
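
A small sketch of positional predicates on a tokenized sequence. The file path is a made-up example in the style of the collection path from the question.

```xquery
xquery version "3.1";

(: Hypothetical file path; the file name is invented for illustration. :)
let $path := '/db/pokemonMap/pokemon/Abra.xml'
let $parts := tokenize($path, '/')
(: The predicate goes after the closing parenthesis of tokenize(). :)
return ($parts[2], $parts[last()])
```

This returns "db" and "Abra.xml". The first token is an empty string because the path begins with the separator, so `[1]` would return "" here. The predicate can also be attached directly, as in `tokenize($path, '/')[last()]`.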

amberpeddicord commented 4 years ago

@ebeshero Another question! I'm on XQuery 2 that uses the Pokemon files, and I successfully got it to give me all of the Pokemon types. The issue I'm having is with trying to tokenize the results on the space to get the individual values. I set it up like this:

let $types := collection('/db/pokemonMap/pokemon')//typing/@type/string()
return tokenize($types, '\s')

and this is only returning and tokenizing the first type element in the collection (so I'm only getting 2 results). I'm sure this is an easy fix, but I can't figure out what I'm doing wrong...again.

ebeshero commented 4 years ago

@amberpeddicord This has to do with the kind of function that tokenize() is. It is like name() because it can only operate on one item at a time: its first argument has to be a single string, not a sequence of strings. And it’s NOT like count() or distinct-values(), which calculate something from a whole sequence of values.

So what can you do? This is where you want to do that for loop that lets you address each member of a sequence just one at a time.
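
A sketch of that idea, with invented strings standing in for the `@type` values from the collection:

```xquery
xquery version "3.1";

(: Invented stand-ins for the @type attribute strings. :)
let $types := ("grass poison", "fire", "water ice")
(: tokenize() receives exactly one string on each pass. :)
for $t in $types
return tokenize($t, '\s')
```

Each pass hands tokenize() a single string, and the tokens from all passes come back as one flat sequence: grass, poison, fire, water, ice.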

amberpeddicord commented 4 years ago

@ebeshero Thank you!

amberpeddicord commented 4 years ago

@ebeshero Sorry that I'm asking so many questions about XQuery! But, I have another...

I'm trying to remove the duplicates after tokenizing the Pokemon types. I tried creating variables to tokenize and to put distinct-values() over what I tokenized, and I tried doing the opposite (along with several other experiments...). I have this right now:

let $coll := collection('/db/pokemonMap/pokemon')
let $pokemon := $coll//pokemon
for $p in $pokemon
let $types := $p//typing/string(@type)
let $tokenize := tokenize($types, '\s')
let $distinct := distinct-values($tokenize)
return $distinct

and I'm still getting duplicates. Am I not able to use tokenize() and distinct-values() at the same time? (Again, sorry if this is something I should know, I've just been stuck on this for days!)

ebeshero commented 4 years ago

@amberpeddicord Your XQuery is, believe it or not, really applying distinct-values()! What's happening is that it's applying distinct-values() to each file in the Pokemon collection. Test that by changing your return statement a little to see what's happening: let's return the file path of each $p with this:

return ($p/base-uri(), $distinct)

This way you will see that for each file, you're getting the distinct-values of the types just in that file. That's what happens in a for loop situation.
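
The same effect can be reproduced on a small scale. This sketch assumes two invented `<pokemon>` elements with a hypothetical `@name` attribute standing in for the file path, and one `<typing>` element each.

```xquery
xquery version "3.1";

(: Invented sample data; @name stands in for each file's base-uri(). :)
let $pokemon := (
  <pokemon name="a"><typing type="grass poison"/></pokemon>,
  <pokemon name="b"><typing type="poison"/></pokemon>
)
for $p in $pokemon
(: distinct-values() only ever sees the tokens of the current $p. :)
let $distinct := distinct-values(tokenize($p//typing/@type, '\s'))
return ($p/@name/string(), $distinct)
```

In the output, "poison" appears once under "a" and again under "b". Duplicates are removed within each item but not across items.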

Of course, as you point out, this is not really what we want to see if we want a list of just the distinct values of all the @type attributes tokenized on white spaces in the whole collection.

So, what to do? Don't break into each file to do this, but first of all just get all the @type values across the whole collection, and I would add the tokenize function at the end of that expression. Why? Because you then need to take distinct-values() over a long sequence.

Think about what's happening--I know the issue is when to deal with things one-by-one with a for loop, and when not to use a for loop. You actually want a long sequence of type values to be stored in a variable--you want a plurality there, because you need distinct-values to remove its duplicates.

That variable too (your $distinct variable) will be a variable that holds a plurality of distinct values. You will want to work with that in a for loop to process each value one by one and look up something about it.

@frabbitry is gearing up to teach this very thing in February, so I'm pinging her here to give her some practice! :-)

ebeshero commented 4 years ago

@amberpeddicord So, to simplify my advice here: Don't use the for loop you have in place: for $p in $pokemon. Remove that, and look for type values in $pokemon as a whole.

Also, your $tokenize variable is only partially working: it is currently returning just the very first value it finds, because tokenize() can't handle a whole sequence. The answer to this is not a for loop, though, but repositioning the function so it sits at the end of the XPath expression defining $types. That will need a little reconfiguration. Step down into @type, and then use the simple map operator (!) to apply the tokenize() function to each string, like this:

let $types := $pokemon//typing/@type ! tokenize(string(), '\s')

After that, take distinct-values() and I think you'll like those results better.
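
A self-contained sketch of that approach. Three invented `<pokemon>` elements stand in for the collection so the query runs anywhere.

```xquery
xquery version "3.1";

(: Invented sample data in place of collection('/db/pokemonMap/pokemon'). :)
let $pokemon := (
  <pokemon><typing type="grass poison"/></pokemon>,
  <pokemon><typing type="fire"/></pokemon>,
  <pokemon><typing type="poison"/></pokemon>
)
(: Simple map: tokenize() runs once per @type, results join into one sequence. :)
let $types := $pokemon//typing/@type ! tokenize(string(), '\s')
(: One long sequence in, duplicates removed across the whole set. :)
return distinct-values($types)
```

Here `$types` holds four tokens (grass, poison, fire, poison), and distinct-values() reduces them to three. No for loop is involved, so the duplicate "poison" from a different element is removed too.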

ebeshero commented 4 years ago

Take some time to think about the differences between working in a for loop and not working in a for loop as you're doing this. We don't want a separate for loop over the @type values, because if we work on them one by one, we can't take distinct-values() of the whole sequence of all the types.

Where we do want a for loop is when we need to stop on each value by itself and look up something about just that value. That will be the next step, as you work with each member of the list of distinct values in turn to go find out something about it.
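
A sketch of what that next step might look like. What gets looked up for each type is an assumption here (a count of matching elements), and the sample data is invented, with one `<typing>` element per `<pokemon>`.

```xquery
xquery version "3.1";

(: Invented sample data, one typing element per pokemon. :)
let $pokemon := (
  <pokemon><typing type="grass poison"/></pokemon>,
  <pokemon><typing type="fire"/></pokemon>,
  <pokemon><typing type="poison"/></pokemon>
)
(: No loop yet: build the full list of distinct types first. :)
let $types := distinct-values($pokemon//typing/@type ! tokenize(string(), '\s'))
(: Now loop, stopping on each distinct type to look something up. :)
for $t in $types
let $matches := $pokemon[tokenize(typing/@type, '\s') = $t]
return $t || ": " || count($matches)
```

With this data the counts are 1 for grass, 2 for poison, and 1 for fire. The distinct list is built without a loop, and the loop only starts once there is something to look up for each member of that list.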