Podcastindex-org / podcast-namespace

A wholistic rss namespace for podcasting
Creative Commons Zero v1.0 Universal

<podcast:host> disambiguation #8

Closed jamescridland closed 4 years ago

jamescridland commented 4 years ago
[A person's name]

In the UK, there's a very famous radio presenter (who does a podcast) called Chris Evans. In the US, there's a very famous actor (who does a podcast) called Chris Evans.

These are not the same person.

The schema entry for a person is complicated, but allows for additional information like images, links to Wikipedia or Twitter (which are both unique IDs of a sort), and other elements.

Can/should a <podcast:host> be a collection of different attributes about a person? Just using a name seems insufficient here. URLs and images, perhaps?

Secondly, this could well be a spammy entry, and we might need to consider how we stop every podcast claiming that it has Joe Rogan as a guest every week.

saerdnaer commented 4 years ago

Podlove uses the existing <atom:contributor> for this, which uses atom:uri as the primary identifier, e.g. the URL of the person's private home page or public social media profile. I proposed extending this with a role and additional IDs; cf. https://github.com/podlove/podlove-specifications/issues/23
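As a rough sketch of how a consumer might read the Podlove-style <atom:contributor> approach, the Python below keys contributors by atom:uri rather than by display name. The Atom namespace URI and the name/uri child elements come from the Atom format; the sample item and URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

# Made-up item for illustration; only the atom:contributor structure matters here.
ITEM = f"""<item xmlns:atom="{ATOM}">
  <atom:contributor>
    <atom:name>Jane Doe</atom:name>
    <atom:uri>https://example.com/jane</atom:uri>
  </atom:contributor>
</item>"""


def contributors_by_uri(item_xml):
    """Map each contributor's atom:uri (the identifier) to its display name."""
    item = ET.fromstring(item_xml)
    result = {}
    for contributor in item.iter(f"{{{ATOM}}}contributor"):
        uri = contributor.findtext(f"{{{ATOM}}}uri")
        name = contributor.findtext(f"{{{ATOM}}}name")
        if uri:  # contributors without a URI cannot be told apart, so skip them
            result[uri.strip()] = (name or "").strip()
    return result


print(contributors_by_uri(ITEM))  # {'https://example.com/jane': 'Jane Doe'}
```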

You could also minimize the overhead by defining the host only on the channel and guests only on the item level. Podchaser also had the idea to allow active_from and active_to date fields, so you could specify cast members for a certain time span (e.g. the first three years, the first season, or similar) without needing to add them to each item.
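A minimal sketch of the active_from/active_to idea, assuming both fields are optional dates on a channel-level cast entry and are open-ended when absent. The CastMember type, names and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CastMember:
    name: str
    role: str
    active_from: Optional[date] = None  # None: active since the show began
    active_to: Optional[date] = None    # None: still active


def cast_for_episode(cast: List[CastMember], published: date) -> List[CastMember]:
    """Return channel-level cast members whose active window includes the episode date."""
    return [
        m for m in cast
        if (m.active_from is None or m.active_from <= published)
        and (m.active_to is None or published <= m.active_to)
    ]


cast = [
    CastMember("Alice", "host"),
    CastMember("Bob", "co-host", active_from=date(2018, 1, 1), active_to=date(2020, 12, 31)),
]
print([m.name for m in cast_for_episode(cast, date(2019, 6, 1))])  # ['Alice', 'Bob']
print([m.name for m in cast_for_episode(cast, date(2021, 3, 1))])  # ['Alice']
```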

dellagustin commented 4 years ago

I have written a spec for a namespace called socialrss (https://github.com/socialrss), it never took off, but may give some inspiration.

There I do not distinguish between host and guest, I just define participants.

The participant can be on the channel or on an item, and can be permanent or not (the permanent field only makes sense on the channel, mainly to distinguish fixed members of the podcast from guests).

I also define a participantReference that can be used on items to refer to participants defined on the channel. The main reason for this is that participants can also have contact information and social media handles, and one does not want to keep repeating this information in every item.
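One way a consumer could resolve references like these is sketched below. The namespace URI and the id/ref attribute names are placeholders invented for this example, not taken from the socialrss spec.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and attribute names, for illustration only.
NS = {"s": "https://example.org/socialrss"}

FEED = """<rss xmlns:s="https://example.org/socialrss"><channel>
  <s:participant id="p1" permanent="true" href="https://example.org/alice">Alice</s:participant>
  <item><title>Ep 1</title><s:participantReference ref="p1"/></item>
</channel></rss>"""


def resolve_item_participants(xml_text):
    channel = ET.fromstring(xml_text).find("channel")
    # Full participant records (name, links, contact info) are defined once, on the channel.
    by_id = {p.get("id"): p for p in channel.findall("s:participant", NS)}
    result = {}
    for item in channel.findall("item"):
        names = []
        for ref in item.findall("s:participantReference", NS):
            participant = by_id.get(ref.get("ref"))
            if participant is not None:  # skip dangling references
                names.append(participant.text)
        result[item.findtext("title")] = names
    return result


print(resolve_item_participants(FEED))  # {'Ep 1': ['Alice']}
```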

daveajones commented 4 years ago

These are good comments, but someone give me an example implementation. This was posted at podcastindex.social:

<podcast:host>
  <name>...</name>
  <bio>...</bio>
  <img src="https://..." alt="mugshot" />
  <a href="https://...">blog</a>
  ...
</podcast:host>

I like this layout. How could this be synthesized with the other suggestions here? Maybe something like:

<podcast:participant type="[host|guest]" href="[link to bio/blog/etc]" img="[link to image]">[name of person]</podcast:participant>

daveajones commented 4 years ago

I don't think I like the generic element names with attribute differentiators. I like this better:

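<podcast:host href="[link to bio/blog/etc]" img="[link to image]">[name of person]</podcast:host>
<podcast:guest href="[link to bio/blog/etc]" img="[link to image]">[name of person]</podcast:guest>

XML is written for aggregators and humans. This makes it easier to read for both.
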
tomrossi7 commented 4 years ago

I really like his idea!

<podcast:host href="[link to bio/blog/etc]" img="[link to image]">[name of person]</podcast:host>
<podcast:guest href="[link to bio/blog/etc]" img="[link to image]">[name of person]</podcast:guest>
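For a sense of how simple these elements are to consume, here is a Python sketch that reads <podcast:host> and <podcast:guest> from an item. The namespace URI bound to the podcast: prefix is an assumption (the thread never states one), and the sample names and URLs are made up.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

PODCAST_NS = "https://podcastindex.org/namespace/1.0"  # assumed namespace URI

ITEM = f"""<item xmlns:podcast="{PODCAST_NS}">
  <podcast:host href="https://example.com/host" img="https://example.com/host.jpg">Jane Host</podcast:host>
  <podcast:guest href="https://example.com/guest" img="https://example.com/guest.jpg">Joe Guest</podcast:guest>
</item>"""


def people(item_xml: str) -> List[Dict[str, Optional[str]]]:
    """Return hosts, then guests, as dicts of role, name, href and img."""
    item = ET.fromstring(item_xml)
    found = []
    for role in ("host", "guest"):
        # ElementTree expands prefixed names to "{namespace-uri}localname".
        for el in item.iter(f"{{{PODCAST_NS}}}{role}"):
            found.append({"role": role, "name": (el.text or "").strip(),
                          "href": el.get("href"), "img": el.get("img")})
    return found


for person in people(ITEM):
    print(person["role"], person["name"], person["href"])
```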
daveajones commented 4 years ago

Let's move forward with it as now implemented then. This feels solid.

bslinger commented 4 years ago

Disambiguation and moderation are the two main reasons we (Podchaser) abandoned the idea of adding creator elements to RSS (the link that @saerdnaer referenced above) and moved to a direct API submission to have more control.

The other concern here is roles other than host/guest - if we're starting to credit people in RSS feeds, how do we credit producers, editors, voice actors, etc. We have a working group underway to try to define an exhaustive useful list of the possible roles which we can share here when complete if that's useful.

I think it also makes sense to allow for sub elements to help define the specific creator, similar to what is proposed for services at the channel level, that could link to external sites such as Podchaser or IMDb where there is a verified profile in place. Social media for the specific creator could also be useful.

jamescridland commented 4 years ago

<podcast:host href="[link to bio/blog/etc]" img="[link to image]">[name of person]</podcast:host>

So...

<podcast:host href="https://twitter.com/achrisevans" img="https://pbs.twimg.com/profile_images/654707630867222528/FKr6j8eF_400x400.jpg">Chris Evans</podcast:host>

<podcast:host href="https://twitter.com/chrisevans" img="https://www.onthisday.com/images/people/chris-evans-medium.jpg">Chris Evans</podcast:host>

<podcast:host href="https://en.wikipedia.org/wiki/Chris_Evans_(presenter)" img="https://i2-prod.walesonline.co.uk/incoming/article15103570.ece/ALTERNATES/s615b/0_BBC-annual-report.jpg">Chris Evans</podcast:host>

Programmatically, how are you going to know that two of these Chris Evanses are the same person, and one is different?
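The problem can be made concrete with the three entries above, reduced to (name, href) pairs. Grouping by name merges everyone into one person; grouping by href splits the person who appears twice into two. Neither key gives the right answer of two people without outside knowledge. This is an illustrative sketch, not a proposed algorithm.

```python
from collections import defaultdict

# The three <podcast:host> entries from the comment above, reduced to (name, href).
hosts = [
    ("Chris Evans", "https://twitter.com/achrisevans"),
    ("Chris Evans", "https://twitter.com/chrisevans"),
    ("Chris Evans", "https://en.wikipedia.org/wiki/Chris_Evans_(presenter)"),
]


def group_by(entries, key):
    """Group hrefs under whatever identity key the caller chooses."""
    groups = defaultdict(list)
    for name, href in entries:
        groups[key(name, href)].append(href)
    return dict(groups)


by_name = group_by(hosts, lambda name, href: name)
by_href = group_by(hosts, lambda name, href: href)
print(len(by_name))  # 1: two different people merged into one
print(len(by_href))  # 3: the person listed twice is split into two
```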

daveajones commented 4 years ago

People have also expressed concern about fraudulent listing of guests. I’m not that concerned about it. Fraud is forever and always with us. And we’ll learn how to spot it. But it’s yet another discussion point.

Should we boot this to phase 2 and wait for Podchaser to finish their work on this? Phase 1 needs to be low-hanging fruit: the easy stuff.

@bslinger what do you think your timeline is?

MartinMouritzen commented 4 years ago

I'm not that concerned about fraudulent listing of guests either, we can rely on some kind of reporting for that in the future, and then RSS feeds can simply be voted "untrustworthy".

I am a bit reluctant to boot this to Phase 2, it's something highly visible and something that speaks to the ego of hosts and guests, which makes it something that could drive adoption of the whole namespace.

I really like the podcast:person entries in the sandbox feed, and I'm implementing it right now. The only problem is uniqueness, and either 1) it's not solvable, in which case we might as well go with what we have, or 2) it's solvable but involves a new attribute, which we can add in Phase 2.

bslinger commented 4 years ago

@daveajones, regarding fraudulent credits: we have an internal moderation team who approve/reject all credits submitted to Podchaser - the current plan for creators/credits in RSS feeds would be to feed them through the same moderation system to ensure we are staying accurate whilst allowing us to take advantage of the added metadata.

Regarding disambiguation, we will be well placed to provide a service here for those consuming RSS feeds and wanting to disambiguate creators - we're working on a new API which will have free tiers and we can allow searching for a creator based on a specific social link which would help in the case @jamescridland laid out above. There is also the option of adding a Podchaser ID (PCID) or URL to the spec if the hosting company wants to identify a specific creator. We are working with a number of hosting services who will allow searching for creators via our API which would make this very easy for podcast creators to utilise. (Omny Studio is already live with this)

In the spirit of being an open standard it obviously wouldn't be required, but our goal has always been to provide a central, verified location for creators and credits so our API could be a useful way for RSS consumers to be confident in the credits they see in a feed.

In regards to the role taxonomy, I believe we're close to a final release but will check with Cole who is heading up that project.

jamescridland commented 4 years ago

Given I'm involved in the taxonomy discussion too, could I suggest that we use <podcast:participant> with a mandatory type? That could be simply "host" and "guest" for now. That would enable us to move forward with keeping this in phase 1, but allow us to expand it where required. (It's likely to be of the format "category / thing", like "management/ceo" but not that).
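If the type does end up as a "category/thing" value, a consumer could split it as sketched here, treating bare values such as "host" or "guest" as having no category. The format is only the one floated in this comment, so the parsing rules are an assumption.

```python
from typing import Optional, Tuple


def parse_participant_type(value: str) -> Tuple[Optional[str], str]:
    """Split a hypothetical "category/role" type; bare values like "host" have no category."""
    category, sep, role = value.strip().lower().partition("/")
    if not sep:
        return None, category.strip()
    return category.strip() or None, role.strip()


print(parse_participant_type("host"))            # (None, 'host')
print(parse_participant_type("management/ceo"))  # ('management', 'ceo')
```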

I like the idea of some sort of ID. In the meantime, we might wish to suggest that Podchaser URLs are used for URLs, at least in the example?

To have podcast:host seems to be knowingly building something that won't scale, if you ask me. However, if the only thing we will ever list is a host and guest, and not a writer/producer/musician then that's OK - just seems a little limiting.

bslinger commented 4 years ago

@jamescridland, looks like <podcast:person> is the current implementation since https://github.com/Podcastindex-org/podcast-namespace/pull/29, which will scale well enough. The only unresolved case I can think of is the same person in multiple roles, but just having multiple elements for the same person, one per role, is probably fine there.
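With one element per role, a consumer that wants one entry per person could collapse credits as sketched here. Keying on the (name, href) pair is an assumption made for this example, and the sample credits are invented.

```python
from typing import Dict, List, Tuple

# Invented sample credits: one element per (person, role), as suggested above.
credits = [
    {"name": "Jane Doe", "href": "https://example.com/jane", "role": "host"},
    {"name": "Jane Doe", "href": "https://example.com/jane", "role": "producer"},
    {"name": "Sam Roe", "href": "https://example.com/sam", "role": "guest"},
]


def roles_by_person(entries: List[Dict[str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Collapse one-element-per-role credits into one entry per (name, href) pair."""
    merged: Dict[Tuple[str, str], List[str]] = {}
    for entry in entries:
        key = (entry["name"], entry["href"])
        roles = merged.setdefault(key, [])
        if entry["role"] not in roles:  # ignore exact duplicate credits
            roles.append(entry["role"])
    return merged


for (name, _href), roles in roles_by_person(credits).items():
    print(name, roles)
# Jane Doe ['host', 'producer']
# Sam Roe ['guest']
```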

@daveajones , the taxonomy project should be wrapping up in a month or two, so by the time our API is available we'll have a firm list of relevant roles for creators.

theDanielJLewis commented 4 years ago

I think we could have three potential URL sources approved for people:

  1. Podchaser
  2. IMDB
  3. Wikipedia

daveajones commented 4 years ago

All of this sounds good to me. Podchaser can set this standard here. And, then we recommend falling back to IMDB, Wikipedia as @theDanielJLewis mentions. If Podchaser is doing this work it just seems right to let them lead. And, we can keep this in the initial spec that way. I too feel that it's important. Maybe not as simple as the others, but still important.

@MartinMouritzen is already baking it in, so we can see some real-world usage there.

@bslinger Can you link up with us when your API goes live? We'll drop it in the docs here.

@jamescridland You're thinking rename from "person" to "participant"? I'll let you do the PR on that so I don't mess up your meaning on what the attributes should look like.

tomrossi7 commented 4 years ago

Personally, I prefer the general <podcast:person> tag over the more specific-sounding <podcast:participant>. "Person" seems to lend itself more for future tags like "producer".

@bslinger do you think there is a way to include a verified URL that could be followed to determine whether or not the person is "verified"? For example, <podcast:person verifiedUrl="http://3rdpartyverification.com/URL" ...>. When the verified URL is called, you could pass in the RSS feed to the verification company and it could return whether or not it's verified. For this to work, the person would need to use a verification service and then provide their unique URL to any podcast they appear on. I dunno. Just trying to think through a platform for a company to add value to the market by providing a 3rd-party verification service that could be embedded directly in the feed through a simple URL call.
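To make the verifiedUrl idea more concrete, this sketch only builds the request a consumer might send: the person's verifiedUrl with the feed URL appended as a query parameter. The "feed" parameter name is invented, and no response format has been defined, so the sketch stops short of making the request.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_verification_request(verified_url: str, feed_url: str) -> str:
    """Append the feed URL to a person's (hypothetical) verifiedUrl as a query parameter."""
    parts = urlsplit(verified_url)
    query = parse_qsl(parts.query)   # keep any parameters already on the URL
    query.append(("feed", feed_url))  # "feed" is an invented parameter name
    return urlunsplit(parts._replace(query=urlencode(query)))


print(build_verification_request("http://3rdpartyverification.com/URL",
                                 "https://example.com/feed.xml"))
# http://3rdpartyverification.com/URL?feed=https%3A%2F%2Fexample.com%2Ffeed.xml
```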

bslinger commented 4 years ago

@tomrossi7 I'm not 100% on what you're proposing, can you elaborate on what you're thinking in regards to verification there? Do you mean a service Podchaser (or somebody) could provide to confirm that a particular person is verified to have been on a podcast, via the RSS feed? Or am I misunderstanding?

saerdnaer commented 4 years ago

I think we could have three potential URL sources approved for people:

  1. Podchaser
  2. IMDB
  3. Wikipedia

I would also add Wikidata, Twitter and Mastodon instances.

theDanielJLewis commented 4 years ago

I think Twitter and Mastodon don't have enough authority to verify a person, unless they're a verified account. But then how do you enforce that?

MartinMouritzen commented 4 years ago

I strongly oppose using Podchaser, IMDB, Twitter or other closed-source, for-profit organizations as an "approved people" source.

For me it seems to be the same as using Podfriend as an authority (because I also plan to launch sections about people etc.), which I think would be just as wrong in general.

That is, unless we come up with a format, data-sharing and API structure that would enable anyone to become an "authority". But I guess this kind of problem is not really for the podcastindex to solve, this sounds almost like a "peopleindex" :D

For now, I think we should keep the current solution, maybe adding an optional "uid" that can be anything but lets people signal in their feeds that this is a specific "Dave Jones". Does that approach have problems? Sure, but I think it's manageable and open, whereas using a third-party service as validator would basically lock everyone in to them, and creating a "peopleindex" would probably be too big a task for anyone to want to tackle (and align companies on) right now.

jamescridland commented 4 years ago

Twitter and Mastodon don't have enough authority to verify a person

Let's be cautious. Nobody is trying to "verify a person" here - this is a disambiguation issue, isn't it?

If Chris Evans is on a podcast, do you mean Chris Evans or Chris Evans? That's the problem we're trying to fix, I think? We ideally need a UID for each person - like a Wikipedia URL or Wikidata ID.

Wikipedia isn't a bad shout here, but Wikipedia and Wikidata both require notability for inclusion. Adam Curry might be notable enough; Evo Terra might be too; but you and I aren't.

So we're left with a problem. Either we:

a) Use Wikipedia IDs, and thus only accept "notable" people for this sort of service - but why should Wikipedia rules mean I can't find out which podcasts have had Daniel J Lewis on recently?

b) allow a variety of URLs to say "this is the Dave Jones I mean" - but that doesn't actually help with our disambiguation issue at all. Although, Dave Jones, I love your YouTube videos.

c) allow a reciprocal link structure, like Google Podcasts uses, where you link a website to an RSS feed and back to prove ownership. But this is deeply techie and is unlikely to work well - and how do you know the verified link?

d) Require Facebook/Twitter/LinkedIn and some form of profile on PodcastIndex, which seems unworkable, at least for now.
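Option c) above boils down to a two-way check, sketched here with the fetched pages stubbed out as a dictionary of outbound links. Fetching and HTML link extraction are omitted, and the function and URLs are hypothetical.

```python
from typing import Dict, Set


def is_reciprocal(feed_url: str, profile_url: str,
                  feed_links: Set[str], outbound_links: Dict[str, Set[str]]) -> bool:
    """True only if the feed links to the profile AND the profile links back to the feed."""
    feed_claims_profile = profile_url in feed_links
    profile_claims_feed = feed_url in outbound_links.get(profile_url, set())
    return feed_claims_profile and profile_claims_feed


FEED = "https://example.com/feed.xml"
PROFILE = "https://example.com/people/jane"
# Stand-in for links scraped from each profile page.
pages = {PROFILE: {FEED, "https://example.com/other"}}

print(is_reciprocal(FEED, PROFILE, {PROFILE}, pages))             # True
print(is_reciprocal(FEED, PROFILE, {PROFILE}, {PROFILE: set()}))  # False: no link back
```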

Podchaser has a large number of people in it; but, as you say, it's a closed and for-profit system. I don't begrudge anyone making a profit, though. Perhaps this is a licensing issue with Podchaser, where some portion of their "people" database is offered under a free licence? Does that help?

MartinMouritzen commented 4 years ago

I pretty much agree with all your other points James, but:

Podchaser has a large number of people in it; but, as you say, it's a closed and for-profit system. I don't begrudge anyone making a profit, though. Perhaps this is a licensing issue with Podchaser, where some portion of their "people" database is offered under a free licence? Does that help?

I don't mind Podchaser making a profit at all. However, if we make it a requirement that we have to say to all podcasters "You have to go to Podchaser" to find the profile of the person you want to include as host/guest, then we basically made a standard locking ourselves in with a third party.

  1. That's a big problem for me, because they are basically competing with some of the things I want to introduce myself for my own service.
  2. Imagine that in the HTML spec it said "To play audio please find the UID on sound.microsoft.com". I don't think that's the kind of tie-in we want in the namespace.
  3. You can almost argue that this is one of the things we're actively trying to get out of, where Apple/iTunes is almost the authority on podcasts themselves.

I fully appreciate that Podchaser, right now, has the most data about hosts and guests, but they are also much more than that, and are in competition with a lot of the app developers that are using podcast index.

I would love to work towards a platform-agnostic approach. Maybe a format where several services can become a "people directory" and share information about hosts/guests internally.

But until then, I would opt for a simple optional "UID" attribute.

Let's imagine I have a podcast and interview you, James. Then I put this in my feed:

<podcast:person role="guest" img="x.jpg" href="https://www.podnews.net">James Cridland</podcast:person>

Then it turns out that someone else is named James Cridland, and you reach out to me and ask me to put in uid="james.cridland.net" (or whatever you decided represents yourself)

That would basically be enough, and work from day 1.
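Under this proposal a consumer could build an identity key as sketched here: use uid when present, otherwise fall back to name plus href. The namespace URI is assumed, the person() helper exists only to build sample elements, and the second href is just another of James's URLs listed later in this thread.

```python
import xml.etree.ElementTree as ET

PODCAST_NS = "https://podcastindex.org/namespace/1.0"  # assumed namespace URI


def identity_key(person: ET.Element) -> str:
    """Use the optional uid if present; otherwise fall back to name plus href."""
    uid = person.get("uid")
    if uid:
        return "uid:" + uid.strip()
    name = (person.text or "").strip()
    return "name:" + name + "|" + (person.get("href") or "")


def person(href, uid=None):
    """Hypothetical helper that builds a sample <podcast:person> element."""
    uid_attr = f' uid="{uid}"' if uid else ""
    return ET.fromstring(f'<podcast:person xmlns:podcast="{PODCAST_NS}" role="guest" '
                         f'href="{href}"{uid_attr}>James Cridland</podcast:person>')


a = person("https://www.podnews.net", uid="james.cridland.net")
b = person("https://twitter.com/jamescridland", uid="james.cridland.net")
c = person("https://www.podnews.net")
print(identity_key(a) == identity_key(b))  # True: different hrefs, same uid
print(identity_key(c))  # name:James Cridland|https://www.podnews.net
```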

This also means that we could still have directories, where people could register themselves and podcast platforms could call their endpoints to search - we just wouldn't point to any specific one in the namespace standard.

jamescridland commented 4 years ago

Understood regarding Podchaser.

A little unsure how the UID helps. All of these are me - https://james.cridland.net https://podnews.net https://james.crid.land https://facebook.com/jamescridland https://linkedin.com/jamescridland https://twitter.com/jamescridland - but the difficulty is that Chris Evans isn't going to check the UID in an RSS feed. And nor's Chris Evans. So a search for Chris Evans will find Chris Evans, when we'd like it to find Chris Evans, or even Chris Evans.

I think, though, this is unsolvable. And at the very least, a search will find Chris Evans along with Chris Evans and Chris Evans, which should, at least, help the listener more than not being able to search at all.

MartinMouritzen commented 4 years ago

@jamescridland For me, the UID would only help in one way: people could still link to any of those URLs, and the UID would still identify the person as you.

Chris Evans is not going to check an RSS feed, but he is well-known enough that other people might point out that the feed should be changed to point to the real one.

But I agree it might not be solvable on a namespace level, and that might be perfectly fine for now.

I know the way I probably will solve this problem is to have some way in Podfriend to combine href links in the element to still point to the same person. Of course that requires some kind of editorial process, but that's probably fine in any case.

jamescridland commented 4 years ago

If one person links to his Wikipedia page and the next person links to his Twitter account, thankfully there's a link from the Wikipedia page to the Twitter account.
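Merging profile URLs that are known to belong to the same person, whether through a cross-link like this one or through an editorial decision as Martin suggests, is a classic union-find problem. A minimal sketch, with hypothetical URLs and class name:

```python
class ProfileClusters:
    """Union-find over profile URLs; URLs linked together end up with the same root."""

    def __init__(self):
        self.parent = {}

    def find(self, url: str) -> str:
        self.parent.setdefault(url, url)
        while self.parent[url] != url:
            self.parent[url] = self.parent[self.parent[url]]  # path halving keeps trees shallow
            url = self.parent[url]
        return url

    def link(self, a: str, b: str) -> None:
        """Record that two URLs describe the same person."""
        self.parent[self.find(a)] = self.find(b)

    def same_person(self, a: str, b: str) -> bool:
        return self.find(a) == self.find(b)


WIKI = "https://en.wikipedia.org/wiki/Example_Person"
TWITTER = "https://twitter.com/example_person"
HOMEPAGE = "https://example.com/"
OTHER = "https://twitter.com/a_different_person"

clusters = ProfileClusters()
clusters.link(WIKI, TWITTER)      # cross-link found on the Wikipedia page
clusters.link(TWITTER, HOMEPAGE)  # editorial merge
print(clusters.same_person(WIKI, HOMEPAGE))  # True, via TWITTER
print(clusters.same_person(WIKI, OTHER))     # False
```

Because merges are transitive, a feed that links only the homepage and another that links only the Wikipedia page still resolve to the same cluster.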

bslinger commented 4 years ago

@MartinMouritzen , I totally agree this should continue to be an open standard, and understand the apprehension with tying it in with Podchaser - that's not something I would suggest either.

Having said that, as I mentioned above Podchaser will be able to provide some services that RSS consumers could choose to use to 1) verify the accuracy of credits present in a feed, and 2) disambiguate a particular name, depending on what additional information is given (social links, wiki links, or optionally the specific Podchaser ID we assign to creators).

It would be up to the party generating the RSS feed to decide what to include, and up to the party consuming the RSS feed to decide which third-party services, if any, to use to ensure a certain level of accuracy, or to just trust the source RSS and use their own techniques for disambiguation.

snookfin commented 4 years ago

I propose we keep this really simple. Let's provide podcasters a way to provide a title, name, image, and link. From there we can let the app devs decide what they want to do with it. We can put some verbiage in the spec that the intended use is to provide more information about the people appearing in the episode. Unless an app provides a motive, by attempting to cross-link episodes featuring the same people, or giving more weight to name searches by prioritizing this element, I don't think this will increase abuse.
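Kept this simple, a person is just four fields. The sketch below serializes such a record into an element; the role/img/href attribute names follow earlier examples in this thread, and the namespace URI is an assumption.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

PODCAST_NS = "https://podcastindex.org/namespace/1.0"  # assumed namespace URI
ET.register_namespace("podcast", PODCAST_NS)


@dataclass
class Person:
    title: str  # e.g. "host" or "guest"
    name: str
    image: str  # image URL
    link: str   # profile URL


def to_element(p: Person) -> ET.Element:
    """Serialize a Person into a <podcast:person> element with the name as node text."""
    el = ET.Element(f"{{{PODCAST_NS}}}person",
                    {"role": p.title, "img": p.image, "href": p.link})
    el.text = p.name
    return el


guest = Person("guest", "James Cridland", "https://example.com/x.jpg", "https://www.podnews.net")
print(ET.tostring(to_element(guest), encoding="unicode"))
```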

At the end of the day, podcast creators want to see their photo (and guests photos) within the apps while episodes are being played. I think that's the use-case we need to design for. Closed platforms like Apple already have this. We need an open spec that offers something similar. Nothing more and nothing less.

Personally, I would link to my Podchaser profile, but I love the flexibility to change that at any time for any reason.

daveajones commented 4 years ago

@snookfin "Closed platforms like Apple already have this. We need an open spec that offers something similar. Nothing more and nothing less." <--- 100% this.

It's never the wrong decision to begin with something simple. Complexity can always be layered on in a future revision. But, you can never simplify it after the fact.

daveajones commented 4 years ago

Should this be a self-closing tag with "name" as an attribute instead of as the node value?
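If both forms were allowed during a transition, a reader could accept either, as in this sketch that prefers a name attribute and falls back to the node value. The namespace URI and sample elements are assumptions.

```python
import xml.etree.ElementTree as ET

PODCAST_NS = "https://podcastindex.org/namespace/1.0"  # assumed namespace URI


def person_name(el: ET.Element) -> str:
    """Prefer a name="..." attribute (self-closing form); fall back to the element text."""
    return (el.get("name") or el.text or "").strip()


self_closing = ET.fromstring(
    f'<podcast:person xmlns:podcast="{PODCAST_NS}" name="Jane Doe" role="host"/>')
with_text = ET.fromstring(
    f'<podcast:person xmlns:podcast="{PODCAST_NS}" role="host">Jane Doe</podcast:person>')

print(person_name(self_closing), "|", person_name(with_text))  # Jane Doe | Jane Doe
```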

daveajones commented 4 years ago

Going to close this thread and open a new one for "person" so we can talk about finalizing.