PRX / cms.prx.org

CMS API for PRX
https://cms.prx.org
GNU Affero General Public License v3.0
4 stars 2 forks source link

Feeder import use content:encoded or description for description #400

Closed gcampo88 closed 6 years ago

gcampo88 commented 6 years ago

On most recent FeederImport, the content of the episode was shortened to '[...]'. Should be smarter --> logic in the PodcastImport model seems better.

pkarman commented 6 years ago

@gcampo88 can you point me at specific examples, before/after, and you think the optimal "after" should be?

gcampo88 commented 6 years ago

Yup. We noticed this on the import of The Moth. Sample episode that was affected: http://feeder.prx.org/api/v1/episodes/aea7fff3-8892-41ca-a804-a94ceb3c4688 Before the import, that item in the feed had this for and :

<description>
        <![CDATA[In this hour, we celebrate things we do in the name of love. A woman is asked to choose between her family and the man she loves; a fisherman comes to appreciate his roots; a love song becomes a love story; and a man battles the law to protect his husband’s legacy. Hosted by The […]]]>
</description>
<content:encoded>
        <![CDATA[<p>In this hour, we celebrate things we do in the name of love. A woman is asked to choose between her family and the man she loves; a fisherman comes to appreciate his roots; a love song becomes a love story; and a man battles the law to protect his husband’s legacy.</p>
<p>Hosted by The Moth’s Senior Director, Meg Bowles. The Moth Radio Hour is produced by The Moth and Jay Allison of Atlantic Public Media.</p>
<p>Storytellers: Suzie Afridi, Dan Larsen, Gabrielle Shea, and Jim Obergefell.</p>
<p>For pictures from this episode, visit <a href="https://themoth.org/radio-hour/in-the-name-of-love">The Moth.org</a></p>
<p>Sponsored by:</p>
<ul>
<li><a href="http://rocketmortgage.com/moth">www.rocketmortgage.com/Moth</a></li>
<li><a href="https://www.squarespace.com/?channel=podcast&amp;subchannel=publicmediamarketing&amp;source=themoth">www.squarespace.com/Moth</a></li>
<li><a href="https://www.ziprecruiter.com/moth">www.ziprecruiter.com/Moth</a></li>
</ul>
<p> </p>
]]>
</content:encoded>

same fields after import:

<description>
        <![CDATA[<p>In this hour, we celebrate things we do in the name of love. A woman is asked to choose between her family and the man she loves; a fisherman comes to appreciate his roots; a love song becomes a love story; and a man battles the law to protect his husband’s legacy. Hosted by The […]</p>]]>
</description>
<content:encoded>
        <![CDATA[<p>In this hour, we celebrate things we do in the name of love. A woman is asked to choose between her family and the man she loves; a fisherman comes to appreciate his roots; a love song becomes a love story; and a man battles the law to protect his husband’s legacy. Hosted by The […]</p>]]>
</content:encoded>

Looks like we're maybe overwriting w/ description and then sanitizing it? May end up being relevant: Feeder RSS generator (content:encoded is set on line 120) Feeder Episode model

Update: I think that it might just be that yes, we're overwriting the content with the description from CMS inside the Episode Distribution

And maybe that's okay... in Publish I'm not sure we actually allow users to specifically set the content field. Update: I think that we don't (<- 3 links)

So! My deduction after this: maybe we don't need to bother changing anything, since we actually don't differentiate between content + description once users are on Publish. But I suppose the argument is that their archive feed post-import should look identical to their archive feed pre-import. So... toss-up.

Thoughts? Feelings? Can I send you MORE links?

pkarman commented 6 years ago

I ❤️ all the links - MOAR!

It looks as if the original content:encoded has no home in the Episode destination model. So unless we create a new column for it, there's no clean way to preserve it. I do like the idea of 1:1 preservation, but I don't know the history of the Publish data model enough to know if omitting content:encoded was a conscious choice.

@kookster @cavis @sandikbarr ? who has institutional memory here?

sandikbarr commented 6 years ago

@pkarman Publish doesn't use content:encoded. It only uses descriptionMd for description in markdown format, fields are in encode/decode of the model here https://github.com/PRX/publish.prx.org/blob/master/src/app/shared/model/story.model.ts#L130

pkarman commented 6 years ago

thanks @sandikbarr Seems like the pieces table would need a new column. Not sure if it matters - @gcampo88 are you aware of any customers who have mentioned the missing data?

gcampo88 commented 6 years ago

Nope! @palomao would be the first line of hearing about that though.

palomao commented 6 years ago

That's interesting. I haven't heard from users about the shortened episode descriptions. Was this just discovered with The Moth import?

I'm a bit worried on how this will effect the remainder of the radiotopia shows not on publish but using dovetail (since their imports are coming up).

kookster commented 6 years ago

no, we do not need a new column, in our publishing system we do not differentiate between the content in description and content:encoded, they differentiate in format.

That is not how it works in WP, but that is their choice.

On regular podcast import, when we import directly into publish/cms from a feed, we correctly check for if content encoded or description is available and so a better field to use.

https://github.com/PRX/cms.prx.org/blob/master/app/models/episode_import.rb#L179

  def entry_description_attribute(entry)
    [:content, :itunes_summary, :description, :title].find { |d| !entry[d].blank? }
  end

We need similar logic on the feeder importer.

@palomao the reason there have not been other complaints is that this feeder importer has only been used by criminal and now the moth - all the others use the regular rss import.

pkarman commented 6 years ago

thanks @kookster - I will go that direction

pkarman commented 6 years ago

ok I have a branch ready but it depends on https://github.com/PRX/cms.prx.org/pull/409 so will wait till that is merged to create a PR