Closed gcampo88 closed 6 years ago
@gcampo88 can you point me at specific examples, before/after, and you think the optimal "after" should be?
Yup. We noticed this on the import of The Moth. Sample episode that was affected: http://feeder.prx.org/api/v1/episodes/aea7fff3-8892-41ca-a804-a94ceb3c4688
Before the import, that item in the feed had this for
<description>
<![CDATA[In this hour, we celebrate things we do in the name of love. A woman is asked to choose between her family and the man she loves; a fisherman comes to appreciate his roots; a love song becomes a love story; and a man battles the law to protect his husband’s legacy. Hosted by The […]]]>
</description>
<content:encoded>
<![CDATA[<p>In this hour, we celebrate things we do in the name of love. A woman is asked to choose between her family and the man she loves; a fisherman comes to appreciate his roots; a love song becomes a love story; and a man battles the law to protect his husband’s legacy.</p>
<p>Hosted by The Moth’s Senior Director, Meg Bowles. The Moth Radio Hour is produced by The Moth and Jay Allison of Atlantic Public Media.</p>
<p>Storytellers: Suzie Afridi, Dan Larsen, Gabrielle Shea, and Jim Obergefell.</p>
<p>For pictures from this episode, visit <a href="https://themoth.org/radio-hour/in-the-name-of-love">The Moth.org</a></p>
<p>Sponsored by:</p>
<ul>
<li><a href="http://rocketmortgage.com/moth">www.rocketmortgage.com/Moth</a></li>
<li><a href="https://www.squarespace.com/?channel=podcast&subchannel=publicmediamarketing&source=themoth">www.squarespace.com/Moth</a></li>
<li><a href="https://www.ziprecruiter.com/moth">www.ziprecruiter.com/Moth</a></li>
</ul>
<p> </p>
]]>
</content:encoded>
same fields after import:
<description>
<![CDATA[<p>In this hour, we celebrate things we do in the name of love. A woman is asked to choose between her family and the man she loves; a fisherman comes to appreciate his roots; a love song becomes a love story; and a man battles the law to protect his husband’s legacy. Hosted by The […]</p>]]>
</description>
<content:encoded>
<![CDATA[<p>In this hour, we celebrate things we do in the name of love. A woman is asked to choose between her family and the man she loves; a fisherman comes to appreciate his roots; a love song becomes a love story; and a man battles the law to protect his husband’s legacy. Hosted by The […]</p>]]>
</content:encoded>
Looks like we're maybe overwriting w/ description and then sanitizing it? May end up being relevant: Feeder RSS generator (content:encoded is set on line 120) Feeder Episode model
Update: I think that it might just be that yes, we're overwriting the content
with the description
from CMS inside the Episode Distribution
And maybe that's okay... in Publish I'm not sure we actually allow users to specifically set the content
field.
Update: I think that we don't (<- 3 links)
So! My deduction after this: maybe we don't need to bother changing anything, since we actually don't differentiate between content
+ description
once users are on Publish. But I suppose the argument is that their archive feed post-import should look identical to their archive feed pre-import. So... toss-up.
Thoughts? Feelings? Can I send you MORE links?
I ❤️ all the links - MOAR!
It looks as if the original content:encoded
has no home in the Episode destination model. So unless we create a new column for it, there's no clean way to preserve it. I do like the idea of 1:1 preservation, but I don't know the history of the Publish data model enough to know if omitting content:encoded
was a conscious choice.
@kookster @cavis @sandikbarr ? who has institutional memory here?
@pkarman Publish doesn't use content:encoded
. It only uses descriptionMd
for description in markdown format, fields are in encode/decode of the model here https://github.com/PRX/publish.prx.org/blob/master/src/app/shared/model/story.model.ts#L130
thanks @sandikbarr
Seems like the pieces
table would need a new column. Not sure if it matters - @gcampo88 are you aware of any customers who have mentioned the missing data?
Nope! @palomao would be the first line of hearing about that though.
That's interesting. I haven't heard from users about the shortened episode descriptions. Was this just discovered with The Moth import?
I'm a bit worried on how this will effect the remainder of the radiotopia shows not on publish but using dovetail (since their imports are coming up).
no, we do not need a new column, in our publishing system we do not differentiate between the content in description and content:encoded, they differentiate in format.
That is not how it works in WP, but that is their choice.
On regular podcast import, when we import directly into publish/cms from a feed, we correctly check for if content encoded or description is available and so a better field to use.
https://github.com/PRX/cms.prx.org/blob/master/app/models/episode_import.rb#L179
def entry_description_attribute(entry)
[:content, :itunes_summary, :description, :title].find { |d| !entry[d].blank? }
end
We need similar logic on the feeder importer.
@palomao the reason there have not been other complaints is that this feeder importer has only been used by criminal and now the moth - all the others use the regular rss import.
thanks @kookster - I will go that direction
ok I have a branch ready but it depends on https://github.com/PRX/cms.prx.org/pull/409 so will wait till that is merged to create a PR
On most recent FeederImport, the content of the episode was shortened to
'[...]'
. Should be smarter --> logic in the PodcastImport model seems better.