DocNow / diffengine

track changes to the news, where news is anything with an RSS feed
MIT License
177 stars 30 forks source link

Configurable tweet text #74

Closed nahuelhds closed 4 years ago

nahuelhds commented 4 years ago

Now, the tweets can differentiate what changed between both versions: the URL, the title and/or the summary.

For accomplish that, a new config is added

lang:
  change_in: "Change in"
  the_url: "the URL"
  the_title: "the title"
  and: "and"
  the_summary: "the summary"

Only if all the keys are defined in the lang dict inside the config.yaml then the tweet will include what's changed followed by the diff.url. The following are examples for the texts -in spanish with the old url, this PR will includes the new diff url-

Also applies for URL changes, but I didn't see any event in real life yet.

nahuelhds commented 4 years ago

@edsu this is ready to be reviewed/merged.

edsu commented 4 years ago

Sorry for the delay on this @nahuelhds -- this all looks good but I was wondering if by the_summary you actually mean the content of the page? Also I'm not quite sure how we would track changes to the URL since right now there is only one URL per set of diffs. All the EntryVersions belong to an Entry, that has one URL.

nahuelhds commented 4 years ago

I assumed that the diff was done over the content given by the RSS feed. I think it's not all the content inside the RSS, right? If that's not the case, then maybe it actually is the content and not just the summary of it.

You know better than me here. Otherwise, I should check in deep later!

edsu commented 4 years ago

It is understandable that there is confusion about summary because the EntryVersion class uses a property named summary. The value of summary is actually not the text from the RSS feed, but text from the page itself, after the boilerplate of the page has been removed with Readability and had extraneous HTML removed with bleach. You can see the process here https://github.com/DocNow/diffengine/blob/master/diffengine/__init__.py#L166-L170

All that being said, maybe it's best to stick with summary since that is what Readability calls it, and its what is in the model. But I think the default string value should be "the page"?

nahuelhds commented 4 years ago

Nice. I'll create an issue for that one!