PipedreamHQ / pipedream

[BUG] RSS trigger not picking up recent items #2910

Closed: dannyroosevelt closed this issue 1 year ago

dannyroosevelt commented 1 year ago

New items are not getting picked up in the RSS trigger.

Workflow (shared w/ Support): https://pipedream.com/@samirtalwar/tweet-blog-posts-p_PACJno3/edit

RSS feed: https://monospacedmonologues.com/index.xml

ctrlaltdylan commented 1 year ago

The source with the reported issue is using rss_new-item-in-feed v0.0.1, but I believe the newest version, v0.0.2, might have fixed the bug.

Advising the customer to recreate the source.

I was unable to recreate the bug in a v0.0.2 version of the component.

ctrlaltdylan commented 1 year ago

Adding an example of the problem workflow.

I believe the problem may be a misordering of events, not that the RSS feed isn't emitting new events:

[Screenshot: CleanShot 2022-05-23 at 12 05 38@2x]

[Screenshot: CleanShot 2022-05-23 at 12 05 22@2x]

The latest post in this RSS feed is from May 19th, but the latest event emitted corresponds to a post from May 10th.

ctrlaltdylan commented 1 year ago

It might be that the feedparser reading this XML feed is ordering items alphabetically by the guid property.

The GUIDs in chronological order:

But the source's event logs show the "How to drive fast" article first, which hints to me that the GUIDs are being sorted into alphanumeric order, and that this is causing the issue.
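To illustrate the hypothesis, here is a minimal sketch of how string ordering on GUIDs can disagree with publication order. The GUIDs and dates below are made up for illustration (only the "How to drive fast" title and the year/month path style come from this thread), and this is not the source's actual code.

```typescript
// Hypothetical feed items: GUIDs and dates are invented for illustration.
interface FeedItem {
  guid: string;
  pubDate: string; // ISO 8601 date
}

const items: FeedItem[] = [
  { guid: "https://example.com/2022/05/a-newer-post/", pubDate: "2022-05-19" },
  { guid: "https://example.com/2022/05/how-to-drive-fast/", pubDate: "2022-05-10" },
  { guid: "https://example.com/2022/04/zero-downtime/", pubDate: "2022-04-28" },
];

// Order by plain string comparison on the GUID (the suspected behavior).
const byGuid = [...items].sort((a, b) =>
  a.guid < b.guid ? -1 : a.guid > b.guid ? 1 : 0
);

// Order by publication date (the expected behavior).
const byDate = [...items].sort(
  (a, b) => Date.parse(a.pubDate) - Date.parse(b.pubDate)
);

console.log("last by GUID:", byGuid[byGuid.length - 1].guid);
console.log("last by date:", byDate[byDate.length - 1].guid);
```

With these sample values, the May 10th post sorts last by GUID even though the May 19th post is newer, because "how-to-drive-fast" compares greater than "a-newer-post" within the same "2022/05" prefix.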

ctrlaltdylan commented 1 year ago

@alysonturing does this make sense?

I'm not as familiar with the feedparser module. Is it possible that it's using the guid property as a primary ID and sorting on it before passing each item to the readable function?

SamirTalwar commented 1 year ago

I'm not sure I follow why the order would matter; it seems to be using the "unique" dedup strategy. Is there something that would deduplicate older values regardless?
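To make the reasoning behind this question concrete, here is a minimal sketch that assumes dedup is nothing more than an unbounded set of already-seen IDs. That is an assumption for illustration, not Pipedream's actual "unique" implementation, and `emitUnseen` is a hypothetical helper.

```typescript
// Simplified model of ID-based dedup. Assumption: every ID ever seen is
// remembered, and every item in the feed is checked on each poll.
function emitUnseen(ids: string[], seen: Set<string>): string[] {
  const emitted: string[] = [];
  for (const id of ids) {
    if (!seen.has(id)) {
      seen.add(id);
      emitted.push(id);
    }
  }
  return emitted;
}

const seen = new Set<string>();
emitUnseen(["a", "b"], seen); // first poll: both items are new

// A new item is picked up whether it is listed first...
const secondPoll = emitUnseen(["c", "a", "b"], seen);
// ...or somewhere in the middle of the feed.
const thirdPoll = emitUnseen(["b", "d", "a", "c"], seen);

console.log(secondPoll, thirdPoll);
```

Under this model the position of a new item in the feed makes no difference, so if ordering does affect what gets emitted, something beyond plain ID dedup would have to be involved.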

dylburger commented 1 year ago

@SamirTalwar I just published a new version of the RSS source. Could you do me a favor and try to create a new source at https://pipedream.com/new/sources ?

If that doesn't work, can you visit the Logs tab of the source and share any logs that appear there?

alysonturing commented 1 year ago

> @alysonturing does this make sense?
>
> I'm not as familiar with the feedparser module. Is it possible that it's using the guid property as a primary ID and sorting on it before passing each item to the readable function?

Hey, sorry, I just saw your question. I believe the guid is not generated by the feedparser, so we can't say for sure which method generates it.

SamirTalwar commented 1 year ago

I just deleted and recreated the source. The events still seem to be in the wrong order.

The latest event just went out successfully but I expect that's because its GUID now contains "2022/06", not "2022/05", which suggests your theory about sorting by GUID holds water.

The logs are as follows:

2022-06-02T18:17:15 End
2022-06-02T18:17:12 Start

{
  "timestamp": 1654186631,
  "timezone_utc": {
    "date": {
      "day": 2,
      "month": 6,
      "year": 2022
    },
    "iso8601": {
      "date": "2022-06-02",
      "time": "16:17:11+00:00",
      "timestamp": "2022-06-02T16:17:11+00:00"
    },
    "metadata": {
      "day_name": "Thursday",
      "day_of_week": 4,
      "start_of_week": "2022-05-30"
    },
    "pretty": {
      "date": "Jun  2, 2022",
      "time": "4:06:11 PM",
      "time_24h": "16:17:11"
    },
    "time": {
      "hour": 16,
      "millisecond": 841,
      "minute": 17,
      "second": 11
    },
    "timezone": "UTC"
  },
  "timezone_configured": {
    "date": {
      "day": 2,
      "month": 6,
      "year": 2022
    },
    "iso8601": {
      "date": "2022-06-02",
      "time": "16:17:11+00:00",
      "timestamp": "2022-06-02T16:17:11+00:00"
    },
    "metadata": {
      "day_name": "Thursday",
      "day_of_week": 4,
      "start_of_week": "2022-05-30"
    },
    "pretty": {
      "date": "Jun  2, 2022",
      "time": "4:06:11 PM",
      "time_24h": "16:17:11"
    },
    "time": {
      "hour": 16,
      "millisecond": 841,
      "minute": 17,
      "second": 11
    },
    "timezone": "UTC"
  },
  "interval_seconds": 900
}

2022-06-02T18:17:08 activate

dylburger commented 1 year ago

@SamirTalwar thanks for the detail. @alysonturing is going to look into it!

lcaresia commented 1 year ago

@dannyroosevelt

This issue and this pull request are the same thing, but both are in the columns. Can you check this and remove one?

dannyroosevelt commented 1 year ago

> @dannyroosevelt
>
> This issue and this pull request are the same thing, but both are in the columns. Can you check this and remove one?

You have better context on those 2 issues, feel free to remove one that we don't need.

dannyroosevelt commented 1 year ago

This is ready for release!

SamirTalwar commented 1 year ago

Is this going to be magically fixed for all users of the RSS source, or do we need to do something?

alysonturing commented 1 year ago

From what I know, the user needs to recreate the source to update it to the new version.

dylburger commented 1 year ago

@SamirTalwar That’s correct, try adding a new RSS source and let us know if that works!

SamirTalwar commented 1 year ago

Looks like it's still broken on my end. It worked last week but not this one.

dylburger commented 1 year ago

@SamirTalwar is it still emitting events in the wrong order, or are you seeing some different behavior? Do you have example logs / items that aren't emitted correctly?

And just to confirm, is it still on this feed?

ghost commented 1 year ago

@SamirTalwar have you tried creating a new source? Have you tried different browsers, like Safari, Chrome, or Firefox? Can you please try that and let us know if you still see the error? Sharing logs and any other details with us would be very helpful.

SamirTalwar commented 1 year ago

I have tried deleting and recreating the source, both on 9th June and just now. However, it clearly hasn't worked; I am not seeing the changes to rss.app.ts from #3084. Are these changes actually deployed?

Recreating the source has made the latest item show up, but it seems to be fairly random.

I appreciate that you folks need some input but don't you have access to the logs already? The feed and the Pipedream workflow are still the same.

SamirTalwar commented 1 year ago

(I haven't tried with different browsers but I cannot see how this would be relevant to getting the correct version of the RSS source.)

dylburger commented 1 year ago

@SamirTalwar It wasn't clear to me that this was on the same workflow. We deal with many support issues and are constantly jumping in and out of context, so we may not get it right every time. I appreciate the patience as we investigate!

The new source indeed was not published. We're re-publishing now and working on a better way to catch these cases in the future! I'll test once it's out, and you can give it a try then.

alysonturing commented 1 year ago

Just opened a new PR adding support for JSON Feed URLs and fixing the sorting on RSS sources. It should help with the issues pointed out here: https://github.com/PipedreamHQ/pipedream/pull/3192
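For readers following along, one way a date-based ordering could look is sketched below. This is not the code from the PR; the `Item` shape, the `timestampOf` and `sortOldestFirst` helpers, and the choice to place undated items first are all assumptions made for illustration.

```typescript
// Hypothetical item shape; real feed items carry many more fields.
interface Item {
  guid: string;
  pubDate?: string;
}

// Convert an item's date to epoch milliseconds.
// Assumption: missing or unparseable dates are treated as 0, so such
// items sort before every dated item.
function timestampOf(item: Item): number {
  const ms = item.pubDate ? Date.parse(item.pubDate) : NaN;
  return Number.isNaN(ms) ? 0 : ms;
}

// Return a new array ordered oldest to newest, leaving the input untouched.
function sortOldestFirst(items: Item[]): Item[] {
  return [...items].sort((a, b) => timestampOf(a) - timestampOf(b));
}

const sorted = sortOldestFirst([
  { guid: "b-post", pubDate: "2022-05-19T09:00:00Z" },
  { guid: "z-post", pubDate: "2022-05-10T09:00:00Z" },
  { guid: "undated" },
  { guid: "a-post", pubDate: "2022-06-02T09:00:00Z" },
]);

console.log(sorted.map((item) => item.guid));
```

Sorting on the parsed date rather than the GUID means the emitted order no longer depends on how a feed happens to name its items.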