ProjectOpenSea / stream-js

A TypeScript SDK to receive pushed updates from OpenSea over websocket.
https://docs.opensea.io/reference/stream-api-overview
MIT License

Missing events #99

Closed shadmau closed 2 years ago

shadmau commented 2 years ago

I have tested the stream with some collections. Unfortunately, a lot of events are missing. In my test, ~60% fewer listing events are streamed compared to the OS activity dashboard of the collection.

ludovitkapusta commented 2 years ago

I was wondering whether something like this might be happening, so I'm really happy that someone tested it.

Can this be confirmed by the devs?

QuitCrypto commented 2 years ago

Confirming this issue. I am also receiving something like ~50% of events when connected to a stream. Seems consistent whether I am subscribed to a single collection or several.

rjbks commented 2 years ago

@fmaduakor @QuitCrypto

How did you measure this?

QuitCrypto commented 2 years ago

I opened a stream for BAYC and logged all events received over a 30 min period, then compared the output with the listing activity tab on OS.

tino-web commented 2 years ago

Having the same issue here. Only half of the listing events are streamed...

shea851 commented 2 years ago

Same issue here. Missing a lot of events. Seems like a pretty big issue.

Would a project dev please acknowledge the issue? Thank you.

rjbks commented 2 years ago

@fmaduakor @QuitCrypto @shea851 @tino-web It would be helpful if we had a reproducible example. Can you share the code you used to show missing events?

rohfle commented 2 years ago

reproducible example https://github.com/rohfle/osstream-issue-99

Example output:

2022-05-16T03:53:38.449Z 2 stream only 3 api only 0 both
STREAM MISSING FROM API
  2022-05-16T03:53:31.518421 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188702252203767758849
  2022-05-16T03:53:34.427909 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188702447916837502977
API MISSING FROM STREAM
  2022-05-16T03:53:26.209940 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188702027903395692545
  2022-05-16T03:53:22.007439 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188701216463814393857
  2022-05-16T03:53:21.629360 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188701798105465487361
API AND STREAM
---------
MATCH BOTH {
  event_timestamp: '2022-05-16T03:53:34.427909',
  permalink: 'https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188702447916837502977'
}
MATCH BOTH {
  event_timestamp: '2022-05-16T03:53:31.518421',
  permalink: 'https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188702252203767758849'
}
2022-05-16T03:53:43.458Z 2 stream only 4 api only 2 both
STREAM MISSING FROM API
  2022-05-16T03:53:40.692669 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188704027915046617089
  2022-05-16T03:53:41.362691 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188701139498000449537
API MISSING FROM STREAM
  2022-05-16T03:53:26.209940 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188702027903395692545
  2022-05-16T03:53:22.007439 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188701216463814393857
  2022-05-16T03:53:21.629360 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188701798105465487361
  2022-05-16T03:53:34.841442 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188700474293465645057
API AND STREAM
  2022-05-16T03:53:34.427909 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188702447916837502977
  2022-05-16T03:53:31.518421 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188702252203767758849
---------
MATCH BOTH {
  event_timestamp: '2022-05-16T03:53:41.362691',
  permalink: 'https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188701139498000449537'
}
MATCH BOTH {
  event_timestamp: '2022-05-16T03:53:40.692669',
  permalink: 'https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188704027915046617089'
}
2022-05-16T03:53:48.598Z 0 stream only 4 api only 4 both
STREAM MISSING FROM API
API MISSING FROM STREAM
  2022-05-16T03:53:26.209940 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188702027903395692545
  2022-05-16T03:53:22.007439 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188701216463814393857
  2022-05-16T03:53:21.629360 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188701798105465487361
  2022-05-16T03:53:34.841442 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188700474293465645057
API AND STREAM
  2022-05-16T03:53:34.427909 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188702447916837502977
  2022-05-16T03:53:31.518421 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188702252203767758849
  2022-05-16T03:53:41.362691 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188701139498000449537
  2022-05-16T03:53:40.692669 https://opensea.io/assets/0x495f947276749ce646f68ac8c248420045cb7b5e/113812713830668651633485064505405228688089952413000180138188704027915046617089
---------

I let it run for a few minutes. There were 23 API-only events not seen on the stream-js API:

2022-05-16T04:01:14.786Z 0 stream only 23 api only 32 both

steddyman commented 2 years ago

Seeing the same issue with other collections too. Do the devs have an update on this please?

ZiHengLee commented 2 years ago

> I have tested the stream with some collections. Unfortunately, a lot of events are missing. In my test, ~60% fewer listing events are streamed compared to the OS activity dashboard of the collection.

Have you solved this issue? I encountered the same problem.

wei-dai commented 2 years ago

Same issue using Python: a lot of the listing events are missing. Any updates?

rjbks commented 2 years ago

@rohfle Thanks for that example. Maybe I'm missing something here but isn't this example very dependent on sample time? What if events are disseminated at different times for REST and WS APIs? So event A gets published at 12:00 EST on the REST API but 2 minutes later on the WS API, making them essentially the same but staggered so any comparison from a fixed point in time will look like they're missing events.

QuitCrypto commented 2 years ago

In my experience, events were either published to both simultaneously, or they never came through on the Stream API at all. My stream ran for hours, so I do not think that it was simply a delay.

This is very easily reproducible by running a stream for a collection over X time and logging all events, then comparing the log to recent listings on the collection's page. You should see the discrepancies.

rohfle commented 2 years ago

> @rohfle Thanks for that example. Maybe I'm missing something here but isn't this example very dependent on sample time? What if events are disseminated at different times for REST and WS APIs? So event A gets published at 12:00 EST on the REST API but 2 minutes later on the WS API, making them essentially the same but staggered so any comparison from a fixed point in time will look like they're missing events.

Thanks for getting back to me. It's a good question.

Firstly, the REST API is supposed to be slower than the WS API. If the WS API is lagging the REST API by two minutes, something is terribly wrong, and people will just poll the server instead.

The example keeps track of events that it hasn't seen on both APIs, and matches them even if the event comes through one API at a much later date. When an event occurs, the time is recorded in event_timestamp. This timestamp is consistent (after dropping the +00:00 timezone) across the REST API and Stream API, and its value is not affected by delays in transmission or processing. We can compare these anytime we want: same event_timestamp, same event.
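The matching described above can be sketched roughly as follows. This is not the code from the repro repository; it is a minimal illustration that assumes each event carries an `event_timestamp` string, and the names `ListingEvent`, `Reconciler`, and `normalizeTimestamp` are hypothetical. Events wait in a pending table until the other source reports the same timestamp, so a late arrival still counts as a match.

```typescript
// Hypothetical minimal event shape; real payloads carry many more fields.
interface ListingEvent {
  event_timestamp: string;
  permalink: string;
}

type Source = "stream" | "rest";

// Drop a trailing "+00:00" so timestamps from both sources compare equal.
function normalizeTimestamp(ts: string): string {
  return ts.replace(/\+00:00$/, "");
}

class Reconciler {
  // Events seen on only one source so far, keyed by normalized timestamp.
  private pending: Record<Source, Record<string, ListingEvent>> = { stream: {}, rest: {} };
  // Events seen on both sources.
  private matched: Record<string, ListingEvent> = {};

  // Record one event. Returns true if this call completed a match.
  add(source: Source, event: ListingEvent): boolean {
    const key = normalizeTimestamp(event.event_timestamp);
    if (this.matched[key] !== undefined) {
      return false; // repeat of an event that was already matched
    }
    const other: Source = source === "stream" ? "rest" : "stream";
    if (this.pending[other][key] !== undefined) {
      delete this.pending[other][key];
      this.matched[key] = event;
      return true;
    }
    this.pending[source][key] = event; // wait for the other source, however long it takes
    return false;
  }

  // Counts in the same spirit as the "stream only / api only / both" log line.
  summary(): { streamOnly: number; apiOnly: number; both: number } {
    return {
      streamOnly: Object.keys(this.pending.stream).length,
      apiOnly: Object.keys(this.pending.rest).length,
      both: Object.keys(this.matched).length,
    };
  }
}
```

If a transmission delay were the only problem, `apiOnly` would hover around a constant value rather than grow for as long as the comparison runs.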

The example shows several things when you run it:

  1. Every event from the Stream API is seen on the REST API (shown as "0 stream only"). Sometimes there will be a delay of several seconds before the REST API sees the event. If the REST API was leading the WS API by 2 minutes, this number would be greater than zero the whole time.
  2. If you leave the script running, the number of REST API only events gets larger. If there was a time delay between the two APIs, then this number would stay approximately the same. You can see the timestamps of the events seen only on the REST API too, and they are many minutes old.

shadmau commented 2 years ago

> > I have tested the stream with some collections. Unfortunately, a lot of events are missing. In my test, ~60% fewer listing events are streamed compared to the OS activity dashboard of the collection.
>
> Have you solved this issue? I encountered the same problem.

No solution yet. Sticking to REST until it gets fixed.

rjbks commented 2 years ago

@rohfle There are a few issues with your test code that may explain why you're seeing some of these differences. Let's start on line 82: the response doesn't have an after key; it's "next". But even if you had checked for "next" it still poses a problem. First, the /events REST API is really an events history API. This is important when considering the direction in which "next" points and the order in which events are returned. The "next" cursor points to the next set of events in the past, and events are returned in descending order by "event_timestamp". So the if block setting the after variable is never reached and the loop is always broken after the first iteration.
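To illustrate the cursor direction described above, here is a hedged sketch of walking the events history backwards until a known timestamp is reached. The `asset_events` field name, the `EventsPage` shape, and the `fetchPage` callback are assumptions made for the example; only the `next` cursor and the descending order come from this thread. `fetchPage` is synchronous here purely to keep the sketch short; a real client would await an HTTP request.

```typescript
// Hypothetical shapes: only the "next" cursor and descending order come from the thread.
interface EventRecord {
  event_timestamp: string; // ISO-like string without offset, so string comparison orders it
}

interface EventsPage {
  asset_events: EventRecord[]; // newest first
  next: string | null; // cursor to the next OLDER page, or null at the end
}

// Collect all events newer than cutoff, returned oldest first.
function collectSince(
  fetchPage: (cursor: string | null) => EventsPage,
  cutoff: string,
  maxPages: number = 10
): EventRecord[] {
  const collected: EventRecord[] = [];
  let cursor: string | null = null;
  for (let page = 0; page < maxPages; page++) {
    const result: EventsPage = fetchPage(cursor);
    for (let i = 0; i < result.asset_events.length; i++) {
      const event = result.asset_events[i];
      if (event.event_timestamp <= cutoff) {
        // Pages only get older from here, so there is nothing new left to find.
        return collected.reverse();
      }
      collected.push(event);
    }
    if (result.next === null) {
      break; // no older pages left
    }
    cursor = result.next;
  }
  return collected.reverse();
}
```

The `maxPages` limit is a safety valve for the sketch: without it, a cutoff older than the whole history would walk every page.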

Next, your compareResults function may have some issues affecting the comparison. On lines 116 and 117 you are iterating over 2 arrays but within those loops on lines 121 and 122, you are mutating these arrays. I'm not a JS expert so maybe this doesn't apply but in every other language I'm familiar with, mutating something while you're iterating over it can lead to unexpected behavior.

Lastly, going back to the REST events API, calling the function the way your script does, with a fixed occurred_after param, can lead to repeated events (if no/not many new events have occurred since last call) and since the corresponding WS event has been matched and removed from the array, the repeat event will show as unmatched.

Having pointed out some of these possible issues, I have refactored the script and rerun it. The results have been mixed. There have been a few runs where the events match up for the most part and a few where the REST endpoint returns significantly more events. I'd really have to take some more time to get more samples and look into it.

rohfle commented 2 years ago

Thanks for the feedback. I have updated the repro with changes. Here's my response:

> Let's start on line 82: the response doesn't have an after key; it's "next". But even if you had checked for "next" it still poses a problem. First, the /events REST API is really an events history API. This is important when considering the direction in which "next" points and the order in which events are returned. The "next" cursor points to the next set of events in the past, and events are returned in descending order by "event_timestamp". So the if block setting the after variable is never reached and the loop is always broken after the first iteration.

Sorry you are right, I don't know where I got after from... probably the same time I thought that the response was ascending order. It wouldn't make a difference until you got past the first page of 20 events - I only did short runs. Anyway, the issues have been fixed on my end now I'm pretty sure. You might want to make it explicit that events are returned in descending order in the docs https://docs.opensea.io/reference/retrieving-asset-events

> Next, your compareResults function may have some issues affecting the comparison. On lines 116 and 117 you are iterating over 2 arrays but within those loops on lines 121 and 122, you are mutating these arrays. I'm not a JS expert so maybe this doesn't apply but in every other language I'm familiar with, mutating something while you're iterating over it can lead to unexpected behavior.

It is valid to mutate arrays while iterating over them if you use indexes and start from arr.length - 1 and work backwards instead of 0 and work forwards. Try it yourself and see.
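For anyone who wants to try the backwards-iteration claim, here is a small sketch. The helper names are made up for this example and do not come from the repro; the second function shows the forward variant that does skip elements.

```typescript
// Remove every item that satisfies shouldRemove, in place, and return the removed items.
function removeInPlace<T>(items: T[], shouldRemove: (item: T) => boolean): T[] {
  const removed: T[] = [];
  // Walk from the end: splice() only shifts elements at higher indexes,
  // which have already been visited, so no element is skipped.
  for (let i = items.length - 1; i >= 0; i--) {
    if (shouldRemove(items[i])) {
      removed.push(items.splice(i, 1)[0]);
    }
  }
  return removed.reverse(); // restore the original relative order
}

// Forward iteration with the same splice skips the element after each removal.
function removeForwardBuggy<T>(items: T[], shouldRemove: (item: T) => boolean): void {
  for (let i = 0; i < items.length; i++) {
    if (shouldRemove(items[i])) {
      items.splice(i, 1); // the next element slides into index i and is never checked
    }
  }
}
```

With two adjacent matching elements, the forward version leaves the second one behind, while the backward version removes both.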

> Lastly, going back to the REST events API, calling the function the way your script does, with a fixed occurred_after param, can lead to repeated events (if no/not many new events have occurred since last call) and since the corresponding WS event has been matched and removed from the array, the repeat event will show as unmatched.

Yes, I know about this. There was a workaround for this on lines 72-75, where incoming REST API events are checked against already seen events.

                // cursor / time broken so as workaround check previously removed results
                match = BOTH_RESULTS.find((existing) => existing.event_timestamp == event.event_timestamp)
                if (match != undefined) {
                    return
                }
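As an alternative illustration of the same workaround, repeated REST events can be dropped with a keyed lookup instead of a linear `find`. This is a sketch rather than the repro's actual code: `createDeduper` and `RestEvent` are hypothetical names, and using timestamp plus permalink as the identity of an event is an assumption.

```typescript
interface RestEvent {
  event_timestamp: string;
  permalink: string;
}

// Returns a filter that passes each event only the first time its key is seen.
function createDeduper(): (events: RestEvent[]) => RestEvent[] {
  const seen: Record<string, boolean> = {};
  return function (events: RestEvent[]): RestEvent[] {
    return events.filter(function (event) {
      // Timestamp plus permalink as the identity is an assumption of this sketch.
      const key = event.event_timestamp + "|" + event.permalink;
      if (seen[key]) {
        return false; // repeat from an overlapping poll
      }
      seen[key] = true;
      return true;
    });
  };
}
```

Each polling loop would create one deduper and pass every REST response through it before comparing against the stream.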

> Having pointed out some of these possible issues, I have refactored the script and rerun it. The results have been mixed. There have been a few runs where the events match up for the most part and a few where the REST endpoint returns significantly more events. I'd really have to take some more time to get more samples and look into it.

Even with the changes made, the problem still exists. Attached are logs of three different collections: 20220522_elonsspaceparty.log, 20220522_friendlyapesocialclub.log, and 20220522_honest-amiable-billionaire-club.log.

This is a graph for 20220522_friendlyapesocialclub.log showing unseen events over a 10-minute period. [image]

If you find any more issues with my repro, let me know.

benharbit commented 2 years ago

I noticed missing events too

ethangao commented 2 years ago

same issue here. Any progress on this bug?

benharbit commented 2 years ago

> same issue here. Any progress on this bug?

I have noticed a lot of bugs.

codersmith commented 2 years ago

Seeing many missed listings on the onItemListed Stream API.

That is in comparison with other listing notification services, or with just monitoring a single collection page and refreshing manually.

esteban-OpenSea commented 2 years ago

> Confirming this issue. I am also receiving something like ~50% of events when connected to a stream. Seems consistent whether I am subscribed to a single collection or several.

> I have tested the stream with some collections. Unfortunately, a lot of events are missing. In my test, ~60% fewer listing events are streamed compared to the OS activity dashboard of the collection.

Can you clarify a bit more when you say you are missing a certain percentage of accuracy from the Stream?

rohfle commented 2 years ago

gregarious-moving-monkey-bet event misses over a three-hour period: [image]

goblintownwtf event misses over approximately the same three-hour period: [image]

Source: 20220524_goblintownwtf.csv, 20220524_goblintownwtf.log.gz, 20220524_gregarious-moving-monkey-bet.csv, 20220524_gregarious-moving-monkey-bet.log.gz

nftsupply commented 2 years ago

running into this issue as well, going to use REST for now

esteban-OpenSea commented 2 years ago

Is the main issue just with the listing events? Are the other event types accurate?

rohfle commented 2 years ago

I just did a short test for a minute for Sales - There were no events missing from the Stream API compared to the REST API

esteban-OpenSea commented 2 years ago

Thanks for the update! Can the other devs running into this issue confirm that there are no longer any events missing?

codersmith commented 2 years ago

> Thanks for the update! Can the other devs running into this issue confirm that there are no longer any events missing?

I believe @rohfle's response was specifically to your question regarding Sales -- there are no missing Sales events.

But there are definitely still missing Listings events.

esteban-OpenSea commented 2 years ago

So is the main issue just with Listings? Also, @codersmith, can you send me steps so I can repro? I've been testing with Listings and it's been working for me so far. Any info you can provide me with would be fantastic!

codersmith commented 2 years ago

I think @rohfle provided steps to reproduce already, but here is how I observe the problem:

I have two bots configured for the same collection:

I get more Listings alerts via the boto.io bot vs. my own. I assume this is because they are polling the /events endpoint vs subscribing to the stream.

rohfle commented 2 years ago

@esteban-OpenSea while my previous comment was just about Sales, I have retested this now with Listings and all REST API events are seen on Stream API. It looks like the problem has been resolved.

[image]

@esteban-OpenSea what was wrong, and how did you fix it?

steddyman commented 2 years ago

Can you please confirm what the resolution was? Has there been a code fix I need to pull, or was the resolution a backend fix?

codersmith commented 2 years ago

I have still been experiencing this issue throughout the day today. Sometimes, they do match more closely between Stream and /events, but it is definitely not resolved.

I agree with @steddyman and would also like to understand what fix has been deployed to address the problem.

rohfle commented 2 years ago

If other people are still having this issue, I will test the top 5 collections for 30 mins and report the graphs

codersmith commented 2 years ago

> If other people are still having this issue, I will test the top 5 collections for 30 mins and report the graphs

Thanks -- your test client is awesome

rohfle commented 2 years ago

Not the top 5, but 7 random collections I pulled out of the activity log

Comments:

Graphs 20220527_103525_beardedapeteenclub log 20220527_103525_crypto-galaxy-fight log 20220527_103525_cryptoguysnft log 20220527_103525_dip-brose-gen log 20220527_103525_excellent-vivid-dehorizon-fun log 20220527_103525_mad-ape-collection-club log 20220527_103525_meta-alpha-girl log

rjbks commented 2 years ago

> Same issue using Python: a lot of the listing events are missing. Any updates?

@wei-dai Can you share the python code you used to connect?

esteban-OpenSea commented 2 years ago

Thank you everyone for the updates! We've fixed some underlying issues that we believe have resolved this particular problem. Feel free to re-open if the issue returns.

ethangao commented 2 years ago

> Thank you everyone for the updates! We've fixed some underlying issues that we believe have resolved this particular problem. Feel free to re-open if the issue returns.

Well done! Happy to see this issue being solved in such short time.

codersmith commented 2 years ago

Also seeing a lot more consistency from the stream connection. Will keep an eye on it, but it is definitely a huge improvement.

ZionNFT commented 2 years ago

@rohfle are you able to re-run your script please?