buddycloud / buddycloud-server-java

The buddycloud server written in Java.
http://buddycloud.com
Apache License 2.0
67 stars 40 forks source link

RSM paging issue with recent-items request #88

Open Schnouki opened 10 years ago

Schnouki commented 10 years ago

Finally migrated my old server and moved to the Java server \o/ But several things don't work. I'm using a fresh build from git in a Ubuntu VM.

Extract of the HTTP API log that described the problem: http://ix.io/92b/xml (beautified so it's easier to read).

Basically, the first IQ exchange is correct: a few items are returned, as expected. The HTTP API then requests a few more items, using RSM paging. But the channels server replies with the exact same list of items. So the HTTP API sends another request for more items... and we now have a distributed infinite loop.

I added some logging in src/main/java/org/buddycloud/channelserver/db/jdbc/JDBCNodeStore.java, and the problem apparently is that afterItem is null after the call to getNodeItem() (line 1195). As a result, the paging conditions are not appended to the SQL query.

I don't know enough about how the Java server works (yet), so I've not been able to fix it by myself.

Any idea?

ashward commented 10 years ago

It looks like the and being sent back are incorrect:

/user/schnouki@pouet.im/posts,2e400539-adb7-4572-8c61-809d9e330e72 /user/simon@buddycloud.org/posts,tag:null@channels.buddycloud.org,/user/simon@buddycloud.org/posts,4bbac59e-9327-4314-a7b9-02888701c0c3

They should be coming back as something like:

tag:null@channels.buddycloud.org,/user/github@topics.buddycloud.org/posts,092f2389-38e8-445c-a347-6e1bd3308add

Note that is missing the tag:channels.buddycloud.org and the has an extra node id stuck to the front.

— Ash

On 15 Nov 2013, at 16:25, Thomas Jost notifications@github.com wrote:

Finally migrated my old server and moved to the Java server \o/ But several things don't work. I'm using a fresh build from git in a Ubuntu VM.

Extract of the HTTP API log that described the problem: http://ix.io/92b/xml (beautified so it's easier to read).

Basically, the first IQ exchange is correct: a few items are returned, as expected. The HTTP API then requests a few more items, using RSM paging. But the channels server replies with the exact same list of items. So the HTTP API sends another request for more items... and we now have a distributed infinite loop.

I added some logging in src/main/java/org/buddycloud/channelserver/db/jdbc/JDBCNodeStore.java, and the problem apparently is that afterItem is null after the call to getNodeItem() (line 1195). As a result, the paging conditions are not appended to the SQL query.

I don't know enough about how the Java server works (yet), so I've not been able to fix it by myself.

Any idea?

— Reply to this email directly or view it on GitHub.

lloydwatkin commented 10 years ago

There's no formal way of stating what a post ID should be. The node server did it differently to itself in some places. I'd like to see us migrate to a full post ID in every instance and these old problems will go away over time.

The problem is that https://github.com/buddycloud/buddycloud-server-java/blob/35f862671b3134099897917f0fd467bf3b0518a8/src/main/java/org/buddycloud/channelserver/pubsub/model/impl/GlobalItemIDImpl.java#L89 isn't able to extract the post ID from the item reference, therefore the after value is coming out as null.

Maybe we should add some detection for this item reference case to the server?

Schnouki commented 10 years ago

It looks like the problem is that remote posts have an ID similar to tag:null@channels.buddycloud.org,/user/github@topics.buddycloud.org/posts,092f2389-38e8-445c-a347-6e1bd3308add, and local posts just have 092f2389-38e8-445c-a347-6e1bd3308add. So when you prepend it with a node name, for local posts you have node,id and for remote posts you have node,tag,node,id. Right ?

So why are remote posts stored with a tag,node,id ? Wouldn't just storing the post ID (without node and tag, which can be found elsewhere anyway) solve this?

(Spoiler alert: I'm not a fan of parsing IDs, which are supposed to be opaque strings. I had to add it in the node server for requests that return results from several nodes at once, but I didn't like it...)

ashward commented 10 years ago

Excuse my top posting but in on my mobile.

On Saturday, 16 November 2013, Thomas Jost wrote:

It looks like the problem is that remote posts have an ID similar to tag:null@channels.buddycloud.org <javascript:_e({}, 'cvml', 'tag%3Anull@channels.buddycloud.org');>,/user/ github@topics.buddycloud.org/posts,092f2389-38e8-445c-a347-6e1bd3308add, and local posts just have 092f2389-38e8-445c-a347-6e1bd3308add. So when you prepend it with a node name, for local posts you have node,id and for remote posts you have node,tag,node,id. Right ?

So why are remote posts stored with a tag,node,id ? Wouldn't just storing the post ID (without node and tag, which can be found elsewhere anyway) solve this?

(Spoiler alert: I'm not a fan of parsing IDs, which are supposed to be opaque strings. I had to add it in the node server for requests that return results from several nodes at once, but I didn't like it...)

— Reply to this email directly or view it on GitHubhttps://github.com/buddycloud/buddycloud-server-java/issues/88#issuecomment-28621637 .

ashward commented 10 years ago

Perhaps I'll actually try and say something this time!

tl;dr we should be using the global item ids everywhere.

Pubsub requires item ids to be unique across the service. In buddycloud we aggregate items from many disparate devices and so need to ensure that item ids are unique across all the connected services. We could use proper uuids (but due to these being opaque we could still run into issues with nefarious folk deliberately sending id clashes and not really be able to check), or we could do what we are doing and construct one from the service jid, node id and local item id. At least having it vaguely parseable and containing the node id allows us to check that it has been sent by an authoritative server for that item id. There are already plenty of places within the bc protocol which allow for mis-attribution attacks that we need to make sure we're mindful of these things and not just trust blindly what a remote server tells us.

Btw the null@ shouldn't form part of the standard, but looked like an error in the node server. I suppose there is a debate as to whether we actually need the service jid as this is already contained in the node id anyway.

But long story short, we need to be using the same item id scheme universally across all the pubsub requests. The tag: scheme seems to fit the requirements fairly well as long as we use it everywhere.

Ash

On Saturday, 16 November 2013, Ashley Ward wrote:

Excuse my top posting but in on my mobile.

On Saturday, 16 November 2013, Thomas Jost wrote:

It looks like the problem is that remote posts have an ID similar to tag:null@channels.buddycloud.org,/user/ github@topics.buddycloud.org/posts,092f2389-38e8-445c-a347-6e1bd3308add, and local posts just have 092f2389-38e8-445c-a347-6e1bd3308add. So when you prepend it with a node name, for local posts you have node,id and for remote posts you have node,tag,node,id. Right ?

So why are remote posts stored with a tag,node,id ? Wouldn't just storing the post ID (without node and tag, which can be found elsewhere anyway) solve this?

(Spoiler alert: I'm not a fan of parsing IDs, which are supposed to be opaque strings. I had to add it in the node server for requests that return results from several nodes at once, but I didn't like it...)

— Reply to this email directly or view it on GitHubhttps://github.com/buddycloud/buddycloud-server-java/issues/88#issuecomment-28621637 .

Schnouki commented 10 years ago

Item IDs must be unique across a node, not necessarily across a service (see XEP-0060: "ItemID: A unique identifier for an item in the context of a specific node"). Most PubSub requests are only about a single node, so global IDs are not needed for them. The exceptions are things related to MAM, and some buddycloud-specific stuff like <recent-items/>, which can return results from several nodes. In that case, yes, another kind of ID (that includes the node) is required.

But anyway. Is it normal that items from local and remote channels have different item IDs in the RSM part of the <recent-items/> response?

ashward commented 10 years ago

Item IDs must be unique across a node, not necessarily across a service (see XEP-0060: "ItemID: A unique identifier for an item in the context of a specific node"). Most PubSub requests are only about a single node, so global IDs are not needed for them. The exceptions are things related to MAM, and some buddycloud-specific stuff like , which can return results from several nodes. In that case, yes, another kind of ID (that includes the node) is required.

That’s exactly the problem - all the standard result sets in xep-0060 only contains items from a single node, so there is no problem with duplicate item ids across different nodes.

The problem with bc is that there are result sets with items from multiple nodes, but the rsm ids refer to item ids which can be ambiguous. We could just use the normal item ids for normal calls and compound (i.e. with the node) ids for calls like , but I don’t really like that . My personal view is that we should be using consistent ids across all the standard pubsub calls and the extended buddycloud ones, so ensuring that a bc server uses the compound ids across the board seems like a good option to me.

— Ash

lloydwatkin commented 10 years ago

+1 for consistency


Steven Lloyd Watkin Software Engineer PHP ::: Java ::: Ruby ::: Node.js ::: XMPP lloyd@evilprofessor.co.uk (email+jid) ::: http://www.evilprofessor.co.uk Facebook / Twitter / Flickr: lloydwatkin

Organiser of WORLD RECORD breaking charity event: Scuba Santas ::: http://www.scuba-santas.co.uk Supporting the RNLI & DDRC - 15th December 2013 - NDAC, Chepstow

On 18 November 2013 11:18, Ashley Ward notifications@github.com wrote:

Item IDs must be unique across a node, not necessarily across a service (see XEP-0060: "ItemID: A unique identifier for an item in the context of a specific node"). Most PubSub requests are only about a single node, so global IDs are not needed for them. The exceptions are things related to MAM, and some buddycloud-specific stuff like , which can return results from several nodes. In that case, yes, another kind of ID (that includes the node) is required.

That’s exactly the problem - all the standard result sets in xep-0060 only contains items from a single node, so there is no problem with duplicate item ids across different nodes.

The problem with bc is that there are result sets with items from multiple nodes, but the rsm ids refer to item ids which can be ambiguous. We could just use the normal item ids for normal calls and compound (i.e. with the node) ids for calls like , but I don’t really like that . My personal view is that we should be using consistent ids across all the standard pubsub calls and the extended buddycloud ones, so ensuring that a bc server uses the compound ids across the board seems like a good option to me.

— Ash

— Reply to this email directly or view it on GitHubhttps://github.com/buddycloud/buddycloud-server-java/issues/88#issuecomment-28690503 .

ashward commented 10 years ago

On 18 Nov 2013, at 11:21, Lloyd Watkin notifications@github.com wrote:

+1 for consistency

Yeah. I don’t see an issue with mandating in the bc protocol that item ids are always generated by the server (so ignoring ids on new items with ids created by the client), and mandating a specific format for those ids to ensure consistency and uniqueness across multiple services.

— Ash

lloydwatkin commented 10 years ago

I like the idea of being able to set an ID, but the compound ID should be generated by the server (generally stored within the atom payload). Being able to manually set the ID isn't a killer feature for me though so happy to let it go.

Schnouki commented 10 years ago

I have absolutely nothing against consistency -- but don't forget RSM can be used for something else than items: subscriptions, affiliations, etc. Or even mixed types with MAM: a few items, then a subscription update, then a node config update, then more items... So in the end I'm not sure that a global consistency is even possible. But surely there are better places to discuss this.

Back to this issue: in the log I posted above, is it normal that items from local and remote channels don't have the same format? (item_id for local channels vs. tag,node,item_id)

lloydwatkin commented 10 years ago

Consistency for items I think is the important part :)

The RSM values (for ) are being generated here: https://github.com/buddycloud/buddycloud-server-java/blob/master/src/main/java/org/buddycloud/channelserver/packetprocessor/iq/namespace/pubsub/get/RecentItemsGet.java#L143. Therefore they should be consistent unless the call to get the item ID returns something different.


Steven Lloyd Watkin Software Engineer PHP ::: Java ::: Ruby ::: Node.js ::: XMPP lloyd@evilprofessor.co.uk (email+jid) ::: http://www.evilprofessor.co.uk Facebook / Twitter / Flickr: lloydwatkin

Organiser of WORLD RECORD breaking charity event: Scuba Santas ::: http://www.scuba-santas.co.uk Supporting the RNLI & DDRC - 15th December 2013 - NDAC, Chepstow

On 18 November 2013 11:36, Thomas Jost notifications@github.com wrote:

I have absolutely nothing against consistency -- but don't forget RSM can be used for something else than items: subscriptions, affiliations, etc. Or even mixed types with MAM: a few items, then a subscription update, then a node config update, then more items... So in the end I'm not sure that a global consistency is even possible. But surely there are better places to discuss this.

Back to this issue: in the log I posted above, is it normal that items from local and remote channels don't have the same format? (item_id for local channels vs. tag,node,item_id)

— Reply to this email directly or view it on GitHubhttps://github.com/buddycloud/buddycloud-server-java/issues/88#issuecomment-28691540 .

abmargb commented 10 years ago

+1 for using local ids in single node requests, global ids otherwise. The recent-items query seem to be exception here, not the rule, so I would not worry about consistency here.

On Mon, Nov 18, 2013 at 8:49 AM, Lloyd Watkin notifications@github.comwrote:

Consistency for items I think is the important part :)

The RSM values (for ) are being generated here:

https://github.com/buddycloud/buddycloud-server-java/blob/master/src/main/java/org/buddycloud/channelserver/packetprocessor/iq/namespace/pubsub/get/RecentItemsGet.java#L143.

Therefore they should be consistent unless the call to get the item ID returns something different.


Steven Lloyd Watkin Software Engineer PHP ::: Java ::: Ruby ::: Node.js ::: XMPP lloyd@evilprofessor.co.uk (email+jid) ::: http://www.evilprofessor.co.uk Facebook / Twitter / Flickr: lloydwatkin

Organiser of WORLD RECORD breaking charity event: Scuba Santas ::: http://www.scuba-santas.co.uk Supporting the RNLI & DDRC - 15th December 2013 - NDAC, Chepstow

On 18 November 2013 11:36, Thomas Jost notifications@github.com wrote:

I have absolutely nothing against consistency -- but don't forget RSM can be used for something else than items: subscriptions, affiliations, etc. Or even mixed types with MAM: a few items, then a subscription update, then a node config update, then more items... So in the end I'm not sure that a global consistency is even possible. But surely there are better places to discuss this.

Back to this issue: in the log I posted above, is it normal that items from local and remote channels don't have the same format? (item_id for local channels vs. tag,node,item_id)

— Reply to this email directly or view it on GitHub< https://github.com/buddycloud/buddycloud-server-java/issues/88#issuecomment-28691540>

.

— Reply to this email directly or view it on GitHubhttps://github.com/buddycloud/buddycloud-server-java/issues/88#issuecomment-28692192 .

Abmar Barros MSc in Computer Science from the Federal University of Campina Grande - www.ufcg.edu.br OurGrid Team Leader - www.ourgrid.org Buddycloud Dev - www.buddycloud.org Paraíba - Brazil