filecoin-project / go-f3

Golang implementation of Fast Finality for Filecoin (F3)
Apache License 2.0

Initial passive test runs well, but subsequent ones started in 10-minute succession do not #765

Open masih opened 2 days ago

masih commented 2 days ago

Critical question: why does the first test in the morning always seem to work well, while successive tests run noticeably worse?

Looking at the pubsub settings we forked over from Lotus, there are... a lot of questionable decisions that seem to be rooted in pre-F3 Filecoin network behaviour (e.g. this).

I wonder if changes to the passive testing network cause some loss of mesh or unfair peer scoring, to the point where the gossipsub mesh becomes ineffective and messages simply do not propagate fast enough. Take invalid message scoring, for example: when the network changes, it is inevitable that some messages from the previous network arrive and are considered invalid. We also observe a spike in invalid message errors in the validation flow at the initial instance, documented here.

So...

masih commented 2 days ago

It also looks like Lotus (and by extension Observer, F3, etc.) retains negative scores for 6 hours. This is set in the top-level pubsub configuration, so I assume it affects the whole pubsub instance, i.e. all topics, for its lifetime.

rjan90 commented 2 days ago

Anecdotally, I see a lot of peer IDs with exactly the same, very large negative score:

lotus net scores
12D3KooWBPyrDyrTRchikR56W21cW3dQ5YRDeAgCZvPjw7jopfuU, -1795600.000000
12D3KooWBNh4V7JeEvYLKvSbGeMMMFJyB3vavEyEipqNYaZh9cNS, -1795600.000000
12D3KooWBNMVxsBq4T5T8qX8E1FWhfyVULDJ56a3mE1m6r3bEJ8f, -1795600.000000
12D3KooWAy4R5DgHcAuP7Z6CJyesQXkNPfoBFShMtdMtg1z3dhWS, -1795600.000000
12D3KooWAmPdJJcrNQ9qL4Dtj229kJ2VngPtrEmz6fd7duc6N8Q4, -1795600.000000
12D3KooWAewsJcXcVoEhCwfvD7zWwCPae8WtVvcL8nvy84HdNivL, -1795600.000000
12D3KooWAY9Vq9wzqRjzaoKheXPDVf9YCf1GpQ32V4mtjtxAaHPW, -1795600.000000
12D3KooWAPsAXsxBpuRJbjiX7cFzNsA8A1UZe8ikWsbgxZ7DDu5Y, -1795600.000000
12D3KooWAEZaEAwxco3Coho2c4KESS5Q868NYhXzSHAXdvwomYAt, -1795600.000000

A total of 139 peers on my node have exactly this negative score, out of a total of:

lotus net scores | wc -l
2045

The total number of peer IDs with a negative score is:

622
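The counts above can also be reproduced from the `lotus net scores` output with a small parser. This is a hypothetical helper, not part of Lotus: it assumes the `peerID, score` line format shown above, skips any line that does not parse, and groups negative peers by their exact score so clusters like the 139 identical ones stand out. Unlike `wc -l`, it only counts lines that actually parse as scores.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// summarize parses "peerID, score" lines and returns the number of peers,
// the number with a negative score, and how many share each negative score.
func summarize(out string) (total, negative int, byScore map[float64]int) {
	byScore = map[float64]int{}
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		id, raw, ok := strings.Cut(sc.Text(), ",")
		if !ok || strings.TrimSpace(id) == "" {
			continue // not a score line
		}
		score, err := strconv.ParseFloat(strings.TrimSpace(raw), 64)
		if err != nil {
			continue
		}
		total++
		if score < 0 {
			negative++
			byScore[score]++
		}
	}
	return total, negative, byScore
}

func main() {
	// Two lines copied from the output above plus one hypothetical peer.
	sample := `12D3KooWBPyrDyrTRchikR56W21cW3dQ5YRDeAgCZvPjw7jopfuU, -1795600.000000
12D3KooWBNh4V7JeEvYLKvSbGeMMMFJyB3vavEyEipqNYaZh9cNS, -1795600.000000
hypotheticalPeer, 0.000000`
	total, neg, byScore := summarize(sample)
	fmt.Println(total, neg, byScore[-1795600]) // 3 2 2
}
```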
rjan90 commented 2 days ago

For clarity, I also grepped for the peers that subscribe to F3: most have a score of 0, with an occasional negative one, but none with the very large negative score shown above.

{"ID":"12D3KooW9sCwBYPVGr9T7A5DMzk8qF4wdGtTGSREK7kMLdJDBLR6","Score":{"Score":0,"Topics":{"/f3/granite/0.0.2/filecoin/21":{"TimeInMesh":0,"FirstMessageDeliveries":0,"MeshMessageDeliveries":0,"InvalidMessageDeliveries":0}},"AppSpecificScore":0,"IPColocationFactor":0,"BehaviourPenalty":0}}
{"ID":"12D3KooW9rUCW2eEmbZsGarEBzdh7RwqZXzVhm5yW4GHpM4PxGLV","Score":{"Score":0,"Topics":{"/f3/granite/0.0.2/filecoin/21":{"TimeInMesh":0,"FirstMessageDeliveries":0,"MeshMessageDeliveries":0,"InvalidMessageDeliveries":0}},"AppSpecificScore":0,"IPColocationFactor":0,"BehaviourPenalty":0}}
{"ID":"12D3KooW9qsRsJmXkgYuyJnZNDwpB75Lhs1dw6myiNFDTLgwgbQA","Score":{"Score":0,"Topics":{"/f3/granite/0.0.2/filecoin/21":{"TimeInMesh":0,"FirstMessageDeliveries":0,"MeshMessageDeliveries":0,"InvalidMessageDeli

And the ones with extremely high negative scores are all driven by IPColocationFactor:

{"ID":"12D3KooWBNMVxsBq4T5T8qX8E1FWhfyVULDJ56a3mE1m6r3bEJ8f","Score":{"Score":-1876900,"Topics":null,"AppSpecificScore":0,"IPColocationFactor":18769,"BehaviourPenalty":0}}
{"ID":"12D3KooWAy4R5DgHcAuP7Z6CJyesQXkNPfoBFShMtdMtg1z3dhWS","Score":{"Score":-1876900,"Topics":null,"AppSpecificScore":0,"IPColocationFactor":18769,"BehaviourPenalty":0}}
{"ID":"12D3KooWAmPdJJcrNQ9qL4Dtj229kJ2VngPtrEmz6fd7duc6N8Q4","Score":{"Score":-1876900,"Topics":null,"AppSpecificScore":0,"IPColocationFactor":18769,"BehaviourPenalty":0}}
{"ID":"12D3KooWAewsJcXcVoEhCwfvD7zWwCPae8WtVvcL8nvy84HdNivL","Score":{"Score":-1876900,"Topics":null,"AppSpecificScore":0,"IPColocationFactor":18769,"BehaviourPenalty":0}}
{"ID":"12D3KooWAY9Vq9wzqRjzaoKheXPDVf9YCf1GpQ32V4mtjtxAaHPW","Score":{"Score":-1876900,"Topics":null,"AppSpecificScore":0,"IPColocationFactor":18769,"BehaviourPenalty":0}}
{"ID":"12D3KooWAPsAXsxBpuRJbjiX7cFzNsA8A1UZe8ikWsbgxZ7DDu5Y","Score":{"Score":-1876900,"Topics":null,"AppSpecificScore":0,"IPColocationFactor":18769,"BehaviourPenalty":0}}
{"ID":"12D3KooWAEZaEAwxco3Coho2c4KESS5Q868NYhXzSHAXdvwomYAt","Score":{"Score":-1876900,"Topics":null,"AppSpecificScore":0,"IPColocationFactor":18769,"BehaviourPenalty":0}}
{"ID":"12D3KooWA4kybwSTq57KfMxJ4unVPbFTXTxRpe1S5HcKALRvu2FY","Score":{"Score":-1876900,"Topics":null,"AppSpecificScore":0,"IPColocationFactor":18769,"BehaviourPenalty":0}}
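The numbers are consistent with the gossipsub IP colocation term (P6): for each IP, the number of peers above a threshold is squared, and the result is multiplied by a negative weight. The sketch below assumes a threshold of 5 and a weight of -100 (values we believe the Lotus pubsub config uses, but treat them as an assumption here) and a single shared IP; the real factor sums over every IP a peer is seen on. Under those assumptions, 139 colocated peers give 134² = 17956 and a score of -1795600, matching both the earlier `lotus net scores` snapshot and the count of 139, while 18769 = 137² implies about 142 peers on that IP by the time of the later snapshot.

```go
package main

import "fmt"

// ipColocationFactor computes the P6 term for one IP: the surplus of peers
// above threshold, squared. Zero at or below the threshold.
func ipColocationFactor(peersOnIP, threshold int) float64 {
	if peersOnIP <= threshold {
		return 0
	}
	surplus := float64(peersOnIP - threshold)
	return surplus * surplus
}

func main() {
	const (
		threshold = 5      // assumed IPColocationFactorThreshold
		weight    = -100.0 // assumed IPColocationFactorWeight
	)
	for _, n := range []int{139, 142} {
		p6 := ipColocationFactor(n, threshold)
		fmt.Printf("%d peers on one IP: factor=%.0f score=%.0f\n", n, p6, p6*weight)
	}
	// 139 peers on one IP: factor=17956 score=-1795600
	// 142 peers on one IP: factor=18769 score=-1876900
}
```

Because P6 grows quadratically and applies to every peer behind the same IP, a single host running many peer IDs ends up with all of them sharing an identical, very large negative score, which is what the snapshots show.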

Another test should be run after a prolonged pause to rule out peer scores as the cause, but it does not seem that the F3 peer IDs are being negatively scored.