NebulousLabs / Sia

Blockchain-based marketplace for file storage. Project has moved to GitLab: https://gitlab.com/NebulousLabs/Sia
https://sia.tech
MIT License

HostDB normalization and latency throughput metrics for host #3078

Open MSevey opened 6 years ago

MSevey commented 6 years ago

@DavidVorick to elaborate

MSevey commented 6 years ago

This issue is to address the potential for the renter to influence a host's weight, specifically the uptimePenalty calculated in uptimeAdjustments().

uptimeAdjustments() looks at the previous scans of the host, which are updated in updateEntry(). updateEntry() is called from managedScanHost(). It is in managedScanHost() that a poor connection on the renter's side could cause a failed connection with the host. updateEntry() does check for an internet connection, but if the connection is merely poor rather than down, the host could still be negatively weighted for failed connections that are not its fault.
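A minimal Go sketch of the guard described above. All names here are hypothetical stand-ins, not the real hostdb types; the point is that a failed scan is only recorded against the host when the renter's own connection looks healthy:

```go
package main

// hostEntry is a hypothetical stand-in for the hostdb's entry type.
type hostEntry struct {
	successfulScans int
	failedScans     int
}

// recordScan only counts a failed scan against the host when the
// renter believes its own connection is healthy (e.g. other hosts
// were reachable around the same time). This avoids penalizing
// hosts for the renter's poor connectivity.
func recordScan(e *hostEntry, success, netHealthy bool) {
	switch {
	case success:
		e.successfulScans++
	case netHealthy:
		e.failedScans++
	default:
		// Failure while the local connection is unhealthy: likely
		// the renter's fault, so don't penalize the host.
	}
}
```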

A few checks that could be done to address this are:

  1. Check all the renter's hosts to see if they all have a high uptimePenalty, which might suggest that the renter is at fault and not the hosts.
  2. Check the host's uptimePenalty against one or more other renters to see if the host is consistently getting a high uptimePenalty.

For option 1, the hdb can be looped over and ScoreBreakdown() can be used to compare the uptimePenalty of the hosts. It would probably make sense to do this check when marking contracts as goodForRenew which happens in managedMarkContractsUtility().
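A rough sketch of that check, assuming the uptime adjustment returned by ScoreBreakdown() is a multiplier where lower means more downtime; the threshold and fraction parameters are made up for illustration:

```go
package main

// renterLikelyAtFault looks at the uptime adjustments of all hosts
// the renter has contracts with. If a large fraction of them look
// heavily penalized at once, the common factor is probably the
// renter's own connection rather than the hosts.
func renterLikelyAtFault(adjustments []float64, badBelow, fracNeeded float64) bool {
	if len(adjustments) == 0 {
		return false
	}
	bad := 0
	for _, a := range adjustments {
		if a < badBelow {
			bad++
		}
	}
	return float64(bad)/float64(len(adjustments)) >= fracNeeded
}
```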

For option 2, I'm not currently sure how, or whether it's even possible, to compare against other renters' host scores.

MSevey commented 6 years ago

@DavidVorick what are your thoughts on what I wrote up?
Is that the scope of the issue as you saw it?
Are there other host metrics that you think the renter can negatively influence?

ChrisSchinnerl commented 6 years ago

Option 2 is not really possible. Each renter has to be able to figure that out itself without relying on a separate hostdb. Option 1 probably needs more elaboration on how we can do that. We need to find some average/threshold that determines a good number for hosts that haven't been scanned yet while still being performant. The functionality should also be contained within the hostdb, which means we can't put it into `managedMarkContractsUtility`, which I think is called within the contractor's `threadedContractMaintenance`.

I would also move the latency throughput metrics to a different PR since they are not strongly related to the normalization of the hostdb. Something we should also normalize is the ratio between failed and successful host interactions. I guess that can be done with minimal overhead once we have figured out how to do it for uptime.

MSevey commented 6 years ago

OK, so Option 2 is out; I didn't think it was possible, just wanted to check.

> Option 1 probably needs more elaboration on how we can do that. We need to find some average/threshold that determines a good number for hosts that haven't been scanned yet while still being performant.

Would it make sense to use 50? That way the renter would scan the hosts it currently has contracts with, or 50 hosts if it has no contracts.

> The functionality should also be contained within the hostdb, which means we can't put it into `managedMarkContractsUtility`, which I think is called within the contractor's `threadedContractMaintenance`.

Makes sense, but could we export it? I'm just thinking about how it gets triggered. Would it make sense for this to be some sort of real-time normalization, or should it be a triggered event to course correct?

Could we even do it in real time? It seems like it would be a convoluted process of keeping track of the current host while pulling new hosts from the hostdb, since all the functions in scan.go look at a single host. It seems it would be more streamlined as a triggered event that could then do some course correction on the host weights.

DavidVorick commented 6 years ago

What I originally had in mind was just keeping track of all of the scans we performed, and then tracking what percentage of those were successful. In general we are going to spend more time scanning hosts that are online than hosts that are offline.

Scans that wouldn't contribute to the total would be scans of hosts we've never seen online, and scans of hosts that were never seen online again (meaning that, starting from one bad scan, they are offline from that point forward).
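A sketch of that bookkeeping. The record type and exclusion flags are hypothetical; the point is which scans get pooled into the overall success rate:

```go
package main

// scanRecord is a hypothetical per-host summary of scan history.
type scanRecord struct {
	successes         int
	failures          int
	everOnline        bool // seen online at least once
	cameBackAfterFail bool // seen online again after its first failure
}

// globalSuccessRate pools scans across hosts, skipping hosts we never
// saw online and hosts that stayed offline after their first bad scan,
// since those scans say nothing about the renter's own connection.
func globalSuccessRate(records []scanRecord) float64 {
	var ok, total int
	for _, r := range records {
		if !r.everOnline {
			continue
		}
		if r.failures > 0 && !r.cameBackAfterFail {
			continue
		}
		ok += r.successes
		total += r.successes + r.failures
	}
	if total == 0 {
		return 0
	}
	return float64(ok) / float64(total)
}
```

A low global rate across otherwise-healthy hosts would then point at the renter's connection rather than any individual host.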

DavidVorick commented 6 years ago

Hosts that we've never scanned before wouldn't have a score at all, they would be in the queue to be scanned immediately so that we could get a baseline for the hosts.

EvilRedHorse commented 5 years ago

This is happening to me and my uptime is being trashed just for participating.