Closed freimair closed 4 years ago
I just compiled in some more data
Regarding the following items from the description above:
- bin up the data
- create a "special" key for addressing bins
- tie those bins to the application version
- create a new bin on every release holding only new data
Do you intend here to check these binary blobs into the main Bisq repository, or something else? I would really like to avoid adding more binary data to the repository (as we're already doing with all the stuff in `p2p/src/main/resources`).
If checking the blobs in is the intended solution, @freimair, I'd like us to look into doing this properly with Git LFS instead, and at the same time migrating the `p2p/src/main/resources` files there, too. GitHub has a free tier we might want to start with. I ran some basic numbers and I think we could get away with it, but it depends on how many people are cloning the repository every month (because the pricing is metered on bandwidth used). We could also potentially run our own LFS server, but it would probably be best to go with the free GitHub service until we see that we're outgrowing it.
See:
/cc @wiz as running our own LFS server would be ops territory. Just FYI at this point.
Also, from the introduction:
> During startup, Bisq is required to send >4MB of data to seednodes in order to get its initial data.
I'm unfamiliar with this problem, and reading this doesn't help me understand what's really going on. Why would a Bisq node need to send so much data to its seednode(s)? I could understand that it might need to receive quite a bit of data, but I'm clearly missing something. I read through the rest of the description a couple times and I don't think this is ever made clear. ELI5, please :)
> Why would a Bisq node need to send so much data to its seednode(s)?
- sorry, I took that as common knowledge, because that is how Bisq always worked
- let the ELI5 commence:
- on startup, the Bisq app asks the seed node for a "distributed-database update"
- In order not to burden the seednode with sending all the data (> 12MB), Bisq tells the seednode which objects it already has (i.e. it sends data to the seednode).
- The seednode then sends only the data the Bisq app does not already have.
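The exchange described in the list above can be sketched as a set difference. This is a minimal illustration, not Bisq's actual code: the function names are hypothetical, and a truncated SHA-256 digest is only an assumed stand-in for however Bisq really derives its keys. The 20-byte key size is taken from the project description.

```python
import hashlib

KEY_BYTES = 20  # key size per object, as stated in the project description


def object_key(payload: bytes) -> bytes:
    # Assumption: a truncated SHA-256 stands in for Bisq's real key derivation.
    return hashlib.sha256(payload).digest()[:KEY_BYTES]


def build_request(local_objects):
    """Client side: collect the keys of everything already stored locally."""
    return {object_key(p) for p in local_objects}


def answer_request(seed_objects, known_keys):
    """Seed node side: return only objects whose key the client did not send."""
    return [p for p in seed_objects if object_key(p) not in known_keys]


def request_size_bytes(known_keys):
    """Size of the outgoing key list; it grows with every object the client holds."""
    return len(known_keys) * KEY_BYTES


seed = [b"trade-1", b"trade-2", b"witness-1"]
client = [b"trade-1", b"witness-1"]

request = build_request(client)
missing = answer_request(seed, request)
```

The point of the sketch is that the request grows linearly with the client's local data, which is why the outgoing request is the bottleneck on slow uplinks even though the response is small.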
> Do you intend here to check these binary blobs into the main Bisq repository, or something else? I would really like to avoid adding more binary data to the repository (as we're already doing with all the stuff in `p2p/src/main/resources`).
yes, I intend to check these binary blobs into the main Bisq repository. It is exactly about the stuff in `p2p/src/main/resources`, which is a snapshot of the "distributed-database" we ship with each release.
- currently: `size(t) = size(t-1) + size(newData)` per release (actually, it is several files for different message types, but overall, it is one blob of data)
- with this project: `size(t) = size(newData)`; the "old" blobs are left untouched and are used as they are (historical data does not change)

> I'd like us to look into doing this properly with Git LFS instead
- I totally agree that we have to move away from committing binary data to the repo, but
- using [insert your favorite storage technology here] does not collide with this project
- can be done later
- should be done sooner than later
- will look into Git LFS as a followup-project
All in all, this project aims for making small steps towards a more reliable service. Rethinking the storage synchronization and locations is a whole other can of worms.
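To make the two `size(t)` rules from this comment concrete, here is a small sketch using object counts from the project description's data table (v1.2.4 as the base, then the "total diff" values for v1.2.5 and v1.2.7). Reading the first rule as the current behaviour and the second as the proposed one is an interpretation of the thread. The `bins_to_send` lookup is purely hypothetical: it illustrates one way a version could serve as the "special" key for addressing bins, not how the project implements it.

```python
def cumulative_blob_sizes(base, new_data_per_release):
    """size(t) = size(t-1) + size(newData): every release ships the full snapshot."""
    sizes = [base]
    for new_data in new_data_per_release:
        sizes.append(sizes[-1] + new_data)
    return sizes


def binned_blob_sizes(base, new_data_per_release):
    """size(t) = size(newData): every release adds one bin holding only new objects."""
    return [base] + list(new_data_per_release)


def bins_to_send(release_order, client_version):
    """Hypothetical seed-node lookup: the client's version acts as the bin key,
    and the answer is every bin released after that version."""
    position = release_order.index(client_version)
    return release_order[position + 1:]


releases = ["v1.2.4", "v1.2.5", "v1.2.7"]
base = 88145             # total objects at v1.2.4 (data table)
new_data = [4814, 5502]  # "total diff" for v1.2.5 and v1.2.7 (data table)

cumulative = cumulative_blob_sizes(base, new_data)
binned = binned_blob_sizes(base, new_data)
```

All sizes here are object counts, not bytes. With binning, the sum of all blobs equals the latest total, whereas the cumulative scheme adds a full snapshot for every release.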
Btw. just checked. We have 110k objects now; at the time of project creation it was 104k -> approx. +5% in 25 days.
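As a rough cross-check of these figures, the arithmetic below converts object counts into request size using the 20-byte key per object stated in the project description. The 103,997 count is the "live" row of the data table; 110,000 and 104,000 are the rounded values quoted above. Doubling the result for two seednodes is an inference from the problem statement (which says two seednodes are queried), not something the thread spells out.

```python
KEY_BYTES = 20  # per-object key size from the project description


def request_bytes(object_count, key_bytes=KEY_BYTES):
    """Bytes needed to list one key per locally known object."""
    return object_count * key_bytes


def growth_percent(old, new):
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100


per_seednode = request_bytes(103_997)  # "live" row of the data table
both_seednodes = 2 * per_seednode      # assumption: same key list sent to both
growth = growth_percent(104_000, 110_000)
```

Under these assumptions, one request is about 2.08 MB and two requests about 4.16 MB, which would be consistent with the ">4MB" figure in the introduction. The growth works out to roughly 5.8% over the 25 days.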
The proposal looks well-formed, so I've removed the `needs:triage` label and added `needs:approval` per the process.
I am simply not well-informed enough about the details and alternatives to give a meaningful thumbs-up on approving this, but mine is just one voice. Like any other proposal, we should be looking for a broader consensus of interested and informed parties. If you are one of these people (@stejbac?), please provide feedback. The approach here looks pragmatic enough, but it would be good to see other informed opinions.
From a budgeting perspective, it appears to me this is 100% dev team budget, so @ripcurlx, I'll leave it to you to weigh in on.
And regarding my comments about Git LFS above, see bisq-network/bisq#4114, which will be treated separately from this project.
This is a critical issue that reproduces on slow network connections often now
> From a budgeting perspective, it appears to me this is 100% dev team budget, so @ripcurlx, I'll leave it to you to weigh in on.
For me this is a critical issue atm for some of our users, but as mentioned the group of people affected by this is growing every day. So from my side it would be a 👍 to start working on this project.
@ripcurlx, I'll add the `has:budget` label, then.
It would be great to see more engagement on approval, but even though we've gotten only a few comments here, it sounds like there's consensus we should go ahead. I'll add the `has:approval` label accordingly.
@freimair, please move this to `In Progress` as and when appropriate.
the implementation is currently being prepared to be tested in the production network. @sqrrm will upgrade his explorer-seednode to run the new code (that would be v1.3.5 + the changes of this project) so that a few devs can use it productively and see if anything bad shows up. The plan is to do so for one release cycle. If nothing bad shows up, we will proceed with the rather complex upgrade process.
> the implementation is currently being prepared to be tested in the production network. @sqrrm will upgrade his explorer-seednode to run the new code (that would be v1.3.5 + the changes of this project) so that a few devs can use it productively and see if anything bad shows up. The plan is to do so for one release cycle. If nothing bad shows up, we will proceed with the rather complex upgrade process.
Is there any update on this?
the project has been completed by https://github.com/bisq-network/bisq/pull/4586
During startup, Bisq is required to send >4MB of data to seednodes in order to get its initial data. This is an issue because
The primary goal of this project is to reduce the amount of data to be sent on startup.
Why/why now?
Details: Problem statement, Numbers, Proposal, ...
# Problem statement

On startup, a Bisq application first requests up-to-date network data from two seednodes. Once data comes in, the Bisq app jumps from the loading screen to the trading UI. However, if no data arrives, Bisq stays at the loading screen forever. There are two main reasons why this happens:

- the internet uplink is too slow and hits a seednode's connection timeout during the request
- the initial data request is huge. At the time of writing, it exceeds 4MB and is bound to grow further

# Numbers

The numbers below can be converted to the actual request size, since each object is represented by a 20-byte key in the initial outgoing data request, basically saying "I already have that, please do not send it".

![Screenshot from 2020-03-06 09-51-32](https://user-images.githubusercontent.com/1070734/76067702-1e823300-5f90-11ea-89af-d5ea0e6c79a5.png)
![Screenshot from 2020-03-06 09-51-27](https://user-images.githubusercontent.com/1070734/76067706-1fb36000-5f90-11ea-9295-a6f8b5ce68b1.png)
![Screenshot from 2020-03-06 09-51-21](https://user-images.githubusercontent.com/1070734/76067708-217d2380-5f90-11ea-8f83-2530cefbb003.png)

Data table

| Version | Release date | SignedWitness | AccountAgeWitness | TradeStatistics2 | total | others | total diff |
| --- | --- | ---:| ---:| ---:| ---:| ---:| ---:|
| v0.9.1 | 2018-12-13 | 1057 | 21132 | 19490 | 41802 | 123 | |
| v0.9.2 | 2019-01-08 | 1057 | 22056 | 21384 | 44620 | 123 | 2818 |
| v0.9.5 | 2019-03-06 | 1057 | 24593 | 25212 | 50985 | 123 | 6365 |
| v1.0.1 | 2019-04-16 | 1057 | 26550 | 27249 | 54979 | 123 | 3994 |
| v1.1.1 | 2019-05-06 | 1057 | 27360 | 28585 | 57125 | 123 | 2146 |
| v1.1.2 | 2019-06-04 | 1057 | 29437 | 30558 | 61196 | 144 | 4071 |
| v1.1.3 | 2019-07-16 | 1057 | 32172 | 34344 | 67753 | 180 | 6557 |
| v1.1.5 | 2019-08-08 | 1057 | 33664 | 36248 | 71149 | 180 | 3396 |
| v1.1.7 | 2019-09-23 | 1057 | 36493 | 40273 | 77938 | 115 | 6789 |
| v1.2.2 | 2019-11-01 | 1057 | 38665 | 42657 | 82494 | 115 | 4556 |
| v1.2.3 | 2019-11-07 | 1171 | 39415 | 43009 | 83710 | 115 | 1216 |
| v1.2.4 | 2019-12-05 | 1441 | 41114 | 45475 | 88145 | 115 | 4435 |
| v1.2.5 | 2020-01-09 | 1693 | 43102 | 48049 | 92959 | 115 | 4814 |
| v1.2.7 | 2020-02-13 | 1920 | 45204 | 51222 | 98461 | 115 | 5502 |
| live | 2020-03-06 | 2123 | 46989 | 54322 | 103997 | 563 | 5536 |

Risks
Alternative approaches
Tasks
Criteria for Delivery
Estimates