Tribler / trustchain-superapp

Kotlin implementation of Trustchain and IPv8 with rich networking: multihoming of local Bluetooth+4G, decentral social networking, UDP hole punching, etc.
GNU General Public License v3.0
80 stars 63 forks source link

Improve the EVA protocol #101

Open rmadhwal opened 2 years ago

rmadhwal commented 2 years ago

Hi, we're using the EVA protocol to download apps as a fallback in Freedom of Computing, while experimenting with the protocol I noticed that transfers would time out about half the time.

On investigation, I found that this was due to data packets not making it through to the requester and this resulting in multiple reacknowledgements which inevitably lead to the Timeout Errors. I tried increasing the retransmit interval and retransmit attempt counts (along with the overall timeout), but this didn't yield in much success either. Finally, I was able to obtain some consistency with a slightly increased timeout and a smaller block size (1200 -> 600), I suspect that larger packets are somehow getting dropped over UDP and perhaps further investigation is needed to determine the right way forward.

Further, sometimes (much more rarely than data packets), write request payloads also do not make it to the requester

drew2a commented 2 years ago

We see quite similar behavior of the EVA protocol that was revealed during experiments made by @devos50. In this PR I fixed all the problems that have been found: https://github.com/Tribler/tribler/pull/6831

Hope it will help you as well.

synctext commented 2 years ago

thnx for that reference! That is great info to see, attempting only a single write to sockets is quite a bug in my world. Very much needs fixing if Kotlin code has this bug also :construction:

drew2a commented 2 years ago

Regarding the only one attempt to send WriteRequest. I should say that it was done intentionally.

My arguments were:

  1. Message retransmitting by itself adds complexity to the protocol and could lead to hidden bugs.
  2. An user of EVA will handle failures of sending WriteRequest on the application layer. Why? Because WriteRequest is the very first packet sent to the peer and it can be rejected/dropped for multiple reasons. These failure handlers could differ from application to application.
  3. I assumed that these handlers will be presented anyway. That's why it is a reason to move the responsibility of handling WriteRequest failures to the application layer.

But now we have three examples of using EVA: channels, super-app, and Martijn's experiments. Two of them suffer from a lack of retransmitting WriteRequest and assume that WriteRequest should be retransmitted in case of failure.

So, my line of reasoning was wrong.

drew2a commented 2 years ago

See also this finding from @kozlovsky https://github.com/Tribler/tribler/issues/6845

rmadhwal commented 2 years ago

That seems like a bug indeed, but isn't the issue that I observed in practice in the kotlin implementation, rather than packets arriving out of order, during failures they don't seem to arrive at all.

But a potential patch to the kotlin library should indeed address that too, thanks for pointing it out!

rmadhwal commented 2 years ago

Beyond the above, we also seem to notice an OutOfMemory error when trying to send a large file over EVA, is that also something you guys experienced in Python ? I suspect a possible memory leak somewhere since the files should be broken into blocks so it's hard to guess why this occurs otherwise (Could also be specfic to the Kotlin implementation)

devos50 commented 2 years ago

I haven't seen this behaviour in my experiments where I'm using EVA to regularly send ~300KB items between peers (using the Python implementation). @drew2a maybe you have seen something related to this?

drew2a commented 2 years ago

@rmadhwal @devos50 no, we have never seen the OutOfMemory error in Python. I have a regular test for 1Mb transfer here maybe some modification of this test will be helpful to investigate the memory leak problem.

    async def test_one_megabyte_transfer(self):
        # In this test we send `1Mb` transfer from Alice to Bob.
        data_size = 1024 * 1024
        data = os.urandom(1), os.urandom(data_size), random.randrange(0, 256)

        await self.alice.eva.send_binary(self.bob.my_peer, *data)
        assert self.bob.most_recent_received_data == data

        await self.alice.data_has_been_sent.wait()
        assert self.alice.most_recent_sent_data == data

BTW how big are the files that you trying to send over EVA?

@rmadhwal if you give me a link to the Kotlin implementation of EVA, I (probably) can check it on potential problems.

rmadhwal commented 2 years ago

We are okay with a ~5 mb transfer too, the issue manifested when we tried to transfer a ~150 mb file

The offending area seems to be around here, it eventually runs out of memory, presumably because the payloads being sent are not being garbage collected .

Here's the OutOfMemory error we see:

W/elft.trustchai: Throwing OutOfMemoryError "Failed to allocate a 123913400 byte allocation with 6291456 free bytes and 97MB until OOM, target footprint 507705360, growth limit 603979776" (VmSize 2748848 kB)
E/AndroidRuntime: FATAL EXCEPTION: DefaultDispatcher-worker-1
    Process: nl.tudelft.trustchain, PID: 11108
    java.lang.OutOfMemoryError: Failed to allocate a 123913400 byte allocation with 6291456 free bytes and 97MB until OOM, target footprint 507705360, growth limit 603979776
        at java.util.Arrays.copyOf(Arrays.java:3161)
        at kotlin.collections.ArraysKt___ArraysJvmKt.plus(_ArraysJvm.kt:2226)
        at nl.tudelft.ipv8.Community.serializePacket(Community.kt:358)
        at nl.tudelft.ipv8.Community.serializePacket--cT5jE0(Community.kt:318)
        at nl.tudelft.ipv8.Community.serializePacket--cT5jE0$default(Community.kt:310)
        at nl.tudelft.trustchain.common.DemoCommunity.sendApp(DemoCommunity.kt:235)
        at nl.tudelft.trustchain.common.DemoCommunity.onAppRequest(DemoCommunity.kt:223)
        at nl.tudelft.trustchain.common.DemoCommunity.onAppRequestPacket(DemoCommunity.kt:216)
        at nl.tudelft.trustchain.common.DemoCommunity.access$onAppRequestPacket(DemoCommunity.kt:34)
        at nl.tudelft.trustchain.common.DemoCommunity$3.invoke(DemoCommunity.kt:169)
        at nl.tudelft.trustchain.common.DemoCommunity$3.invoke(DemoCommunity.kt:34)
        at nl.tudelft.ipv8.Community.onPacket(Community.kt:162)
        at nl.tudelft.ipv8.messaging.Endpoint.notifyListeners(Endpoint.kt:28)
        at nl.tudelft.ipv8.messaging.EndpointAggregator.onPacket(EndpointAggregator.kt:75)
        at nl.tudelft.ipv8.messaging.Endpoint.notifyListeners(Endpoint.kt:28)
        at nl.tudelft.ipv8.messaging.udp.UdpEndpoint.handleReceivedPacket$ipv8(UdpEndpoint.kt:188)
        at nl.tudelft.ipv8.messaging.udp.UdpEndpoint$bindSocket$1.invokeSuspend(UdpEndpoint.kt:164)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)
drew2a commented 2 years ago

@rmadhwal did you get this error by running an emulator or a real device?

My guess this error is not related to protocol but to the environment.

@devos50 @rmadhwal what do you think guys, maybe it is better to use LibTorrent to transfer big files instead of EVA? EVA is relatively slow and has not been designed for this type of transfer.

rmadhwal commented 2 years ago

@drew2a I experienced the same behaviour on both.

We're using EVA as fallback because torrenting fails for us across networks so EVA seems to be the only option for now. Though if we want to restrict EVA to smaller size transfers, our team could look at splitting the file up and sending it piecewise through EVA which would be easier on the memory.

devos50 commented 2 years ago

@drew2a I also tried libtorrent in my experiments but that was quite complicated to setup and requires additional information to be gossiped around (e.g., the port the libtorrent session is listening on). The main advantage is that libtorrent is able to parallelise the download of particular pieces, and has congestion control (either provided by TCP or uTP). For the moment being, EVA seems to do the job for the things I want to achieve. Specifically, I assume that the network progresses in synchronous rounds so the absolute completion time of an EVA transfer doesn't matter for protocol correctness. But I might drop that assumption later and I still require that the EVA protocol is robust.

Having said that, the tradeoff here seems to be programming convenience vs. transfer speed. On the one hand, When speed really matters, I would go for libtorrent but that requires (much) more time to properly setup and manage. Additionally, a libtorrent download also has some ramp-up time. So when transferring many small files in a small period, EVA might be faster. On the other hand, I would recommend EVA if one needs to send small files quickly, quickly need to prototype a transfer and/or when speed is an less important factor.

drew2a commented 2 years ago

@rmadhwal

Though if we want to restrict EVA to smaller size transfers, our team could look at splitting the file up and sending it piecewise through EVA which would be easier on the memory.

Yes. Another option is to use a file instead of memory for storing transferred data. It looks like an easy change to do.