UCLA-IRL / ndn-python-repo

An NDN Repo implementation in Python

High overhead by command checking #59

Closed: phylib closed this issue 3 years ago

phylib commented 3 years ago

I just realized that the current PutFile implementation has a rather high overhead due to the command checking message exchange, which continues for quite a long time after the file insertion. Is it possible to stop the command checking traffic after a successful insertion?

Let me share some details: I inserted a file with exactly 1000 bytes of content (a few bytes of overhead for the file name are ignored here). I checked the nfdc status report before uploading, directly after uploading, and after the pub/sub messages for command checking stopped, and calculated the bytes that were exchanged.

Directly after Upload:

Face 275 (Repo): In = 1483; Out = 2392
Face 276 (PutFile): In = 2514; Out = 1681

2392 bytes sent on the Repo face. That is roughly the NDN overhead I would expect (a factor of about 2.5): 1000 bytes of payload, plus the name and the Interest/Data exchanges of the Basic Repo Insertion Protocol.

After the command checking message exchange stopped, I calculated the following:

Face 275 (Repo): In = 8251; Out = 6575
Face 276 (PutFile): In = 6697; Out = 8449

When inserting a file with 1000 bytes, the PutFile client ends up with almost 7,000 bytes of downlink and 8,500 bytes of uplink traffic, which is quite a high overhead.
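For reference, a quick back-of-the-envelope check of the overhead factors, using only the counters quoted above:

payload = 1000                      # bytes of file content inserted

# Repo face, directly after the upload
print(2392 / payload)               # ~2.4x, roughly the expected overhead

# PutFile client face, after the command checking traffic stopped
client_in, client_out = 6697, 8449
print(client_out / payload)         # ~8.4x uplink overhead
print(client_in / payload)          # ~6.7x downlink overhead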

Pesa commented 3 years ago

Is this the same issue as #46?

phylib commented 3 years ago

I can't say for sure, but it is likely to be related.

phylib commented 3 years ago

I did some code reading and have some questions:

In the Repo Insertion Protocol, there is a status check command/reply at the end. The intention is that the client can check whether the upload was successful or not. The recommendation is to perform this check about three times.

When looking at the implementation, I saw that the checking protocol differs from the specification linked above. Instead of the client fetching the command status, the Repo notifies the client that a new command status is available. The client listens for such notifications and retrieves the command status from the Repo. These command status notifications go on for 60 seconds; after that, the Repo stops sending them.

I am not quite sure why the implementation is like that. Maybe there is an updated specification that I am not aware of. Currently, instead of the client asking about the upload status, the Repo pushes the status to the client. I guess this is what leads to the overhead.
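To make the described behaviour concrete, here is a minimal, hypothetical asyncio sketch of such a push-style notification loop; the function names and the one-second period are purely illustrative and not taken from the actual ndn-python-repo code:

import asyncio

async def send_status_notification() -> None:
    # Placeholder for illustration: would express a notification Interest
    # toward the client, telling it that a new command status is available.
    return None

async def repo_push_notifications(duration: float = 60.0, period: float = 1.0) -> None:
    # Push-style checking as described above: the Repo keeps notifying the
    # client for about 60 seconds, even long after the insertion finished,
    # and every notification triggers another status retrieval by the client.
    loop = asyncio.get_running_loop()
    deadline = loop.time() + duration
    while loop.time() < deadline:
        await send_status_notification()
        await asyncio.sleep(period)

# Running this keeps generating "notifications" for a full minute:
# asyncio.run(repo_push_notifications())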

Is there some reason for diverging from the Basic Repo Insertion Protocol?

jefft0 commented 3 years ago

Philipp is right about the extra traffic after the upload is complete to check whether it was successful. In addition to this, there is an extra, possibly unnecessary, exchange before the upload starts. Here are the first 5 packets at the repo between the repo and the client node_a:

  1. recv Interest /testrepo/insert/notify/...
  2. sent Interest /node_a/msg/testrepo/insert/%90E%FF3
  3. recv Data /node_a/msg/testrepo/insert/%90E%FF3
  4. sent Data /testrepo/insert/notify/...
  5. sent Interest /node_a//Users/jefft0/temp/logs/a.txt/1612788834979/seg=0

Messages 1, 4, and 5 make sense to me: the Interest from the client node to start the insert, the Data response, and the Interest to the client node to fetch the uploaded file. But what are messages 2 and 3? Are they necessary?

phylib commented 3 years ago

@jefft0, @JonnyKong just pushed a few commits that should resolve the issue. The insert process now again follows the original "Basic Repo Insertion Protocol".

I just tried uploading a file with 63,696 B, and nfdc showed the following output for the Repo face:

faceid=263 remote=fd://39 local=unix:///run/nfd.sock congestion={base-marking-interval=100ms default-threshold=65536B} mtu=8800 counters={in={38i 3d 0n 4230B} out={3i 38d 0n 73832B}} flags={local on-demand point-to-point congestion-marking}

The overhead seems to be about 10 kB, which is about 16% of the original file size.
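The arithmetic behind those figures, using the counters from the nfdc output above:

file_size = 63696                     # bytes of uploaded content
bytes_out = 73832                     # Out byte counter of the Repo face above

overhead = bytes_out - file_size
print(overhead)                       # 10136 bytes, i.e. about 10 kB
print(f"{overhead / file_size:.0%}")  # about 16%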

Can you double check that the overhead is gone and confirm?

jefft0 commented 3 years ago

I get similar results to yours when I upload a file with 63,696 bytes. Here is the NDN traffic I see from the uploader node:

sent Interest /testrepo/insert/notify/params-sha256=...
recv Interest /node_a/msg/testrepo/insert/%BEM%AB%D5
sent Data /node_a/msg/testrepo/insert/%BEM%AB%D5
recv Data /testrepo/insert/notify/params-sha256=...

sent Interest /testrepo/insert%20check/%CE%048%09%F2%F0
recv Data /testrepo/insert%20check/%CE%048%09%F2%F0

recv Interest /node_a//Users/jefft0/temp/logs/1.txt/1614276907951/seg=0
sent Data /node_a//Users/jefft0/temp/logs/1.txt/1614276907951/seg=0

sent Interest /testrepo/insert%20check/%CE%048%09%F2%F0
recv Data /testrepo/insert%20check/%CE%048%09%F2%F0

Before the uploader receives the interest /node_a//Users/jefft0/temp/logs/1.txt... to fetch the file, I expect to see the Interest/Data exchange to start the upload:

sent Interest /testrepo/insert/notify/params-sha256=...
recv Data /testrepo/insert/notify/params-sha256=...

But there are two more Interest/Data exchanges that I don't expect to see:

recv Interest /node_a/msg/testrepo/insert/%BEM%AB%D5
sent Data /node_a/msg/testrepo/insert/%BEM%AB%D5
sent Interest /testrepo/insert%20check/%CE%048%09%F2%F0
recv Data /testrepo/insert%20check/%CE%048%09%F2%F0

Can these be removed?

phylib commented 3 years ago

The initial Insert-Interest does not contain the InsertParams. The params are instead retrieved by the Repo with the /node_a/msg/testrepo/insert/%BEM%AB%D5 Interest. So, this Interest/Data exchange cannot be removed.

The next "unexpected" Interest is status checking: the client regularly retrieves the status of the current insert process. In theory, this could be delayed by a couple of hundred milliseconds, but I would not consider this Interest/Data exchange critical in terms of overhead.
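As a small sketch of what such a delayed, pull-based check could look like (check_once and the timings are hypothetical, not the actual PutFile client code):

import asyncio

async def delayed_status_check(check_once, initial_delay: float = 0.3,
                               retries: int = 3, interval: float = 1.0) -> bool:
    # Wait a few hundred milliseconds before the first "insert check" Interest,
    # then poll the Repo a small number of times, as the protocol recommends.
    # check_once is a hypothetical coroutine that expresses the check Interest
    # and returns True once the insert is reported as finished.
    await asyncio.sleep(initial_delay)
    for _ in range(retries):
        if await check_once():
            return True
        await asyncio.sleep(interval)
    return False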

jefft0 commented 3 years ago

On the web meeting, we discussed the reason for the initial "extra" Interest/Data exchange: it exists to avoid using a signed Interest to initiate the upload. That resolves my remaining question. You can close the issue.