PlatformLab / HomaModule

A Linux kernel module that implements the Homa transport protocol.

The message length problem in socket #4

Closed LFRPINK closed 2 years ago

LFRPINK commented 2 years ago

We built and ran HomaModule successfully. However, sending fails when we increase the message length beyond the MTU by modifying MSGLEN: homa_send() returns -1 and errno is 14. Is there anything wrong with the way we set the message length?

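[image: WechatIMG715] https://user-images.githubusercontent.com/45142392/141056617-43a8223a-6475-46cd-939e-73f60d96c556.png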

johnousterhout commented 2 years ago

There is an upper limit on message length, defined by the constant HOMA_MAX_MESSAGE_LENGTH in homa.h. Is it possible that you have exceeded that length?

-John Ousterhout-

LFRPINK commented 2 years ago

The MTU is set to the default (1500 bytes), and we set MSGLEN to 1600 bytes (slightly above the MTU, but below HOMA_MAX_MESSAGE_LENGTH). Is there anything else that could have caused this error? Thank you very much!

johnousterhout commented 2 years ago

I just noticed that the errno you're getting is 14, which is EFAULT. This happens when Homa can't read or write memory areas you have passed in (msg and addr in this case). I'd suggest checking to make sure that both are accessible. If this still doesn't identify the problem, I'd suggest varying MSGLEN to figure out exactly which sizes work and which ones don't; that may give you some clues. If even that doesn't work, how about creating the simplest program you can that causes the problem and sending me the code? I'll try it out here to see if it happens for me also (and if so, I can figure out what's going on).
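
As a quick sanity check on the errno value discussed above, here is a minimal Python sketch that maps a numeric errno to its symbolic name and message using only the standard library. The helper name `describe_errno` is made up for illustration and is not part of Homa.

```python
import errno
import os
from typing import Tuple


def describe_errno(code: int) -> Tuple[str, str]:
    """Return (symbolic name, human-readable message) for an errno value."""
    # errno.errorcode maps numeric codes to names such as "EFAULT".
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = describe_errno(14)
    print(f"errno 14 = {name}: {message}")
```

Running this on the machine that shows the failure confirms which symbolic error the platform associates with the number reported by the program.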

-John-

LFRPINK commented 2 years ago

Many thanks for your kind help. We have tried the methods you suggested, but we still cannot identify the problem. We tried several MSGLEN values larger than the MTU (1550 bytes, 1600 bytes, ...), and homa_send() returns successfully, but Wireshark does not capture any request packets on the client's NIC. We deployed HomaModule on two hosts connected directly to each other by a 1 Gb/s link. The details of the deployment are in our HomaTest repository (https://github.com/linxone/homaTest). Would you mind checking whether there are any errors in our socket programs?

By the way, we still have some questions about HomaModule after reading the related paper (ATC '21).

1. Are grants sent to clients automatically when servers receive a packet? (Is the grant module transparent to programmers?)
2. Is RTTbytes initialized as a constant, or does it vary according to the network topology?

johnousterhout commented 2 years ago

I downloaded your Git repo, and it works fine for me (the server prints "SERVER: 10000").

It sounds like the problem is related to packet size, and I'm wondering if it might have to do with TSO/GSO (perhaps your system doesn't support those but somehow Homa decides it does, or perhaps the way Homa "tricks" NICs into using TSO on Homa packets isn't working with your NICs). Try typing the following command on your client machine: "sudo sysctl .net.homa.max_gso_size=1500". This should disable Homa's use of GSO. Let me know if this gets things working?
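
To confirm that a setting like this took effect, one option is to read the value back. The sketch below assumes the standard Linux convention that a dotted sysctl key such as net.homa.max_gso_size is exposed as a file under /proc/sys. The helper names `sysctl_path` and `read_sysctl` are hypothetical, and the Homa entries should only be expected to exist while the module is loaded.

```python
from pathlib import Path
from typing import Optional


def sysctl_path(key: str) -> Path:
    """Map a dotted sysctl key to its /proc/sys file (simple keys only)."""
    # Drop empty components so a leading dot, as in ".net.homa...", is tolerated.
    parts = [part for part in key.split(".") if part]
    return Path("/proc/sys").joinpath(*parts)


def read_sysctl(key: str) -> Optional[str]:
    """Return the current value as text, or None if the key is not readable."""
    try:
        return sysctl_path(key).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    print(read_sysctl(".net.homa.max_gso_size"))
```

A result of None here would suggest that the module is not loaded or that the key name differs on your version.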

In response to your questions:

Are the grants sent to clients automatically when servers receive a packet? (Is the grant module transparent to programmers?)

Grants are sent automatically by Homa; programmers should not need to think about these.

Is RTTbytes initialized as a constant, or does it vary according to the network topology?

This is a value that should be set locally based on your network speed and topology. You can set it with the command "sudo sysctl .net.homa.rtt_bytes=XXX". It should be set to the number of bytes that can be transmitted in the time it takes for a small message round-trip.
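
As a rough illustration of that rule, the sketch below multiplies link speed by round-trip time. The 80 microsecond round-trip time is an assumed figure chosen for the example, not a measurement from this thread, so measure your own network before choosing a value. The function name is hypothetical.

```python
def rtt_bytes(link_bits_per_sec: int, rtt_microseconds: int) -> int:
    """Bytes that can be transmitted during one round-trip time."""
    # Bits sent in one RTT, using integer arithmetic to avoid rounding surprises.
    bits_per_rtt = link_bits_per_sec * rtt_microseconds // 1_000_000
    return bits_per_rtt // 8


if __name__ == "__main__":
    # 1 Gb/s link with an assumed 80 us small-message round trip.
    print(rtt_bytes(1_000_000_000, 80))  # prints 10000
```

The resulting number is what would be substituted for XXX in the sysctl command quoted above.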

-John-

LFRPINK commented 2 years ago

Thank you for your prompt reply. Disabling GSO with the command you provided does work. The client socket can now send a message larger than the MTU.

To use up the available bandwidth, we want to send long messages (larger than the 10 KB RTTbytes, such as 20 KB, 30 KB, or HOMA_MAX_MESSAGE_LENGTH). We find that the server intermittently fails to receive the entire message. For example, we sent a 20000-byte message and traced the packets on the client's NIC with Wireshark. The result is shown below.

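[image: Pasted Graphic 1] https://user-images.githubusercontent.com/45142392/142413256-00dc8661-e310-47dc-9ffa-33d0ac46c587.png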

From the figure, we can see that the client sends part of the message (the unscheduled packets) and successfully receives some grants from the server. However, the client then finds the destination unreachable, so the remaining packets of the message cannot be transmitted.

Is there a problem with our method of saturating the bandwidth with HomaModule? Could you give us some suggestions? Thank you very much!
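
For readers following this description, here is a simplified model of how a message divides into unscheduled and scheduled bytes. It assumes the sender may transmit the first RTTbytes without grants and needs grants for the rest. This is a sketch of the idea only, not HomaModule's actual logic; the function name and the 10000-byte default are assumptions based on the figures mentioned in this comment.

```python
from typing import Tuple


def split_message(message_length: int, rtt_bytes: int = 10_000) -> Tuple[int, int]:
    """Return (unscheduled_bytes, scheduled_bytes) under the simplified model."""
    if message_length < 0 or rtt_bytes < 0:
        raise ValueError("lengths must be non-negative")
    # Assumption: everything up to rtt_bytes is sent without waiting for grants.
    unscheduled = min(message_length, rtt_bytes)
    return unscheduled, message_length - unscheduled


if __name__ == "__main__":
    unscheduled, scheduled = split_message(20_000)
    print(f"unscheduled={unscheduled} scheduled={scheduled}")
```

Under this model, a message shorter than RTTbytes never waits for a grant, while the second half of a 20000-byte message depends on grants arriving from the server.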

johnousterhout commented 2 years ago

It's hard for me to tell what's going on from information in your message, but Homa comes with a bunch of tools for analyzing its behavior and performance. In the subdirectory "util" there is a Python script ttprint.py, which will extract and print detailed time traces of exactly what is going on inside Homa. Can you run your experiment again, then run ttprint.py on both the client and server machine, save the output in files, and email me the client and server time traces? Given these, I should be able to get a sense of what's going on. You might try taking a look at them yourself as well (I don't know if they will make sense to anyone but me, but it might be worth a try).

-John-

LFRPINK commented 2 years ago

Thank you sincerely for your help! The methods you suggested by e-mail work, and messages are now transmitted successfully.