OpenMined / PySyft

Perform data science on data that remains in someone else's server
https://www.openmined.org/
Apache License 2.0

reduce communication bandwidth #2934

Closed luggi961 closed 4 years ago

luggi961 commented 4 years ago

I implemented distributed image classification using WebsocketWorkers and TrainConfig objects. I started with the pretrained MobileNetV2 model (downloaded with torchvision), which is ~8 MB. While training, I inspected with Wireshark how much traffic went across.

Result: ~18 MB in each direction per training round

Assumption: The serialized message is converted to a hexadecimal representation using the hexlify method (websocket_client.py). According to the documentation (https://docs.python.org/2/library/binascii.html), the resulting string is twice as long as the input data. So I guess that's why the 8 MB model becomes 16 MB.
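The doubling is easy to confirm with stdlib `binascii` alone; this minimal sketch (the payload is just a stand-in for serialized model bytes) shows hexlify emitting two ASCII characters per input byte:

```python
# hexlify writes two hex characters per input byte, so the
# on-the-wire payload is exactly twice the raw size.
from binascii import hexlify

payload = bytes(range(256)) * 4      # stand-in for serialized model bytes
encoded = hexlify(payload)

print(len(payload), len(encoded))    # encoded is 2x the raw size
assert len(encoded) == 2 * len(payload)
```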

Possible solution: Transmitting only the difference between the old and new model would be considerably more efficient.

iamtrask commented 4 years ago

Mad props for taking the time to measure this.

What leads you to believe that transmitting the difference between the old and new model would be any smaller of a file? Would it compress better?

karlhigley commented 4 years ago

If the differences were sparse, they'd be smaller. They often won't be though.

Gradient sparsification is a thing; maybe model delta sparsification could be too?
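To make the idea concrete, here is a hypothetical sketch of what model-delta sparsification could look like: keep only the k largest-magnitude entries of `new - old` and ship them as (index, value) pairs. The function names and flat-list parameter representation are illustrative assumptions, not PySyft APIs:

```python
# Hypothetical top-k sparsification of a model delta: only the k
# largest-magnitude changes go over the wire; the receiver applies
# them to its local copy of the old parameters.
def sparsify_delta(old, new, k):
    delta = [n - o for o, n in zip(old, new)]
    top = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)[:k]
    return [(i, delta[i]) for i in sorted(top)]

def apply_delta(old, sparse_delta):
    updated = list(old)
    for i, d in sparse_delta:
        updated[i] += d
    return updated

old = [0.0, 1.0, 2.0, 3.0]
new = [0.1, 1.0, 5.0, 3.0]
sparse = sparsify_delta(old, new, k=1)   # only the largest change survives
print(sparse)                            # [(2, 3.0)]
```

This only wins when most parameters barely move between rounds, which matches the caveat above.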

karlhigley commented 4 years ago

This seems like it might be more a matter of compression and encoding efficiency though? Base64 is supported by the binascii library and uses four characters for every three bytes (instead of two per byte), so that might be an easy change to make in order to improve bandwidth efficiency.
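The overhead difference is straightforward to check with stdlib `binascii` (the 3000-byte buffer below is just a stand-in for a chunk of serialized model):

```python
# Encoding overhead comparison: hexlify is 2.00x the raw size,
# base64 is ~1.33x (four output characters per three input bytes).
from binascii import b2a_base64, hexlify

raw = b"\x00" * 3000                      # stand-in for serialized bytes
hex_encoded = hexlify(raw)
b64_encoded = b2a_base64(raw, newline=False)

print(len(hex_encoded) / len(raw))        # 2.0
print(len(b64_encoded) / len(raw))        # ~1.33
```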

@luggi961 Would you be up for trying that out and submitting a PR?

vvmnnnkv commented 4 years ago

In theory, websockets can handle binary. At least in JavaScript :) The trick is to avoid JSON, because it can't contain binary.

karlhigley commented 4 years ago

Can we just remove the call to hexlify then?

vvmnnnkv commented 4 years ago

Needs checking why it was used; maybe there was a reason. I can say for grid.js that historically it uses JSON payloads, which forces us to send binary as base64.
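A quick demonstration of why a JSON transport forces a text encoding: `json` refuses raw bytes outright, so binary has to be wrapped (the `"model"` key here is just an illustrative placeholder):

```python
# JSON cannot serialize raw bytes, so any binary payload on a JSON
# transport must be wrapped in a text encoding such as base64.
import base64
import json

blob = b"\x00\x01\xfe\xff"               # raw binary, e.g. serialized tensor

try:
    json.dumps({"model": blob})
except TypeError as e:
    print("raw bytes rejected:", e)

# Workaround used with JSON transports: base64-wrap the binary.
wrapped = json.dumps({"model": base64.b64encode(blob).decode("ascii")})
restored = base64.b64decode(json.loads(wrapped)["model"])
assert restored == blob
```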

karlhigley commented 4 years ago

Sounds like base64 is a safe change to make then. Let's start there and then see if we can further optimize.

vvmnnnkv commented 4 years ago

Another workaround, specific to the transfer learning case, could be splitting the model into frozen and trainable parts. In transfer learning you typically don't want to retrain all weights.
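A minimal sketch of that split, with parameters modeled as plain name-to-values dicts so it stays self-contained; in a real PyTorch model you would filter on `param.requires_grad` instead. The parameter names and the `trainable_payload` helper are illustrative assumptions:

```python
# Hypothetical frozen/trainable split: only the trainable head needs
# to go over the wire each round, since the frozen backbone never
# changes. Parameters are modeled as name -> list-of-floats.
def trainable_payload(params, frozen_prefixes):
    """Keep only parameters whose name is not under a frozen prefix."""
    return {
        name: values
        for name, values in params.items()
        if not any(name.startswith(p) for p in frozen_prefixes)
    }

params = {
    "features.0.weight": [0.1, 0.2],   # frozen backbone
    "features.1.weight": [0.3, 0.4],
    "classifier.weight": [0.5, 0.6],   # trainable head
}
payload = trainable_payload(params, frozen_prefixes=["features."])
print(sorted(payload))                 # ['classifier.weight']
```

For MobileNetV2 the classifier head is a small fraction of the ~8 MB total, so the per-round traffic could drop substantially.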

karlhigley commented 4 years ago

Looks like use of hexlify was introduced last March. There's no stated rationale in that PR; the title just suggests that the changes "fix bugs." Not sure if hexlify was part of those fixes or not.

github-actions[bot] commented 4 years ago

This issue has been marked stale because it has been open 30 days with no activity. Leave a comment or remove the stale label to unmark it. Otherwise, this will be closed in 7 days.