Closed · luggi961 closed this issue 4 years ago
Mad props for taking the time to measure this.
What leads you to believe that transmitting the difference between the old and new model would produce a smaller file? Would it compress better?
If the differences were sparse, they'd be smaller. They often won't be though.
Gradient sparsification is a thing; maybe model delta sparsification could be too?
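A minimal pure-Python sketch of what model-delta sparsification could look like (the function names and threshold are hypothetical, and a real implementation would operate on tensors rather than lists):

```python
def sparsify_delta(old_weights, new_weights, threshold=1e-3):
    """Return (index, delta) pairs only for weights that changed meaningfully."""
    return [
        (i, new - old)
        for i, (old, new) in enumerate(zip(old_weights, new_weights))
        if abs(new - old) > threshold
    ]

def apply_delta(old_weights, sparse_delta):
    """Reconstruct the new weights on the receiving side."""
    weights = list(old_weights)
    for i, delta in sparse_delta:
        weights[i] += delta
    return weights

old = [0.5, -1.2, 0.0, 3.1]
new = [0.5001, -1.2, 0.25, 3.1]   # only one weight moved noticeably
delta = sparsify_delta(old, new)  # [(2, 0.25)] -- sub-threshold changes are dropped
```

Note the scheme is lossy: changes below the threshold (like index 0 above) are silently discarded, which is the same trade-off gradient sparsification makes.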
This seems like it might be more a matter of compression and encoding efficiency, though. Base64 is supported by the binascii library and uses four characters for every three bytes (instead of two characters per byte), so switching to it might be an easy change to make in order to improve bandwidth efficiency.
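A quick sanity check of the two encoding overheads with the standard library (the payload here is just random bytes standing in for an ~8 MB serialized model):

```python
import base64
import binascii
import os

payload = os.urandom(8 * 1024 * 1024)  # stand-in for an ~8 MB serialized model

hex_encoded = binascii.hexlify(payload)  # 2 characters per byte
b64_encoded = base64.b64encode(payload)  # 4 characters per 3 bytes

print(len(hex_encoded) / len(payload))  # 2.0   -> ~16 MB on the wire
print(len(b64_encoded) / len(payload))  # ~1.33 -> ~10.7 MB on the wire
```

So base64 alone would cut the observed overhead from +100% to about +33%.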
@luggi961 Would you be up for trying that out and submitting a PR?
In theory, websocket can handle binary. At least in JavaScript :) The trick is to avoid JSON because it can't contain binary.
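A small illustration of that constraint: Python's json module rejects raw bytes outright, which is why a JSON envelope forces a text encoding such as base64 or hex (the message shape below is made up for illustration):

```python
import base64
import json

weights = bytes(range(256))  # stand-in for a binary serialized model

# json.dumps({"payload": weights}) would raise TypeError: JSON has no
# binary type, so a JSON envelope forces a text encoding of the bytes.
message = json.dumps({
    "type": "model_update",
    "payload": base64.b64encode(weights).decode("ascii"),
})

# The receiving side undoes the encoding:
received = base64.b64decode(json.loads(message)["payload"])
assert received == weights

# A binary websocket frame, by contrast, could carry `weights` verbatim,
# with zero encoding overhead.
```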
Can we just remove the call to hexlify then?
Worth checking why it was used; maybe there was a reason. I can say that grid.js historically uses JSON payloads, which forces us to send binary as base64.
Sounds like base64 is a safe change to make then. Let's start there and then see if we can further optimize.
Another kind of workaround, specific to the transfer learning case, could be splitting the model into frozen/trainable parts. In transfer learning you typically don't want to retrain all weights.
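A toy sketch of that idea, with hypothetical layer names and plain lists standing in for parameter tensors; the frozen backbone would be sent once, and only the small task-specific head would be synchronized each round:

```python
# Hypothetical split: the pretrained backbone stays frozen on the worker,
# so only the trainable head needs to cross the wire every round.
FROZEN = {"backbone.conv1", "backbone.conv2"}  # sent once, never retrained

def trainable_update(model_params):
    """Select only the parameters a worker actually retrains."""
    return {name: values for name, values in model_params.items()
            if name not in FROZEN}

params = {
    "backbone.conv1": [0.1] * 1000,  # large pretrained layers
    "backbone.conv2": [0.2] * 1000,
    "classifier.head": [0.0] * 10,   # small task-specific head
}
update = trainable_update(params)  # only "classifier.head" is transmitted
```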
Looks like the use of hexlify was introduced last March. There's no stated rationale in that PR; the title just suggests that the changes "fix bugs." Not sure if hexlify was part of those fixes or not.
I implemented distributed image classification using WebsocketWorkers and TrainConfig objects. I started with the pretrained MobileNetV2 model (downloaded with torchvision), which is ~8 MB. While training, I inspected with Wireshark how much traffic came across.
Result: ~18MB in both directions each training round
Assumption: the serialized message is converted to a hexadecimal representation by the hexlify method (websocket_client.py). According to the documentation (https://docs.python.org/2/library/binascii.html), the resulting string is twice as long as the input data, so I guess that's why the 8 MB model becomes 16 MB.
Possible solution: transmitting only the difference between the old and new model would be considerably more efficient.
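A rough stdlib-only sketch of why a delta could help even without sparsification: if most weights barely change between rounds, the delta is mostly zeros and compresses far better than the full model (the exact sizes depend on zlib and are only indicative):

```python
import struct
import zlib

old = [float(i) for i in range(100_000)]
new = list(old)
new[42] += 1.0   # only a handful of weights changed this round
new[999] -= 0.5

full_model = struct.pack(f"{len(new)}d", *new)
delta = struct.pack(f"{len(new)}d", *(n - o for n, o in zip(new, old)))

full_size = len(zlib.compress(full_model))
delta_size = len(zlib.compress(delta))
print(full_size, delta_size)  # the near-zero delta compresses orders of magnitude better
```

This only pays off when updates are small relative to the model, which matches the caveat above: dense updates won't compress much better than the model itself.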