Closed barsuna closed 2 months ago
Hey, the download part makes sense - many downloads in parallel is slow (easy fix) - but I don't understand what you mean about file read performance? Is that a different issue?
Check #138 and see if that fixes it?
@AlexCheema, thank you for the quick follow-up - indeed this fixed the issue.
The file read performance point is the following: when we download many files in parallel, these files end up fragmented on the file system, i.e. they cannot be read without extra seek operations. This visibly reduces read/write performance on spinning disks (less so on SSDs). Different filesystems are affected to different degrees; in my case it was NTFS, which turned out to suffer quite badly.
Net result - if at ~100MB/sec it takes, say, 700 seconds to load a 50% shard (70GB) of Llama 3.1, at 5MB/sec it takes 20 times that (about 14,000 seconds, nearly 4 hours) - not really practical anymore.
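To make the arithmetic explicit, here is a small sketch of the estimate. The helper name `load_time_seconds` is made up for illustration, and it assumes 1 GB = 1000 MB and a constant sustained read rate, which real disks will only approximate.

```python
def load_time_seconds(size_gb: float, throughput_mb_per_s: float) -> float:
    """Seconds needed to read size_gb at a constant rate (assumes 1 GB = 1000 MB)."""
    return size_gb * 1000 / throughput_mb_per_s


shard_gb = 70  # half of Llama 3.1 70B, as in the comment above

healthy = load_time_seconds(shard_gb, 100)    # unfragmented files, ~100MB/sec
fragmented = load_time_seconds(shard_gb, 5)   # fragmented files, ~5MB/sec

print(f"healthy: {healthy:.0f}s, fragmented: {fragmented:.0f}s "
      f"({fragmented / healthy:.0f}x slower, {fragmented / 3600:.1f}h)")
```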
I know, I know - who uses spinning disks to keep models etc... but IMO we should at least be able to get the performance of the underlying media. Thank you again for enhancing this quickly.
Currently a device in a cluster tries to download its shards in parallel. For large shards/models this sometimes results in ~15 download threads (e.g. Llama 3.1 70B is 30 large safetensors files - if a device gets half of that, it ends up with 15 simultaneous downloads). This leads to an issue where, if the underlying media hosting ~/.cache/huggingface/hub is not an SSD, the files end up laid out on the media with very high fragmentation, which means very low read performance.
(for example on my system - a normal file is read at about 130MB/sec, but files downloaded by exo are read at about 4.5MB/sec, until defragmented!)
It might be sensible to provide an option/knob to download shards in series (within one device) - there is no advantage to parallel download in many cases (while there is certainly an advantage to downloading in parallel between devices).