Lilypad-Tech / lilypad

run AI workloads easily in a decentralized GPU network. https://www.youtube.com/watch?v=zeG2F-JANjI
https://lilypad.tech
Apache License 2.0

[core] local IPFS node? #83

Open AquiGorka opened 2 weeks ago

AquiGorka commented 2 weeks ago

Is this the reason why jobs take too long?

arsen3d commented 2 weeks ago


Yes. It's about the way "bacalhau get" attempts to get the file. Workaround is to use "ipfs get" which works in under 7 seconds.

There may be a way to fix "bacalhau get", but I was not able to get to the bottom of it.
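To make the comparison above reproducible, here is a minimal timing sketch in Python. It is not code from this repo: `build_get_command` and `timed_fetch` are hypothetical helper names, and it assumes both the `ipfs` and `bacalhau` CLIs are on `PATH`. Note that `ipfs get` takes a CID while `bacalhau get` takes a job ID, so the two calls need different references for the same result.

```python
import shutil
import subprocess
import time


def build_get_command(ref, out_dir, use_ipfs=True):
    """Return the argv for fetching job results (hypothetical helper).

    ref is a CID when use_ipfs is True, otherwise a Bacalhau job ID.
    """
    if use_ipfs:
        # `ipfs get <cid> -o <path>` writes the fetched content to out_dir.
        return ["ipfs", "get", ref, "-o", out_dir]
    # No output flag is passed here, so bacalhau uses its default location.
    return ["bacalhau", "get", ref]


def timed_fetch(cmd, timeout_s=60.0):
    """Run cmd and return (exit_code, elapsed_seconds)."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    start = time.monotonic()
    proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    return proc.returncode, time.monotonic() - start
```

Running `timed_fetch(build_get_command(cid, "out"))` and `timed_fetch(build_get_command(job_id, "out", use_ipfs=False))` against the same job should give the two numbers to put in an upstream bug report.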

AquiGorka commented 2 weeks ago

Wait, wait @arsen3d, the end goal for this task is to find the ideal solution or, if that cannot be done, to document what is going on until we or the Bacalhau team can provide the long-term fix. Reopening.

Zorlin commented 2 weeks ago

Re: this task:

I believe merging this "hack" (using IPFS get locally instead of bacalhau get) will be safe enough and reliable enough that we can rely on it during the next few weeks. We should also contact the Bacalhau team and open an issue that describes in detail the slowdown we're experiencing with bacalhau get and get a more permanent fix, as that should "always" be the faster option.

However, merging it as is seems prudent to me. It's a one-line fix we can revert if it causes issues, and it will significantly improve both the speed and the perceived speed of Lilypad. My concerns about propagation issues are irrelevant, as we don't actually care whether the IPFS CID is retrievable outside of the RP: the RP is literally going to send results to us over a websocket anyway, so firewall/NAT holepunching doesn't much matter.

Zorlin commented 2 weeks ago

> Yes. It's about the way "bacalhau get" attempts to get the file. Workaround is to use "ipfs get" which works in under 7 seconds.
>
> There may be a way to fix "bacalhau get", but I was not able to get to the bottom of it.

Can you please show me how to reproduce your testing here and how to get your UI work going so I may test this patch? I believe I can improve the speed further with IPFS peering/preload, at which point we should see < 2 second retrieval and can consider this solved until we get a more permanent fix from the Bacalhau team.
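For anyone following along, one way the peering idea could look is a `Peering.Peers` entry in the local Kubo (go-ipfs) config, so the node keeps a standing connection to the RP's IPFS node. The Python sketch below only builds that config fragment. The peer ID and multiaddr are placeholders, `peering_config` is a hypothetical helper, and the thread does not specify which peers or which preload mechanism would actually be used.

```python
import json


def peering_config(peers):
    """Build a Kubo `Peering` config section.

    peers is an iterable of (peer_id, [multiaddr, ...]) pairs.
    """
    return {
        "Peering": {
            "Peers": [
                {"ID": peer_id, "Addrs": list(addrs)}
                for peer_id, addrs in peers
            ]
        }
    }


# Placeholder values only: substitute the real peer ID and address of the RP's node.
example = peering_config([("<RP_PEER_ID>", ["/ip4/203.0.113.10/tcp/4001"])])
print(json.dumps(example, indent=2))
```

The printed fragment would be merged into the node's config file, followed by a daemon restart. Whether this gets retrieval under 2 seconds is the open question to test.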