dariusc93 / rust-ipfs

The InterPlanetary File System (IPFS), implemented in Rust.
Apache License 2.0

Issues with Storing and Accessing Large Data (>12-13 Bytes) via Public Gateway #164

Open Ali-Usama opened 3 months ago

Ali-Usama commented 3 months ago

Description

I'm integrating rust-ipfs into a Substrate blockchain to enable decentralized storage capabilities for our nodes. The integration involves using offchain workers to interact with an IPFS node, managed by rust-ipfs, for storing and retrieving data. While testing this setup, I've encountered an issue where I'm unable to access data larger than approximately 12 to 13 bytes through a public IPFS gateway. Smaller data sizes work as expected and are accessible without issues.

Steps to Reproduce

Expected Behavior

Data of any size, when stored on IPFS using rust-ipfs through our Substrate blockchain integration, should be retrievable via public IPFS gateways.

Actual Behavior

When attempting to access data larger than 12 to 13 bytes through a public gateway, the request fails (504: Gateway Timeout Error). Smaller data sizes are retrievable without any issues.

Additional Information

Rust-IPFS version: forked rust-ipfs

Substrate version: polkadot-v0.9.43

I suspect this might be related to how rust-ipfs handles data chunking or broadcasting of CID announcements to the IPFS network, particularly for larger data sizes. However, I am not entirely sure if the issue lies within the configuration of the rust-ipfs node, the data storage process, or the retrieval/query mechanism.

Request for Assistance

Could you provide insights or recommendations on how to address this issue? Specifically, I am looking for:

Thank you for your support and looking forward to your guidance on resolving this challenge.

dariusc93 commented 3 months ago

Hey! Thank you for the report. I haven't done much testing with rust-ipfs and public gateways lately (it hasn't been a priority for me at the moment), but from the last time I tested, I know it sometimes comes down to connectivity, whether the content is being provided on the DHT, and the bitswap implementation (your fork uses beetle-bitswap by default, which should work better when dealing with gateways). I haven't had time to do a full review of the code you're using (I can do that later today), but from a quick skim there are some things I can suggest to see if they help:

1) Use the latest rust-ipfs (0.11 hasn't been published yet, but I have made some optimizations and updates; in your case I would suggest using the beetle-bitswap feature).
2) Check your firewall to make sure it does not block UPnP and that your machine and network equipment support it; or connect to and listen on a public relay so your local node is dialable.
3) Although the bitswap implementation in use will send an event to provide the CID over the DHT, you can also provide those blocks manually.
4) On https://github.com/Ali-Usama/substrate/blob/polkadot-v0.9.43/client/offchain/src/api/ipfs.rs#L31C9-L31C73, I would advise decreasing this amount to below 2MB (preferably 1MB, or leave it at the default of 256k). This is because the bitswap spec calls for messages not to exceed 2MB, while IPFS suggests that blocks be no more than 1MB, so if a block exceeds 2MB (including the size of the protobuf message) the exchange may fail and no blocks would be exchanged.
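The size limits in point 4 can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of rust-ipfs: the type and function names are hypothetical, the limits are taken from the comment above and assumed to be binary units, and the framing overhead is an illustrative guess rather than a value from any spec.

```rust
// Limits quoted in the comment above, assumed here to be binary units.
const BITSWAP_MAX_MESSAGE_BYTES: usize = 2 * 1024 * 1024; // "2MB" message ceiling
const RECOMMENDED_MAX_BLOCK_BYTES: usize = 1024 * 1024; // "1MB" suggested block ceiling
const DEFAULT_CHUNK_BYTES: usize = 256 * 1024; // "256k" default chunk size

// Hypothetical allowance for the protobuf framing around a block.
// This is an illustrative guess, not a number from the bitswap spec.
const ASSUMED_FRAMING_OVERHEAD_BYTES: usize = 1024;

#[derive(Debug, PartialEq, Eq)]
enum ChunkSizeVerdict {
    Ok,
    AboveRecommendedBlockSize,
    ExceedsBitswapMessageLimit,
}

/// Classify a configured chunk size against the limits above.
fn check_chunk_size(chunk_bytes: usize) -> ChunkSizeVerdict {
    if chunk_bytes.saturating_add(ASSUMED_FRAMING_OVERHEAD_BYTES) > BITSWAP_MAX_MESSAGE_BYTES {
        ChunkSizeVerdict::ExceedsBitswapMessageLimit
    } else if chunk_bytes > RECOMMENDED_MAX_BLOCK_BYTES {
        ChunkSizeVerdict::AboveRecommendedBlockSize
    } else {
        ChunkSizeVerdict::Ok
    }
}

/// Number of chunks a payload is split into at a given chunk size.
fn chunk_count(payload_bytes: usize, chunk_bytes: usize) -> usize {
    assert!(chunk_bytes > 0, "chunk size must be non-zero");
    payload_bytes / chunk_bytes + usize::from(payload_bytes % chunk_bytes != 0)
}
```

Under these assumptions, a 13-byte payload always fits in a single chunk regardless of the configured chunk size, so a chunk-size setting would only start to matter for payloads larger than the chunk size itself.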

Ali-Usama commented 3 months ago

I've updated the node configurations here using the bitswap feature, but I'm still facing the same issue. After adding a boot node, this still returns 0 peers:

let peers = if let IpfsResponse::Peers(peers) = ipfs_request::<T>(IpfsRequest::Peers)? {
    peers
} else {
    Vec::new()
};

So, I'm assuming the issue might be with how the node connects to the IPFS network, but it still doesn't explain why the small datasets are accessible on the public gateways, and as soon as the data size crosses a certain threshold, it becomes inaccessible.

dariusc93 commented 3 months ago

Thank you for your response.

> After adding a boot node, this still returns 0 peers

Could you add the other bootstrap nodes and try calling Ipfs::bootstrap after initializing to see if that helps? We don't do this automatically (although the latest rust-libp2p version will likely do so), so adding a node only adds it to the peer k-bucket and connects to it; it would not begin bootstrapping.

> So, I'm assuming the issue might be with how the node connects to the IPFS network, but it still doesn't explain why the small datasets are accessible on the public gateways, and as soon as the data size crosses a certain threshold, it becomes inaccessible.

I do find it interesting that this is a problem only after a specific amount of data. Would it also be an issue if you ran a local gateway and had your instance connect to that gateway instead? Are you connecting over any relays, and if so, is DCUtR being used properly? (You would likely have to look at the logs for this, and this assumes UPnP isn't working or isn't an option in your environment; best to check the firewall and network equipment in that case to be sure.) If it isn't, the small amount of data might make sense, because relay v2 defaults to about 128k of data before the connection resets, since it expects DCUtR to kick in by then if both peers support the protocol and nothing prevents its use.
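To make the relay hypothesis concrete, here is a rough sketch of the threshold it implies. The names are hypothetical and the limit is simply the "about 128k" figure from the comment above, assumed to mean 128 KiB; the actual value depends on the relay's configuration.

```rust
// Per-circuit data cap described above ("about 128k"); assumed to mean 128 KiB.
// The real limit is set by whichever relay the node is connected through.
const ASSUMED_RELAY_CIRCUIT_LIMIT_BYTES: u64 = 128 * 1024;

#[derive(Debug, PartialEq, Eq)]
enum RelayedTransfer {
    /// The whole exchange fits before the relayed connection resets.
    FitsWithinCircuit,
    /// The exchange exceeds the cap, so it only completes if a direct
    /// connection (for example via DCUtR) is established in time.
    NeedsDirectConnection,
}

/// Classify a transfer by its total bytes on the wire, protocol overhead included.
fn classify_relayed_transfer(total_bytes: u64) -> RelayedTransfer {
    if total_bytes <= ASSUMED_RELAY_CIRCUIT_LIMIT_BYTES {
        RelayedTransfer::FitsWithinCircuit
    } else {
        RelayedTransfer::NeedsDirectConnection
    }
}
```

Note that `total_bytes` here covers everything sent over the circuit, not just the stored payload, so handshake and protocol traffic count against the same budget.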

dariusc93 commented 1 month ago

Did a little testing, and there are only some instances where I've noticed fewer responses to a gateway, but nothing tied to any specific amount of data.