International-Data-Spaces-Association / DataspaceConnector

This is an IDS Connector reference implementation.
Apache License 2.0

Max filesize of artifact fetched via REST #133

Open SebastianOpriel opened 2 years ago

SebastianOpriel commented 2 years ago

Describe the bug

There are two ways of fetching data from a REST API: either via Routes or via Artifact.accessUrl.

accessUrl

In the latter case, the library used, OkHttp v4.9.3, limits the maximum size of a fully buffered response body to Int.MAX_VALUE bytes (https://github.com/square/okhttp/blob/43bc338e8b80625655cd6c0a2ff547da43a14bef/okhttp/src/commonMain/kotlin/okhttp3/internal/-ResponseBodyCommon.kt#L38). Thus, the theoretical maximum file size is about 2.147 GB (just under 2 GiB).
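
To make the limit concrete, here is a minimal sketch of the arithmetic. It only mirrors the kind of check described above and is not OkHttp's own code; the class name BodySizeCheck and the method fitsInByteArray are hypothetical names chosen for illustration.

```java
/** Illustrative only: mirrors the size check described above, not OkHttp's own code. */
final class BodySizeCheck {

    /** Largest body that fits into a single byte array: Integer.MAX_VALUE bytes. */
    static final long MAX_BUFFERED_BYTES = Integer.MAX_VALUE;

    /** Returns true if a body of the given length could be buffered into one byte array. */
    static boolean fitsInByteArray(long contentLength) {
        return contentLength <= MAX_BUFFERED_BYTES;
    }

    public static void main(String[] args) {
        // The limit expressed in decimal gigabytes (10^9 bytes).
        System.out.println(MAX_BUFFERED_BYTES / 1e9);
        // A 5 GB file is well above the limit, so buffering it would be refused.
        System.out.println(fitsInByteArray(5_000_000_000L));
    }
}
```

Any content length above 2,147,483,647 bytes fails this check, regardless of how much heap is available, because the length no longer fits into a Java array index.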

To Reproduce

Steps to reproduce the behavior:

  1. Create an Artifact with e.g. a 5GB external file (https://testfiledownload.com/)
  2. Access the /data URI of the created artifact to download the remote data
  3. An internal server error (500) is returned, and the connector logs the following:

2022-03-31T14:21:00,684 [https-jsse-nio-8080-exec-5] WARN - Could not connect to data source. [exception=(Cannot buffer entire body for content length: 10485760000)]
java.io.IOException: Cannot buffer entire body for content length: 10485760000
at okhttp3.ResponseBody.bytes(ResponseBody.kt:324) ~[okhttp-4.9.3.jar:?]
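
The stack trace points at ResponseBody.bytes(), which buffers the whole body in memory. One possible way around the limit, sketched here under assumptions and not taken from the connector's code, is to copy the body in fixed-size chunks instead. The sketch uses only java.io; with OkHttp the input stream could presumably come from ResponseBody.byteStream(). The class name StreamingCopy, the method copy, and the buffer size are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Illustrative only: chunked copy that never holds the whole body in memory. */
final class StreamingCopy {

    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0; // long, so the count is not capped at Integer.MAX_VALUE
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Small in-memory stand-in for a remote file; a real source would be a network stream.
        byte[] payload = new byte[100_000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(payload), sink, 8192);
        System.out.println(copied);
    }
}
```

Memory use is bounded by the buffer size rather than by the file size, which is the property that bytes() lacks.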

Camel Routes

If Camel routes are used (https://international-data-spaces-association.github.io/DataspaceConnector/CommunicationGuide/v6/Camel#step-2-create-a-generic-endpoint), a 5GB file is transferred successfully. When the data is accessed via the /data URL, one only needs to make sure that reverse proxies have a sufficient timeout, because the connector first downloads the whole file and then forwards it to the requester. If a 10GB file is used, the following exception is thrown (although the network monitor shows that the download starts and then stops after a few seconds):

2022-03-31T15:18:43,161 [https-jsse-nio-8080-exec-2] WARN - Caught an exception during route execution. [error=(RouteError(routeId=e463b5a0-5f28-45a6-9660-56b41813ad97, endpoint=https://speed.hetzner.de/10GB.bin, message=Exception occurred during execution on the exchange: Exchange[], timestamp=2022-03-31T15:18:43.161703663))]
2022-03-31T15:18:43,162 [https-jsse-nio-8080-exec-2] WARN - Could not connect to data source. [exception=(Failed to retrieve data.Exception occurred during execution on the exchange: Exchange[])]
io.dataspaceconnector.common.exception.DataRetrievalException: Failed to retrieve data.Exception occurred during execution on the exchange: Exchange[]
    at io.dataspaceconnector.common.routing.RouteDataRetriever.get(RouteDataRetriever.java:86) ~[classes/:7.0.2]

SebastianOpriel commented 2 years ago

The transfer mechanisms used in the DSC limit the size of the files that can be sent.