exasol / cloud-storage-extension

Exasol Cloud Storage Extension for accessing formatted data (Avro, Orc, and Parquet) on public cloud storage systems
MIT License

Exasol is too fast for Google Cloud Storage #192

Closed: kevinerhardt closed this issue 2 years ago

kevinerhardt commented 2 years ago

Hi,

we wanted to export a few million rows to Google Cloud Storage with DW_CLOUD_STORAGE_EXTENSION.EXPORT_PATH. Not all of the data arrived at GCS. After some research, we increased EXPORT_BATCH_SIZE to 1,000,000 rows instead of the default value of 100,000, which fixed our issue. It seems GCS cannot handle that many requests at the speed Exasol produces them.
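
For context, the export call looked roughly like this; a sketch assuming the standard EXPORT_PATH parameters from the extension's user guide, with schema, table, bucket, and credential names as placeholders:

```sql
-- Sketch only: object names, paths, and project IDs are placeholders.
EXPORT MY_SCHEMA.MY_TABLE
INTO SCRIPT DW_CLOUD_STORAGE_EXTENSION.EXPORT_PATH WITH
  BUCKET_PATH       = 'gs://my-bucket/export/my_table/'
  DATA_FORMAT       = 'PARQUET'
  GCS_PROJECT_ID    = 'my-gcp-project'
  GCS_KEYFILE_PATH  = 'MY_BUCKET_SECRET/gcp-service-account-keyfile.json'
  -- Raised from the default of 100000: fewer, larger write requests to GCS.
  EXPORT_BATCH_SIZE = '1000000'
  PARALLELISM       = 'iproc()';
```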

Is there a way of throwing an exception or a warning when Google Cloud Storage cannot process that much data?

Thanks for your effort!

Cheers, Kevin

morazow commented 2 years ago

Hello @kevinerhardt,

Thanks for the feedback!

Do you know if it was caused by throttling? Did you find out why the records were missing?

We could increase the default batch size for GCS in general, but it would help to know whether there are any limitations.

Best, Muhammet

redcatbear commented 2 years ago

@morazow, please investigate if we need configurable client-side rate-limiting.

kevinerhardt commented 2 years ago

Hi Muhammet,

sorry for the late answer. We were using the default configuration on the GCS side. Upgrading the GCS license would have handled the error as well. Logging was not configured on the GCS side at that time. The main issue was that Exasol reported millions of rows as affected, yet they never arrived on GCS. Since different teams work on each side, this error was only noticed at a later stage of our project. I don't know whether you can do anything about the Exasol output. Maybe increasing the default batch size for GCS exports is an option.

Cheers, Kevin

morazow commented 2 years ago

Hello @kevinerhardt,

Thanks a lot for the feedback! For now, I am going to close this issue.

I read about the fs.gs.outputstream.type setting in the GCS connector configuration documentation. Setting it to one of the *_COMPOSITE types should apparently help. That could also be an option to consider together with an increased batch size. We can look into it again if we see a similar issue.
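
For reference, this is a plain Hadoop configuration property of the GCS connector; a minimal sketch of setting it in core-site.xml, assuming the connector reads its settings from there:

```xml
<!-- Sketch: the *_COMPOSITE output stream types write intermediate objects
     and combine them with the GCS compose API; FLUSHABLE_COMPOSITE also
     supports hflush(), SYNCABLE_COMPOSITE supports hsync(). -->
<property>
  <name>fs.gs.outputstream.type</name>
  <value>FLUSHABLE_COMPOSITE</value>
</property>
```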