@minbaev thank you for reporting this. If I understand correctly, it fails on a random basis, sometimes working and sometimes not, right? Do you know which Stocator version you are using?
From the exception it looks like it comes from the Amazon SDK, which received data of a length it didn't expect. I suggest you also search the web to see whether other users have reported similar issues with the Amazon SDK. Did you try the same with the Stocator branch based on the COS SDK? I wonder if the same issue exists there.
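For example, something along these lines should switch a PySpark job to the COS-SDK-based connector. This is only a sketch: the service name `myservice`, the endpoint, and the credential values are placeholders, and you should double-check the property keys against the Stocator README for the version you deploy:

```python
# Sketch: point Spark at the Stocator COS connector (the one built on
# the IBM COS SDK). Service name, endpoint, and credentials below are
# placeholders; verify the property keys against your Stocator version.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("stocator-cos-test")
    # Register the cos:// scheme with Stocator's filesystem implementation
    .config("spark.hadoop.fs.stocator.scheme.list", "cos")
    .config("spark.hadoop.fs.cos.impl",
            "com.ibm.stocator.fs.ObjectStoreFileSystem")
    .config("spark.hadoop.fs.stocator.cos.impl",
            "com.ibm.stocator.fs.cos.COSAPIClient")
    .config("spark.hadoop.fs.stocator.cos.scheme", "cos")
    # Per-service COS credentials (placeholder values)
    .config("spark.hadoop.fs.cos.myservice.endpoint",
            "https://s3.us-south.cloud-object-storage.appdomain.cloud")
    .config("spark.hadoop.fs.cos.myservice.access.key", "ACCESS_KEY")
    .config("spark.hadoop.fs.cos.myservice.secret.key", "SECRET_KEY")
    .getOrCreate()
)
```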
@gilv thank you for your response. The failures do appear random, that's right. Since we run our Spark jobs on IBM Analytics Engine, we are not providing the Stocator jar directly; we use the version IBM Analytics Engine has configured.
The actual data length is always smaller than expected, so I tend to believe the input stream might be the cause.
Researching online suggests specifying the exact file to read instead of using an input file stream, so that there is no inconsistency between the expected and actual data length.
But I'm not sure how this can be applied to our use case with the PySpark DataFrame API.
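If I understand the suggestion correctly, it would amount to something like this sketch (assuming an existing SparkSession named `spark`; the bucket, service name, and paths are made up):

```python
# Hypothetical illustration of the "name the files explicitly" idea:
# enumerate the Parquet part files and pass them to the reader directly,
# instead of pointing Spark at a directory. Bucket, service name, and
# paths are made up for the example.
paths = [
    "cos://mybucket.myservice/events/2018/10/01/part-00000.parquet",
    "cos://mybucket.myservice/events/2018/10/01/part-00001.parquet",
]
# spark.read.parquet accepts multiple explicit paths
df = spark.read.parquet(*paths)
```

Maintaining an explicit list of part files by hand does not fit our hourly DataFrame job well, though.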
We have data stored on COS, serialized as Parquet. We read it hourly with the PySpark DataFrame API, roughly along the lines of the sketch below.
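(Sketch only: it assumes an existing SparkSession named `spark`, and the cos:// URI, bucket, and service name are placeholders, not our real configuration.)

```python
# Rough sketch of the hourly read. The URI, bucket, and service name
# are placeholders, not our actual configuration.
from datetime import datetime, timedelta

previous_hour = (datetime.utcnow() - timedelta(hours=1)).strftime("%Y/%m/%d/%H")
df = spark.read.parquet("cos://mybucket.myservice/events/" + previous_hour)
df.count()  # an action to force the read
```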
From time to time it throws the following error (the Amazon SDK exception about receiving data of an unexpected length):
Could you please have a look and suggest a possible solution for this issue? Thank you.