Closed cobbinma closed 4 months ago
Hi, thank you for reporting this! To help us reproduce the issue more reliably without guessing, could you also provide the code calling async fn select<T: DeserializeOwned + Send + Sync>(...) {...} (more specifically we'd like to see the definition of a type that's passed to T)?
@ysaito1001
of course 👍 (thanks for looking into this)
please let me know if you need anything else to replicate
#[derive(Default, Debug, Clone, Eq, PartialEq, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct RedemptionRecord {
    pub perk_id: String,
    pub user_id: String,
    pub deal_type: String,
    pub savings_percentage: Option<String>,
    pub savings_gbp: Option<String>,
    pub savings_aud: Option<String>,
    pub created_at: String,
    pub name: String,
    pub brand_name: String,
    pub header: String,
    pub estimated_savings_gbp: Option<String>,
    pub estimated_savings_aud: Option<String>,
    pub internal_identifier: Option<String>,
    pub local_currency: Option<String>,
    pub savings_local_currency: Option<String>,
    pub estimated_savings_local_currency: Option<String>,
}
self.select::<RedemptionRecord>(key, query, SelectFormat::new(SelectType::CsvToJson, "\""))
.await
Thank you for providing additional information. I just realized that we also need to know what gets passed to .input_serialization(select_format.input) and .output_serialization(select_format.output) out of select_format. If you could provide that, it would be appreciated!
@ysaito1001 sure 👍
for the file we are using a double quote as the quote character
let input_selection = match input {
    SelectType::CsvToJson => InputSerialization::builder()
        .csv(
            CsvInput::builder()
                .file_header_info(FileHeaderInfo::Use)
                .allow_quoted_record_delimiter(true)
                .quote_character(quote_char.clone())
                .build(),
        )
        .compression_type(CompressionType::None)
        .build(),
};
let output_selection = match input {
    SelectType::CsvToJson => OutputSerialization::builder()
        .json(JsonOutput::builder().build())
        .build(),
};
Thank you for providing the additional information. I have reproduced the issue you're observing, but I suspect that the data stored in output.csv may contain an incomplete UTF-8 byte sequence.
I do not receive errors when running in the AWS console so I believe it is related to the rust SDK.
When I ran this query SELECT * FROM S3Object s WHERE createdAt >= '2023-04-24'; within the AWS console (selecting Exclude the first line of CSV data), I retrieved 8651 records:
When I modified the reproducer so it used String::from_utf8_lossy instead of std::str::from_utf8
// in async fn select_records
....
if let SelectObjectContentEventStream::Records(records) = event {
    let records = records
        .payload
        .as_ref()
        .map(|p| String::from_utf8_lossy(p.as_ref()))
        .ok_or_else(|| anyhow!("unable to parse payload"))?; // I omitted `ReportingError::Unexpected` because I didn't have it
    for line in records.lines() {
        ...
the reproducer returned 8651 records without deserialization errors.
My guess is that S3 Select, when executed within the AWS Management Console, is more lenient in handling incomplete UTF-8 byte sequences.
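To illustrate the difference between the two decoding calls mentioned above, here is a minimal standard-library sketch. It does not use the SDK; the helper names decode_strict and decode_lossy and the sample bytes are made up for illustration, and the truncated slice only simulates a payload that ends in the middle of a multi-byte character.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Hypothetical helper: strict decoding, fails if the chunk is not valid UTF-8.
fn decode_strict(chunk: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(chunk)
}

/// Hypothetical helper: lossy decoding, replaces invalid or incomplete
/// sequences with U+FFFD instead of returning an error.
fn decode_lossy(chunk: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(chunk)
}

fn main() {
    // "é" is two bytes in UTF-8 (0xC3 0xA9). Dropping the last byte
    // simulates a payload that is cut in the middle of that character.
    let full = "caf\u{e9}".as_bytes();
    let truncated = &full[..full.len() - 1];

    match decode_strict(truncated) {
        Ok(text) => println!("strict: {text}"),
        Err(err) => println!("strict failed after {} valid bytes", err.valid_up_to()),
    }
    println!("lossy: {}", decode_lossy(truncated));
}
```

Note that lossy decoding hides the error rather than recovering the missing bytes, so the affected character is still lost in the output.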
@ysaito1001 Thanks so much for looking into this 👏
Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.
Describe the bug
When using S3 Select we receive occasional errors where we are unable to deserialize a JSON record.
These errors are from serde and are either 'missing field' or 'duplicate field' errors. The JSON that produces these errors seems as though it is formed from two separate CSV rows. Could this be caused by partial records being received in an incorrect order?
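If the hypothesis is that a payload can end partway through a record, one way to test it is to buffer raw bytes across events and only decode complete lines. The sketch below is standard-library only and is not part of the SDK; LineBuffer is a hypothetical helper, and it assumes the JSON output records are separated by a newline.

```rust
use std::string::FromUtf8Error;

/// Hypothetical helper: accumulates raw payload bytes and yields only
/// complete, newline-terminated lines.
#[derive(Default)]
struct LineBuffer {
    pending: Vec<u8>,
}

impl LineBuffer {
    /// Append one chunk and return every complete line now available.
    /// Bytes after the last newline stay buffered for the next chunk.
    fn push(&mut self, chunk: &[u8]) -> Result<Vec<String>, FromUtf8Error> {
        self.pending.extend_from_slice(chunk);
        let mut lines = Vec::new();
        // Splitting on the byte 0x0A is safe in UTF-8 because bytes of
        // multi-byte characters are always >= 0x80.
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            let mut line: Vec<u8> = self.pending.drain(..=pos).collect();
            line.pop(); // drop the trailing '\n'
            if line.last() == Some(&b'\r') {
                line.pop(); // tolerate "\r\n" delimiters
            }
            if !line.is_empty() {
                lines.push(String::from_utf8(line)?);
            }
        }
        Ok(lines)
    }

    /// Return any bytes left after the final chunk, if there are any.
    fn finish(self) -> Result<Option<String>, FromUtf8Error> {
        if self.pending.is_empty() {
            Ok(None)
        } else {
            String::from_utf8(self.pending).map(Some)
        }
    }
}

fn main() -> Result<(), FromUtf8Error> {
    let mut buffer = LineBuffer::default();
    // Two simulated payloads: the second record is split across them.
    for chunk in [&b"{\"a\":1}\n{\"b\""[..], &b":2}\n"[..]] {
        for line in buffer.push(chunk)? {
            println!("complete record: {line}");
        }
    }
    println!("leftover: {:?}", buffer.finish()?);
    Ok(())
}
```

Each complete line could then be handed to the deserializer, so a record that spans two payloads is decoded once and as a whole.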
Expected Behavior
S3 Select should produce valid JSON that we are able to deserialize.
Current Behavior
Reproduction Steps
SELECT * FROM S3Object s WHERE createdAt >= '2023-04-24';
output.csv
Possible Solution
No response
Additional Information/Context
We are using code very similar to the S3 Select Example
https://github.com/awslabs/aws-sdk-rust/blob/main/examples/examples/s3/src/bin/select-object-content.rs
I do not receive errors when running in the AWS console so I believe it is related to the rust SDK.
Version
Environment details (OS name and version, etc.)
cargo 1.76.0 (c84b36747 2024-01-18)
Logs
No response