airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[source-sftp-bulk] Sends Empty Data When the Last File Is Empty #34289

Open FilahAnas opened 5 months ago

FilahAnas commented 5 months ago

Connector Name

source-sftp-bulk

Connector Version

0.1.2

At what step did the error happen?

Updating the connector

Relevant information

When the last file of a stream is empty (it contains only the header row and no data rows), the connector fails to infer the stream's schema, even though the header is present in the file. As a result, the discovered schema has no properties and the connector sends empty data to the destination.

Example: We use the SFTP bulk connector to retrieve data from CSV files on an SFTP server and load it into BigQuery. In this example the last file contains only the header, so the connector sends empty data to BigQuery, as illustrated below:

[Image: Screenshot 2024-01-16 at 16 09 52]

Relevant log output

> python3.9 main.py discover --config secrets/config.json

{"type": "LOG", "log": {"level": "INFO", "message": "Found 99 files in \"/cross_channel/agent/agent log\""}}
{"type": "LOG", "log": {"level": "WARN", "message": "No records found in file {'filepath': '/cross_channel/agent/agent log/2023-12-15_agentlog_v2.csv', 'last_modified': datetime.datetime(2023, 12, 16, 0, 55, 15)}, can't infer json schema"}}
{"type": "CATALOG", "catalog": {"streams": [{"name": "agent_log", "json_schema": {"$schema": "http://json-schema.org/draft-07/schema#", "type": "object", "properties": {}}, "supported_sync_modes": ["full_refresh", "incremental"], "source_defined_cursor": true, "default_cursor_field": ["last_modified"]}]}}
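
The empty "properties" object in the catalog above matches the warning on the previous line: with no records in the file, nothing is available to infer types from. One possible workaround, shown here only as a minimal sketch and not as the connector's actual code, is to fall back to the CSV header row when a file has no data rows. The function name `infer_schema_from_csv`, the sample column names, and the choice to type every column as a nullable string are all assumptions made for illustration.

```python
import csv
import io
from typing import Any, Dict


def infer_schema_from_csv(text: str) -> Dict[str, Any]:
    """Hypothetical helper: build a JSON schema from the CSV header alone.

    Every column is typed as a nullable string, an assumption that keeps
    the schema usable even when the file has no data rows.
    """
    reader = csv.DictReader(io.StringIO(text))
    # fieldnames reads the header row; it is None for a completely empty file.
    columns = reader.fieldnames or []
    properties = {name: {"type": ["null", "string"]} for name in columns}
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": properties,
    }


# A header-only file, like the last file described in this issue.
# The column names are made up for the example.
header_only = "agent_id,event,timestamp\n"
schema = infer_schema_from_csv(header_only)
print(sorted(schema["properties"]))
```

With a fallback of this kind, a header-only file would still yield a catalog entry that lists its columns, rather than an empty "properties" object.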

Contribute

marcosmarxm commented 5 months ago

Thanks @FilahAnas, the team will take a look at your contribution next sprint.