Open NipunaRanasinghe opened 6 months ago
@NipunaRanasinghe did we think about the performance aspect of this. Lets say we have a large file (may be a JSON or a CSV) and we are trying to get it and data bind it to JSON (or CSV) what would be the behaviour ?
Better to check the behaviour of Ballerina IO package when we try to read a large file.
@ayeshLK Your concerns regarding large files is certainly valid. However, it's important to note that the proposed approach includes both byte array streams and, direct data types such as JSON, XML, etc as the expected type. Therefore this approach gives users the flexibility to make a conscious decision based on their application needs. (If they are dealing with relatively small files, direct data binding would be convenient and appropriate. Conversely, for larger files, they could opt for streaming to manage resource usage more effectively).
Nevertheless, we can do a performance comparison between the two approaches after implementing the suggested changes. This will help us ensure that the new functionality can be used efficiently across different scenarios.
@NipunaRanasinghe Shall we have a separate API for streaming like in IO API[1]? Ask the user to specify the block size and read it as a stream.
public isolated function getBlockAsStream(string path, int blockSize) returns stream<Block, Error?>;
# The read-only byte array that is used to read the byte content from the streams.
public type Block readonly & byte[];
What would the usage look like to read CSV files from the FTP server? CSV file read/write is a common usecase.
@NipunaRanasinghe Shall we have a separate API for streaming like in IO API[1]? Ask the user to specify the block size and read it as a stream.
@daneshk here the stream capability is kept under the same API, mainly to the keep the backward compatibility (so that the existing code will not break and users can optionally switch to infer the types, only if they want).
If the backward compatibility is not a priority, I'm +1 to move the streaming capability into a new API.
What would the usage look like to read CSV files from the FTP server? CSV file read/write is a common usecase.
Here the CSV support is not prioratised (as mentioned in non-goals), as Ballerina still lack of first-class support for CSV data types (unlike json and xml).
Therefore one option would be to go for a cleaner implementation once we we have a better support for CSV data type via the data.csvdata
library(ETA - Update10).
But if this should be prioritized, I'm +1 to consider the CSV support (with the representation of string[][]
) with the same effort and, to revisit once after we have proper CSV support.
When considering about the return types json|xml|byte[]
, we will support subtypes of json
as well right?
i.e a file containing a json
array will map to Person[]
.
As per the design discussion with @shafreenAnfar, @daneshk and @dilanSachi, we've decided to,
get()
with inferred types). put()
method.Please find the new proposal below, which includes the additional requirements mentioned above.
The current implementation of the get()
method in the Ballerina FTP client allows fetching and uploading file contents only as streams of bytes. This proposal suggests introducing a set of new APIs to support direct data binding to common data formats such as text, JSON, XML, byte arrays and CSV, reducing the need for manual conversions and stream handling.
get()
and put()
methods that return and upload file contents as streams of bytes. The existing methods will remain for backward compatibility and, expected to be removed in a future release.Currently, fetching, converting, and uploading file contents using the FTP connector involves multiple steps and explicit stream handling, which can be cumbersome. By enabling direct data binding through specific APIs, we aim to provide a more streamlined and efficient way to handle file contents.
The proposed changes will introduce new methods in the FTP connector, each tailored to fetch and bind file contents to specific data types directly. Similarly, new methods for uploading contents directly as various data types will be introduced.
get()
methodFetch content as plain text
# Fetches file content from the FTP server as plain text.
# + path - The path to the file on the FTP server
# + return - The file content as JSON or, an `ftp:Error`
remote isolated function getText(string path) returns string|Error;
Fetch content as JSON
# Fetches file content from the FTP server as JSON.
# + path - The path to the file on the FTP server
# + return - The file content as JSON or, an `ftp:Error`
remote isolated function getJson(string path) returns json|Error;
Fetch content as XML
# Fetches file content from the FTP server as XML.
# + path - The path to the file on the FTP server
# + return - The file content as XML or, an `ftp:Error`
remote isolated function getXml(string path) returns xml|Error;
Fetch content as a byte array
# Fetches file content from the FTP server as a byte array.
# + path - The path to the file on the FTP server
# + return - The file content as a byte array or, an `ftp:Error`
remote isolated function getBytes(string path) returns byte[]|Error;
Fetch content as a stream of byte arrays
# Fetches file content from the FTP server as a stream of byte arrays.
# + path - The path to the file on the FTP server
# + blockSize - size of a single block to be fetched
# + return - The file content as a stream of byte arrays or, an `ftp:Error`
remote isolated function getBytesAsStream(string path, int blockSize) returns stream<readonly & byte[], Error?>|Error;
Fetch content as CSV
# Fetches file content from the FTP server as CSV.
# + path - The path to the file on the FTP server
# + return - The file content as a 2D string array (CSV) or, an `ftp:Error`
remote isolated function getCsv(string path) returns string[][]|Error;
Fetch content as a CSV stream
# Fetches file content from the FTP server as a stream of CSV entries(i.e., rows).
# + path - The path to the file on the FTP server
# + return - The file content as a stream of string arrays or, an `ftp:Error`
remote isolated function getCsvAsStream(string path) returns stream<readonly & string[], Error?>|Error;
Fetch JSON content as a Ballerina record
# Fetches JSON file content from the FTP server as a Ballerina record.
# + path - The path to the file on the FTP server
# + returnType - The expected Ballerina record type
# + return - The file content as the specified Ballerina record type or, an `ftp:Error`
remote isolated function getJsonWithType(string path, typedesc<T> returnType = <>) returns returnType|Error;
Fetch XML content as a Ballerina record
# Fetches XML file content from the FTP server as a Ballerina record.
# + path - The path to the file on the FTP server
# + returnType - The expected Ballerina record type
# + return - The file content as the specified Ballerina record type or, an `ftp:Error`
remote isolated function getXmlWithType(string path, typedesc<T> returnType = <>) returns returnType|Error;
Fetch CSV content as a Ballerina record
# Fetches CSV file content from the FTP server as a Ballerina record.
# + path - The path to the file on the FTP server
# + returnType - The expected Ballerina record type
# + return - The file content as the specified Ballerina record type or, an `ftp:Error`
remote isolated function getCsvWithType(string path, typedesc<T> returnType = <>) returns returnType[]|Error;
put()
methodUpload content as text
# Uploads file content to the FTP server directly as plain text.
# + path - The path to the file on the FTP server
# + content - The file content as a string
# + return - An `ftp:Error` if an error occurs, otherwise nil
remote isolated function putText(string path, string content) returns Error?;
Upload content as JSON or Ballerina record
# Uploads file content to the FTP server directly as JSON or, the counterpart Ballerina record representation.
# + path - The path to the file on the FTP server
# + content - The file content as JSON or a Ballerina record
# + return - An `ftp:Error` if an error occurs, otherwise nil
remote isolated function putJson(string path, json|record {|anydata...;|} content) returns Error?;
Upload content as XML or Ballerina record
# Uploads file content to the FTP server directly as XML or, the counterpart Ballerina record representation.
# + path - The path to the file on the FTP server
# + content - The file content as XML or a Ballerina record
# + return - An `ftp:Error` if an error occurs, otherwise nil
remote isolated function putXml(string path, xml|record {|anydata...;|} content) returns Error?;
Upload content as CSV
# Uploads file content to the FTP server as CSV or, the counterpart Ballerina record representation.
# + path - The path to the file on the FTP server
# + content - The file content as a 2D string array (CSV)
# + return - An `ftp:Error` if an error occurs, otherwise nil
remote isolated function putCsv(string path, string[][]|record {|anydata...;|} content) returns Error?;
Upload content as a byte array
# Uploads file content to the FTP server as a byte array.
# + path - The path to the file on the FTP server
# + content - The file content as a byte array
# + return - An `ftp:Error` if an error occurs, otherwise nil
remote isolated function putBytes(string path, byte[] content) returns Error?;
Upload content as a stream of byte arrays
# Uploads file content to the FTP server as a stream of byte arrays.
# + path - The path to the file on the FTP server
# + content - The file content as a stream of byte arrays
# + return - An `ftp:Error` if an error occurs, otherwise nil
remote isolated function putBytesFromStream(string path, stream<byte[], Error?> content) returns Error?;
Upload content as a CSV stream
# Uploads file content to the FTP server as a stream of string[].
# + path - The path to the file on the FTP server
# + content - The file content as a stream of string arrays
# + return - An `ftp:Error` if an error occurs, otherwise nil
remote isolated function putCsvFromStream(string path, stream<string[], Error?> content) returns Error?;
Fetching content
string stringContent = check ftpClient->getText("/path/to/file.txt");
json jsonContent = check ftpClient->getJson("/path/to/file.json");
xml xmlContent = check ftpClient->getXml("/path/to/file.xml");
byte[] byteContent = check ftpClient->getBytes("/path/to/file");
stream<readonly & byte[], ftp:Error?> byteStream = check ftpClient->getBytesAsStream("/path/to/file", 1024);
string[][] csvContent = check ftpClient->getCsv("/path/to/file.csv");
stream<string[], ftp:Error?> csvStream = check ftpClient->getCsvAsStream("/path/to/file.csv");
Person person = check ftpClient->getJsonWithType("/path/to/file.json");
Person person = check ftpClient->getXmlWithType("/path/to/file.xml");
Person[] person = check ftpClient->getCsvWithType("/path/to/file.csv");
Uploading content
check ftpClient->putText("/path/to/file", "{ \"name\": \"John\", \"age: 30 }");
Uploading content directly as JSON:
// 1. direct JSON content
check ftpClient->putJson("/path/to/file.json", { "name": "John", "age: 30 });
// 2. counterpart Ballerina record
Person person = { name: "John", age: 30 };
check ftpClient->putJson("/path/to/file.json", person);
Uploading content directly as XML:
// 1. direct XML content
check ftpClient->putXml("/path/to/file.xml", xml `<person><name>John</name><age>30</age></person>`);
// 2. counterpart Ballerina record
Person person = { name: "John", age: 30 };
check ftpClient->putXml("/path/to/file.json", person);
check ftpClient->putBytes("/path/to/file", [104, 101, 108, 108, 111]);
stream<Block, ftp:Error?> byteStream = ... // define the stream
check ftpClient->putBytesFromStream("/path/to/file", byteStream);
Uploading content directly as CSV:
// 1. direct CSV content
check ftpClient->putCsv("/path/to/file.csv", [["name", "age"], ["John", "30"]]);
// 2. counterpart Ballerina record
Person person = { name: "John", age: 30 };
check ftpClient->putCsv("/path/to/file.csv", person);
stream<string[], ftp:Error?> csvStream = ... // define the stream
check ftpClient->putCsvFromStream("/path/to/file.csv", csvStream);
get()
and put()
methods with the default behavior (stream of bytes) will continue to work without modifications.@daneshk @shafreenAnfar appreciate your reviews/feedback on the revised proposal added in https://github.com/ballerina-platform/ballerina-library/issues/6529?notification_referrer_id=NT_kwDOAbsAmLQxMDYzNTYyMjQxMzoyOTAzMjYwMA#issuecomment-2132589005.
There are a few minor queries. Overall LGTM
remote isolated function getJsonWithType(string path, typedesc<T> returnType = <>) returns returnType|Error;
returnType[]
array instead of single value.I wonder if we should we suffix file
to all APIs as follows.
remote isolated function fileGetCsv(string path) returns string[][]|Error;
Are we supporting inferring the return type in get*WithType() functions?
Yes we are. There was a mistakenly added syntax issue in the proposed *withType()
APIs and, I have updated them accordingly. Thanks for pointing that out.
getCSVWithType needs to return returnType[] array instead of single value.
Yes, updated the proposal with the suggestion.
Shall we support reading text files as string content?
If we require to support that scenario (in addition to retrieving the string content as a byte array), we have the option to add a new API called getString()
or, getText()
.
There I prefer to proceed with the name getText()
, as a API named getString()
might sound ambiguous (as JSON, CSV and XML datatypes can also be parsed into strings) compared to the other option.
@daneshk Please let me know your thoughts on this and, I'll update the proposal once we finalize the API name.
I wonder if we should we suffix
file
to all APIs as follows.remote isolated function fileGetCsv(string path) returns string[][]|Error;
@shafreenAnfar Yes we have that option too. My only concern is whether this naming convention might deviate from the existing naming conventions we have for our other FTP APIs (get
, append
, put
, mkdir
, rmdir
, rename
, size
, list
, isDirectory
, delete
). It seems like we have considered that having a file prefix/suffix is redundant since the FTP client itself is related to file transfer protocol (hence the DRY rule).
So if we are to do this, shouldn't we apply the same convention to the other APIs as well (e.g. rename
-> fileRename
, delete
->fileDelete
)?
WDYT?
Interesting! I was thinking two types of APIs; one set as low level APIs (what we have now) and another set as high level APIs.
Are we supporting inferring the return type in get*WithType() functions?
Yes we are. There was a mistakenly added syntax issue in the proposed
*withType()
APIs and, I have updated them accordingly. Thanks for pointing that out.getCSVWithType needs to return returnType[] array instead of single value.
Yes, updated the proposal with the suggestion.
Shall we support reading text files as string content?
If we require to support that scenario (in addition to retrieving the string content as a byte array), we have the option to add a new API called
getString()
or,getText()
. There I prefer to proceed with the namegetText()
, as a API namedgetString()
might sound ambiguous (as JSON, CSV and XML datatypes can also be parsed into strings) compared to the other option.@daneshk Please let me know your thoughts on this and, I'll update the proposal once we finalize the API name.
Yes, the getText()
is preferred as we are going to use this API to read .txt
files. I am +1
On second thoughts, I think we can skip the file and go ahead with what @NipunaRanasinghe suggested.
On second thoughts, I think we can skip the file and go ahead with what @NipunaRanasinghe suggested.
Ack @shafreenAnfar. Will proceed with the exiting design. Thanks for your valuable feedback so far on this.
The implementation is currently on hold due to #42830 and #42848 release blockers.
Summary
The current implementation of the
get()
method in the Ballerina FTP connector allows fetching file contents only as a stream of bytes. This proposal suggests enhancing theget()
method to support direct data binding to JSON, XML, and byte arrays, reducing the need for manual conversions and stream handling.Goals
get()
method to support direct data binding, allowing it to return file contents directly as JSON, XML, or byte arrays.Non-Goals
This proposal does not aim to remove the existing return type of the
get()
method, which returns file contents as a stream of bytes. All the newly introduced return types will be added as subtypes of a union type, to be backward compatible.This proposal does not aim to support additional data formats beyond JSON, XML, and byte arrays (e.g., CSV).
Motivation
Currently, fetching and converting file contents using the FTP connector involves multiple steps and explicit stream handling, which can be cumbersome and error-prone. By enabling direct data binding, we aim to provide a more streamlined and efficient way to handle file contents.
Description
The proposed changes will enhance the
get()
method in the FTP connector to automatically infer and bind the data type based on the expected return type. This will be accomplished by overloading theget()
method to accept a type descriptor as an optional parameter that dictates the expected format of the data.Proposed Changes to the
get()
MethodExamples
Fetching content directly as JSON:
Fetching content directly as XML:
Fetching content directly as byte array:
Fetching content as a stream of byte arrays:
Compatibility and Migration
get()
method with the default behavior (stream of bytes) will continue to work without modifications.Risks and Assumptions
returnType
. Incorrect assumptions may lead to runtime errors.