apibara / dna

Apibara is the fastest platform to build production-grade indexers that connect onchain data to web2 services.
https://www.apibara.com/

Write Parquet datasets to remote storage #315

Closed: fracek closed this issue 6 months ago

fracek commented 8 months ago

Is your feature request related to a problem? Please describe.

When running the parquet sink in the cloud, it's annoying to have to use persistent disks and manually upload the parquet files to S3. We should have a way to automatically upload parquet files as they're produced.

Describe the solution you'd like

If the user specifies an --output-dir that starts with s3://, write to that S3 bucket + subdirectory. If the output dir doesn't have any prefix or the prefix is file://, write to file (current behaviour).
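A rough sketch of how this prefix detection could look; OutputLocation and parse_output_dir are names made up here for illustration, not part of the existing sink:

use std::path::PathBuf;

/// Where the sink should write its parquet files.
enum OutputLocation {
    /// An S3 bucket plus an optional key prefix.
    S3 { bucket: String, prefix: String },
    /// A local directory (current behaviour).
    File(PathBuf),
}

/// Decide the backend from the --output-dir value.
fn parse_output_dir(output_dir: &str) -> OutputLocation {
    if let Some(rest) = output_dir.strip_prefix("s3://") {
        // Split "bucket/sub/dir" into the bucket name and the key prefix.
        let (bucket, prefix) = match rest.split_once('/') {
            Some((bucket, prefix)) => (bucket.to_string(), prefix.to_string()),
            None => (rest.to_string(), String::new()),
        };
        OutputLocation::S3 { bucket, prefix }
    } else {
        // No scheme, or an explicit file://, means the local filesystem.
        let path = output_dir.strip_prefix("file://").unwrap_or(output_dir);
        OutputLocation::File(PathBuf::from(path))
    }
}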

Additional context

It's probably enough to change write_batch to serialize the data into a BytesMut buffer, then either write the bytes to a file or upload them to S3.
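For example, the serialization step could look something like this (a sketch using the arrow and parquet crates; serialize_batch is a hypothetical helper, and a plain Vec<u8> stands in for BytesMut for brevity):

use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;

// Serialize a record batch to parquet bytes in memory, instead of writing
// straight to a file on disk.
fn serialize_batch(batch: &RecordBatch) -> parquet::errors::Result<Vec<u8>> {
    let mut buffer = Vec::new();
    let mut writer = ArrowWriter::try_new(&mut buffer, batch.schema(), None)?;
    writer.write(batch)?;
    writer.close()?;
    // The caller can now write `buffer` to a local file or upload it to S3.
    Ok(buffer)
}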

We need a trait like the following:

pub trait DatasetWriter {
  async fn write_parquet(&mut self, filepath: impl Into<String>, data: &[u8]) -> Result<()>;
}

where filepath is the full path relative to the writer root (the path or bucket specified by the user) and data is the serialized content (the output of the parquet writer in the current implementation).
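A hypothetical sketch of two backends for that trait, assuming a recent Rust with async fn in traits, tokio for file IO, the aws-sdk-s3 crate, and anyhow::Result as the trait's error type; FileDatasetWriter and S3DatasetWriter are illustrative names, and the actual implementation may differ:

use std::path::PathBuf;

use anyhow::Result;

// Local filesystem backend (current behaviour).
pub struct FileDatasetWriter {
    root: PathBuf,
}

impl DatasetWriter for FileDatasetWriter {
    async fn write_parquet(&mut self, filepath: impl Into<String>, data: &[u8]) -> Result<()> {
        let path = self.root.join(filepath.into());
        if let Some(parent) = path.parent() {
            tokio::fs::create_dir_all(parent).await?;
        }
        tokio::fs::write(path, data).await?;
        Ok(())
    }
}

// S3 backend: uploads the serialized parquet file under the configured prefix.
pub struct S3DatasetWriter {
    client: aws_sdk_s3::Client,
    bucket: String,
    prefix: String,
}

impl DatasetWriter for S3DatasetWriter {
    async fn write_parquet(&mut self, filepath: impl Into<String>, data: &[u8]) -> Result<()> {
        let key = format!("{}/{}", self.prefix.trim_end_matches('/'), filepath.into());
        self.client
            .put_object()
            .bucket(&self.bucket)
            .key(key)
            .body(aws_sdk_s3::primitives::ByteStream::from(data.to_vec()))
            .send()
            .await?;
        Ok(())
    }
}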

bigherc18 commented 7 months ago

This is already done in #320