ddotta / parquetize

R package that converts databases of different formats to parquet format
https://ddotta.github.io/parquetize/

Add the feature to convert txt files #11

Closed: ddotta closed this issue 1 year ago

etiennebacher commented 1 year ago

Hi @ddotta, this feature would be useful. Was it implemented? I can't find a txt_to_parquet() function or anything similar.

ddotta commented 1 year ago

Hi @etiennebacher
At the moment there is only the csv_to_parquet() function. Could you provide a reproducible example of a text file that parquetize should be able to convert?

etiennebacher commented 1 year ago

Any data in .txt format, for example:

write.table(iris, "test.txt")

To give more context: I just obtained some large census data stored as .txt files, so a function that automatically converts these text files to parquet (in batches, if possible) would be useful.

ddotta commented 1 year ago

@etiennebacher

You can use csv_to_parquet() for this:

library(parquetize)

write.table(iris, "test.txt")

csv_to_parquet(
  path_to_file = "test.txt",
  path_to_parquet = tempfile(fileext = ".parquet")
)

Or with a URL pointing to a zipped .txt file:

csv_to_parquet(
  path_to_file = "https://www.bionumerics.com/sites/default/files/download/Antibiotics%20sample%20data.zip",
  filename_in_zip = "MIC data.txt",
  path_to_parquet = tempfile(fileext = ".parquet")
)

I will update the documentation for csv_to_parquet() accordingly.

etiennebacher commented 1 year ago

Thanks, I was misled by the function name, but it does work 😄