I have been using pyarrow's pa.hdfs.connect() and pq.ParquetDataset to read files from HDFS, then handing the resulting Arrow tables to Daft. The alternative is to use pandas' read_parquet followed by Daft's from_pandas, but that is extremely slow and often runs into memory errors. If Daft could read from this file system directly, I could reuse the same code across sources and fall back on Spark less often.