apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

Interpretation of the underlying logic of pyarrow manipulating Hive #9026

Closed: svjack closed this issue 3 years ago

svjack commented 3 years ago

I used pyarrow to handle HDFS files produced by Hive, and I reviewed pyarrow's source code. The main utilities for the HDFS filesystem are the Parquet-related functions, with rich IO and metadata/schema inference. The other option is the plain read functions, i.e. reading a file as text to work with text files on HDFS. As far as I know, when I create a table in Hive the default storage format is text, and when I use HadoopFileSystem to go down to the table's actual path in HDFS, it seems the table's schema and metadata (and the automatic parsing of delimited lines) cannot be retrieved through the internal API.

I don't want to use SQL tools such as PyHive, because that turns this into a "two source" problem (one source being abstract SQL, the other the plain file system), even if it is simple. So at present I have to use pd.read_csv with the file object returned by fs.open, and retrieve the schema info from the TBLS table in MySQL, where the Hive metastore actually keeps the detailed schema. I don't think this design is ideal. So my question is: did I miss some detail of pyarrow's underlying logic related to the HDFS file system and Hive? Please explain it for me. This is all about pyarrow's internal construction, not about other frameworks.

I would also like a brief introduction to the dataset API's support for Hive's Parquet files and text files. Can you give me some examples, mainly for the text storage format in Hive's HDFS layout?

I also took a look at a data transfer toolkit called Sqoop; in its AppendUtils.java it uses some detailed partition-manipulation utilities to perform data appends, and I think all of those functions could be rebuilt with pyarrow. But as I reviewed the pyarrow source code, I could not find any developed logic for "partition" and "warehouse" manipulation. Has anyone built projects on top of pyarrow, or Arrow's other APIs, that implement these functions?
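For reference, a minimal sketch of how the pyarrow dataset API can be pointed at a Hive-style warehouse directory on HDFS. It assumes a recent pyarrow; the namenode host and port, warehouse paths, column names, and partition column below are placeholders, not values taken from this issue:

```python
import pyarrow.dataset as ds
import pyarrow.csv as pcsv
from pyarrow import fs

# Connect to HDFS (host/port are placeholders; requires a configured Hadoop client).
hdfs = fs.HadoopFileSystem(host="namenode", port=8020)

# A Hive table stored as Parquet: key=value partition directories such as
# year=2020/ are discovered automatically with partitioning="hive".
parquet_ds = ds.dataset(
    "/user/hive/warehouse/mydb.db/parquet_table",
    format="parquet",
    filesystem=hdfs,
    partitioning="hive",
)
table = parquet_ds.to_table(filter=ds.field("year") == 2020)

# A Hive table stored as text: the dataset API can read it as CSV, but the
# column names and delimiter must be supplied by hand because pyarrow does
# not consult the Hive metastore. Hive's default field delimiter is \x01 (Ctrl-A),
# and the data files have no header row.
text_format = ds.CsvFileFormat(
    parse_options=pcsv.ParseOptions(delimiter="\x01"),
    read_options=pcsv.ReadOptions(column_names=["id", "name", "amount"]),
)
text_ds = ds.dataset(
    "/user/hive/warehouse/mydb.db/text_table",
    format=text_format,
    filesystem=hdfs,
    partitioning="hive",
)
text_table = text_ds.to_table()
```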

jorisvandenbossche commented 3 years ago

So at present I have to use pd.read_csv with the file object returned by fs.open, and retrieve the schema info from the TBLS table in MySQL, where the Hive metastore actually keeps the detailed schema. I don't think this design is ideal.

Arrow doesn't have functionality to natively interact with or understand Hive metastores. So if you have a CSV file stored, and you want to read it following the schema stored in the Hive metastore, then at the moment you will always need to do something manual like what you described above.
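A minimal sketch of that manual approach using pyarrow's own CSV reader instead of pd.read_csv, assuming the schema has already been fetched from the metastore out of band; the host, port, file path, column names, and types below are placeholders:

```python
import pyarrow as pa
import pyarrow.csv as pcsv
from pyarrow import fs

hdfs = fs.HadoopFileSystem(host="namenode", port=8020)  # placeholder host/port

# Schema obtained out of band, e.g. from the Hive metastore (placeholder columns).
schema = pa.schema([("id", pa.int64()), ("name", pa.string()), ("amount", pa.float64())])

# A Hive text-format data file has no header row and uses \x01 as the field
# delimiter by default, so both column names and types are supplied explicitly.
with hdfs.open_input_stream("/user/hive/warehouse/mydb.db/text_table/000000_0") as f:
    table = pcsv.read_csv(
        f,
        read_options=pcsv.ReadOptions(column_names=schema.names),
        parse_options=pcsv.ParseOptions(delimiter="\x01"),
        convert_options=pcsv.ConvertOptions(
            column_types={field.name: field.type for field in schema}
        ),
    )
```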