A cross-platform (Windows, macOS, Linux) desktop application to view common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
When handling valid Avro files that reside deep inside a nested filesystem tree, Windows returns 8.3 DOS short names for a file chosen through the file picker.
So a file becomes
C:\Work\Tools\cdgc\SADA_O~1\SADA_O~1\INGEST~1\8B6EC9~1\content\data\RELATI~1\COMINF~1.DAT\BACF6F~1.AVR
but its actual filename is something like
c:\Work\Tools\cdgc\<long path here>\bacf6f.something.avro
The truncated .AVR extension is apparently not recognized as Avro, so the default data parser (Parquet) is chosen, which then leads to
java.lang.RuntimeException: file:/C:/Work/Tools/cdgc/SADA_O~1/SADA_O~1/INGEST~1/8B6EC9~1/content/data/RELATI~1/COMINF~1.DAT/BACF6F~1.AVR is not a Parquet file. Expected magic number at tail, but found [35, 46, 107, 40]
I've managed to patch this with a PowerShell hack as follows:
`
`
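A fix inside the viewer itself might be to choose the parser from the file's leading magic bytes instead of its extension, so that a shortened name such as BACF6F~1.AVR no longer matters. The following is only a minimal sketch, assuming Java 11 or later; `FormatSniffer`, `Format`, and `detect` are hypothetical names used for illustration and are not taken from the application's code. It relies on the documented file signatures: Parquet files start with `PAR1`, Avro object container files start with `Obj` followed by the byte 1, and ORC files start with `ORC`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper (not part of the application): guesses the file format
// from its leading magic bytes rather than from the file extension.
public class FormatSniffer {

    public enum Format { PARQUET, AVRO, ORC, UNKNOWN }

    // Documented signatures found at the very start of each file type.
    private static final byte[] PARQUET_MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};
    private static final byte[] ORC_MAGIC = "ORC".getBytes(StandardCharsets.US_ASCII);

    // Reads at most the first four bytes of the file and classifies them.
    public static Format detect(Path file) throws IOException {
        byte[] head;
        try (InputStream in = Files.newInputStream(file)) {
            head = in.readNBytes(4); // requires Java 11+
        }
        return detect(head);
    }

    // Classifies a buffer holding the first bytes of a file.
    public static Format detect(byte[] head) {
        if (startsWith(head, PARQUET_MAGIC)) return Format.PARQUET;
        if (startsWith(head, AVRO_MAGIC)) return Format.AVRO;
        if (startsWith(head, ORC_MAGIC)) return Format.ORC;
        return Format.UNKNOWN;
    }

    // True if data begins with prefix; short or empty buffers never match.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }
}
```

With something like this, the extension could remain a fast first guess, and the magic bytes would only be consulted when the extension is missing or unrecognized, for example `FormatSniffer.detect(Path.of(selectedFile))`, where `selectedFile` stands for whatever path the file picker returned.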