The slowness is because the fixed-width file (fwf) parser in pandas is implemented in Python, while the (much) faster CSV parser has implementations in C and Arrow. I poked around the implementation, but it is a bit over my head.
It's worth mentioning that the fwf parser is way faster if it's one big dataframe. This INITIAL_STRAIN_SHELL keyword has thousands of small tables, and there appears to be some fixed overhead in pandas for parsing each one.
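To make the per-table overhead concrete, here is a minimal sketch that parses the same rows once as many small tables and once as a single table. The column layout (`WIDTHS`, `NAMES`), the synthetic data, and the helper names are assumptions for illustration only. They are not the real INITIAL_STRAIN_SHELL card format or the way the deck importer calls pandas.

```python
import io
import time

import pandas as pd

# Hypothetical layout: 4 fields of 10 characters each (not the real card format).
WIDTHS = [10, 10, 10, 10]
NAMES = ["a", "b", "c", "d"]


def make_rows(n_rows):
    """Build n_rows fixed-width lines of synthetic integer data."""
    return [
        "".join(f"{row * 4 + col:>10d}" for col in range(4))
        for row in range(n_rows)
    ]


def parse_many_small(rows, rows_per_table):
    """Call read_fwf once per small table, then concatenate the results."""
    frames = []
    for start in range(0, len(rows), rows_per_table):
        chunk = "\n".join(rows[start:start + rows_per_table])
        frames.append(
            pd.read_fwf(io.StringIO(chunk), widths=WIDTHS, names=NAMES, header=None)
        )
    return pd.concat(frames, ignore_index=True)


def parse_one_big(rows):
    """Call read_fwf a single time on all rows."""
    text = "\n".join(rows)
    return pd.read_fwf(io.StringIO(text), widths=WIDTHS, names=NAMES, header=None)


if __name__ == "__main__":
    rows = make_rows(20000)

    t0 = time.perf_counter()
    parse_many_small(rows, rows_per_table=4)
    t1 = time.perf_counter()
    parse_one_big(rows)
    t2 = time.perf_counter()

    print(f"many small tables: {t1 - t0:.2f} s")
    print(f"one big table:     {t2 - t1:.2f} s")
```

Both functions return the same data, so any difference in the two timings comes from how many times `read_fwf` is called rather than from the amount of text parsed.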
Here are two files (I use the .txt extension because .key is not supported by GitHub). slow2 contains just a subset of the content from slow6.
I profiled importing slow2 with `deck.import_file(path_to_file)`
and got these results:

| ncalls | tottime | percall | cumtime | percall | filename:lineno(function) |
| --- | --- | --- | --- | --- | --- |
| 12516277/12496122 | 1.297 | 0.000 | 2.150 | 0.000 | `{built-in method builtins.isinstance}` |
| 241860 | 0.893 | 0.000 | 2.916 | 0.000 | `C:\AnsysDev\Apps\Python311\Lib\site-packages\pandas\core\construction.py:518(sanitize_array)` |
| 161240 | 0.779 | 0.000 | 0.875 | 0.000 | `C:\AnsysDev\Apps\Python311\Lib\warnings.py:181(_add_filter)` |
| 20155 | 0.564 | 0.000 | 4.878 | 0.000 | `C:\AnsysDev\Apps\Python311\Lib\site-packages\pandas\io\parsers\base_parser.py:513(_convert_to_ndarrays)` |
| 181395 | 0.516 | 0.000 | 2.086 | 0.000 | `C:\AnsysDev\Apps\Python311\Lib\site-packages\pandas\core\dtypes\common.py:1587(pandas_dtype)` |
| 141085 | 0.486 | 0.000 | 0.486 | 0.000 | `{pandas._libs.lib.maybe_convert_numeric}` |
| 60465 | 0.438 | 0.000 | 2.775 | 0.000 | `C:\AnsysDev\Apps\Python311\Lib\site-packages\pandas\core\indexes\base.py:477(__new__)` |
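For anyone who wants to reproduce a profile like this, here is a minimal sketch using the standard library's `cProfile` and `pstats`. The helper name `profile_call` and its `top` parameter are made up for illustration, and sorting by `tottime` is an assumption about how the listing was ordered.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=7, **kwargs):
    """Run func under cProfile and return the top entries by total time as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args, **kwargs)
    finally:
        profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "tottime" sorts by time spent inside each function itself, excluding callees.
    stats.sort_stats("tottime").print_stats(top)
    return buffer.getvalue()
```

With a deck object and file path already set up, the call would look something like `print(profile_call(deck.import_file, path_to_file))`.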
slow2.txt slow6.txt