ansys / pydyna

Python interface to the LS-DYNA solver
https://dyna.docs.pyansys.com
MIT License

Improve performance for reading INITIAL_STRESS_SHELL and INITIAL_STRAIN_SHELL #592

Open koubaa opened 12 months ago

koubaa commented 12 months ago

Here are two files (I use the .txt extension because GitHub does not support .key attachments). slow2 contains just a subset of the content of slow6.

I profiled slow2: deck.import_file(path_to_file)

and got these results:

    ncalls             tottime  percall  cumtime  percall  filename:lineno(function)
    12516277/12496122  1.297    0.000    2.150    0.000    {built-in method builtins.isinstance}
    241860             0.893    0.000    2.916    0.000    C:\AnsysDev\Apps\Python311\Lib\site-packages\pandas\core\construction.py:518(sanitize_array)
    161240             0.779    0.000    0.875    0.000    C:\AnsysDev\Apps\Python311\Lib\warnings.py:181(_add_filter)
    20155              0.564    0.000    4.878    0.000    C:\AnsysDev\Apps\Python311\Lib\site-packages\pandas\io\parsers\base_parser.py:513(_convert_to_ndarrays)
    181395             0.516    0.000    2.086    0.000    C:\AnsysDev\Apps\Python311\Lib\site-packages\pandas\core\dtypes\common.py:1587(pandas_dtype)
    141085             0.486    0.000    0.486    0.000    {pandas._libs.lib.maybe_convert_numeric}
    60465              0.438    0.000    2.775    0.000    C:\AnsysDev\Apps\Python311\Lib\site-packages\pandas\core\indexes\base.py:477(__new__)
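For anyone who wants to reproduce a listing like this, here is a minimal sketch using the standard library's `cProfile` and `pstats`. The helper `profile_call` and the stand-in workload `busy_work` are hypothetical names for illustration; in practice the profiled callable would be something like `deck.import_file` with the path to slow2 as its argument.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, report sorted by tottime)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "tottime" is time spent in the function itself, excluding callees.
    stats.sort_stats("tottime").print_stats(top)
    return result, buffer.getvalue()


def busy_work(n):
    # Stand-in workload (hypothetical); replace with the real import call.
    return sum(isinstance(i, int) for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(busy_work, 100_000)
    print(report)
```

Sorting by `tottime` appears to match the ordering of the rows in the listing above.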

The slowness is because the fixed-width file (fwf) parser in pandas is implemented in Python, while the (much) faster csv parser has implementations in C and Arrow. I poked around the implementation, but it is a bit over my head.
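One direction this suggests, sketched here only as an idea and not as anything pydyna does: slice the fixed-width fields by hand, rejoin them with commas, and hand the result to `pd.read_csv` with `engine="c"`. The helper names are hypothetical, and the sketch assumes the fields never contain commas or quotes. Since the slicing itself still runs in Python, whether this ends up faster would have to be measured.

```python
import io

import pandas as pd


def fwf_to_csv_text(text, colspecs):
    """Slice each non-blank line at fixed offsets and rejoin fields with commas."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines, as read_fwf does by default
        rows.append(",".join(line[start:stop].strip() for start, stop in colspecs))
    return "\n".join(rows)


def read_fixed_width_via_csv(text, colspecs, names):
    """Parse fixed-width text through the C csv engine (hypothetical helper)."""
    csv_text = fwf_to_csv_text(text, colspecs)
    return pd.read_csv(io.StringIO(csv_text), names=names, header=None, engine="c")


if __name__ == "__main__":
    sample = "       1.5       2.5\n       3.0       4.0\n"
    specs = [(0, 10), (10, 20)]  # half-open [start, stop) character ranges
    print(read_fixed_width_via_csv(sample, specs, ["x", "y"]))
```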

It's worth mentioning that the fwf parser is way faster if it's one big dataframe. This INITIAL_STRAIN_SHELL keyword has thousands of small tables, and there appears to be some fixed overhead in pandas for parsing each one.
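To make the "many small tables versus one big table" point concrete, here is a rough sketch that parses the same data both ways with `pd.read_fwf`. The three-column, 10-character layout and all function names are assumptions for illustration, not the real INITIAL_STRAIN_SHELL card format. The batched variant only works when every small table shares the same column layout.

```python
import io
import time

import pandas as pd

# Hypothetical layout: three 10-character float columns per row.
COLSPECS = [(0, 10), (10, 20), (20, 30)]
NAMES = ["a", "b", "c"]


def make_blocks(n_tables, rows=4):
    """Build n_tables small fixed-width tables as text (synthetic data)."""
    blocks = []
    for t in range(n_tables):
        lines = ["".join(f"{t + r + c * 0.5:10.3f}" for c in range(3)) for r in range(rows)]
        blocks.append("\n".join(lines) + "\n")
    return blocks


def parse_each(blocks):
    """One read_fwf call per small table; any fixed cost is paid every time."""
    return [
        pd.read_fwf(io.StringIO(block), colspecs=COLSPECS, names=NAMES, header=None)
        for block in blocks
    ]


def parse_batched(blocks):
    """One read_fwf call for everything, then slice the frame back apart."""
    # Count only non-blank lines, because read_fwf skips blank lines by default.
    counts = [sum(1 for line in block.splitlines() if line.strip()) for block in blocks]
    merged = "\n".join(block.rstrip("\n") for block in blocks)
    frame = pd.read_fwf(io.StringIO(merged), colspecs=COLSPECS, names=NAMES, header=None)
    tables, start = [], 0
    for count in counts:
        tables.append(frame.iloc[start:start + count].reset_index(drop=True))
        start += count
    return tables


if __name__ == "__main__":
    blocks = make_blocks(2000)
    for parser in (parse_each, parse_batched):
        begin = time.perf_counter()
        parser(blocks)
        print(f"{parser.__name__}: {time.perf_counter() - begin:.3f} s")
```

One caveat with batching: column dtypes are inferred over the merged data, so a small table that would have been inferred as integers on its own may come back as floats.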

slow2.txt slow6.txt