ansys / pydyna

Python interface to the LS-DYNA solver
https://dyna.docs.pyansys.com
MIT License

Improve performance for reading INITIAL_STRESS_SHELL and INITIAL_STRAIN_SHELL #592

Open koubaa opened 1 year ago

koubaa commented 12 months ago

Here are two files (I use the .txt extension because .key is not supported by GitHub). slow2 contains just a subset of the content from slow6.

I profiled slow2: deck.import_file(path_to_file)

and got these results:

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
12516277/12496122  1.297  0.000  2.150  0.000  {built-in method builtins.isinstance}
241860  0.893  0.000  2.916  0.000  C:\AnsysDev\Apps\Python311\Lib\site-packages\pandas\core\construction.py:518(sanitize_array)
161240  0.779  0.000  0.875  0.000  C:\AnsysDev\Apps\Python311\Lib\warnings.py:181(_add_filter)
20155  0.564  0.000  4.878  0.000  C:\AnsysDev\Apps\Python311\Lib\site-packages\pandas\io\parsers\base_parser.py:513(_convert_to_ndarrays)
181395  0.516  0.000  2.086  0.000  C:\AnsysDev\Apps\Python311\Lib\site-packages\pandas\core\dtypes\common.py:1587(pandas_dtype)
141085  0.486  0.000  0.486  0.000  {pandas._libs.lib.maybe_convert_numeric}
60465  0.438  0.000  2.775  0.000  C:\AnsysDev\Apps\Python311\Lib\site-packages\pandas\core\indexes\base.py:477(__new__)
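For anyone who wants to reproduce a profile like this, here is a minimal sketch using the standard-library `cProfile` and `pstats` modules. The helper name `profile_call` is made up for illustration, and the commented-out usage assumes a `Deck` object with an `import_file` method as in the call above; the exact import path is an assumption.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, report text sorted by tottime).

    `profile_call` is a hypothetical helper, not part of pydyna.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Write the report into a string instead of stdout so it can be saved or diffed.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("tottime").print_stats(top)
    return result, stream.getvalue()


# Hypothetical usage (import path and file name are assumptions):
# from ansys.dyna.core import Deck
# deck = Deck()
# _, report = profile_call(deck.import_file, "slow2.txt")
# print(report)
```

Sorting by `tottime` puts the functions with the most self time at the top, which is the view that makes per-call overhead easy to spot.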

The slowness is because the fixed-width file (fwf) parser in pandas is implemented in Python, while the (much) faster csv parser has implementations in C and Arrow. I poked around the implementation but it is a bit over my head.

It's worth mentioning that the fwf parser is way faster if it's one big dataframe. This INITIAL_STRAIN_SHELL keyword has thousands of small tables, and there appears to be some fixed overhead in pandas for parsing each one.
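The per-table overhead can be checked in isolation with a rough sketch like the one below. It is not the pydyna code path: the field widths, column names, and synthetic rows are all made up for illustration, and it only compares calling `pandas.read_fwf` once per small table against calling it once on the concatenated text.

```python
import io
import time

import pandas as pd

WIDTHS = [10, 10, 10]    # assumed field widths, for illustration only
NAMES = ["a", "b", "c"]  # assumed column names


def make_table(n_rows):
    """Build a fixed-width text block: n_rows rows of three 10-character fields."""
    return "".join(f"{i:>10d}{i * 0.5:>10.3f}{i * 2:>10d}\n" for i in range(n_rows))


def parse_many_small(n_tables, rows_per_table):
    """Call read_fwf once per small table (the slow pattern described above)."""
    text = make_table(rows_per_table)
    return [
        pd.read_fwf(io.StringIO(text), widths=WIDTHS, names=NAMES, header=None)
        for _ in range(n_tables)
    ]


def parse_one_big(n_tables, rows_per_table):
    """Call read_fwf once on the same rows concatenated into a single table."""
    text = make_table(rows_per_table) * n_tables
    return pd.read_fwf(io.StringIO(text), widths=WIDTHS, names=NAMES, header=None)


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, t_small = timed(parse_many_small, 500, 4)
    _, t_big = timed(parse_one_big, 500, 4)
    print(f"500 small tables: {t_small:.3f} s, one big table: {t_big:.3f} s")
```

If the fixed overhead is what dominates, the first timing should come out much larger than the second even though both parse the same number of rows. The actual numbers will depend on the pandas version and machine.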

slow2.txt slow6.txt