cheminfo / sdf-parser

Parse an SDF file and convert it to an array of objects
http://cheminfo.github.io/sdf-parser/
MIT License

Parsing speed optimization for large files #3

Closed: ivan-zatravkin-quantori closed this 3 years ago

ivan-zatravkin-quantori commented 3 years ago

This PR replaces the regex with a solution based on indexOf. On our datasets, it cuts the time to parse a file from 13 seconds to 1 second.
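The PR's code is not reproduced on this page, so the following is only a minimal sketch of the indexOf idea, not the PR's implementation. The function name getEntriesBoundaries is taken from the later comment, but its signature, the [start, end) return shape, and the rules used here (a record ends at a line that starts with "$$$$", and the line break before that line is excluded) are assumptions.

```typescript
// Hypothetical sketch: locate each SDF record by scanning for "$$$$"
// terminator lines with indexOf, instead of splitting the whole file
// with a regular expression. Returns [start, end) character offsets.
function getEntriesBoundaries(sdf: string): Array<[number, number]> {
  const DELIMITER = "$$$$";
  const boundaries: Array<[number, number]> = [];
  let start = 0; // start offset of the current record
  let searchFrom = 0; // where the next indexOf call begins
  while (start < sdf.length) {
    const index = sdf.indexOf(DELIMITER, searchFrom);
    if (index === -1) {
      // No terminator left: keep any trailing non-blank text as a last record.
      if (sdf.slice(start).trim() !== "") boundaries.push([start, sdf.length]);
      break;
    }
    if (index > 0 && sdf[index - 1] !== "\n") {
      // "$$$$" in the middle of a line (e.g. a property value): not a terminator.
      searchFrom = index + 1;
      continue;
    }
    // Exclude the line break (\n or \r\n) that precedes the terminator line.
    let end = index;
    if (end > start && sdf[end - 1] === "\n") end--;
    if (end > start && sdf[end - 1] === "\r") end--;
    boundaries.push([start, end]);
    // Skip the rest of the "$$$$" line, including its own line break.
    const lineEnd = sdf.indexOf("\n", index);
    start = lineEnd === -1 ? sdf.length : lineEnd + 1;
    searchFrom = start;
  }
  return boundaries;
}

const demo = "mol1\nM  END\n$$$$\nmol2\nM  END\n$$$$\n";
// Boundaries are [0, 11] and [17, 28]; slicing gives "mol1\nM  END" and "mol2\nM  END".
console.log(getEntriesBoundaries(demo).map(([s, e]) => demo.slice(s, e)));
```

Returning offsets rather than substrings lets a caller decide when to slice, so the large input string is only copied record by record as each one is parsed.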

The PR includes tests verifying that it matches the old behavior.

ivan-zatravkin-quantori commented 3 years ago

Thanks for the review; I have updated my code accordingly.

I also ran a quick test today in Firefox: there, the regex is really fast, taking around 20 ms on our 50 MB files. Chrome, however, struggles with it and takes ~10 seconds. The new getEntriesBoundaries runs in ~20-30 ms in both Chrome and Firefox.
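A comparison like this can be repeated by timing both approaches on the same input in each engine. The harness below is a hypothetical sketch: the synthetic record, the record count, the helper names, and the regex /\r?\n\$\$\$\$.*\r?\n/ (a plausible record separator, not necessarily the one sdf-parser used) are all assumptions, and it makes no claim to reproduce the Chrome slowdown.

```typescript
// Hypothetical benchmark: split a synthetic SDF string with a regex and with
// an indexOf loop, print each timing, and check that both give the same parts.
function splitWithIndexOf(text: string, separator: string): string[] {
  // Mirrors String.prototype.split for a non-empty literal separator.
  const parts: string[] = [];
  let start = 0;
  let index = text.indexOf(separator, start);
  while (index !== -1) {
    parts.push(text.slice(start, index));
    start = index + separator.length;
    index = text.indexOf(separator, start);
  }
  parts.push(text.slice(start));
  return parts;
}

function makeSyntheticSdf(recordCount: number): string {
  // Minimal made-up record; real molfile blocks are much longer.
  return "mol\nM  END\n$$$$\n".repeat(recordCount);
}

function time<T>(label: string, fn: () => T): T {
  const t0 = Date.now();
  const result = fn();
  console.log(`${label}: ${Date.now() - t0} ms`);
  return result;
}

const text = makeSyntheticSdf(200000);
const viaRegex = time("regex split", () => text.split(/\r?\n\$\$\$\$.*\r?\n/));
const viaIndexOf = time("indexOf split", () => splitWithIndexOf(text, "\n$$$$\n"));
console.log("same result:", JSON.stringify(viaRegex) === JSON.stringify(viaIndexOf));
```

The indexOf version only handles LF line endings and bare "$$$$" lines, which is all this synthetic input contains; the equality check guards against the two splits silently diverging while comparing timings.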

I have no idea why Chrome is so slow with it, and I was unable to find any open issues in the Chromium project. Maybe I will file an issue there separately.