IUPAC-InChI / InChI

Main InChI repository
https://iupac-inchi.github.io/InChI-Web-Demo/
MIT License
69 stars 9 forks source link

Faster Element Symbol Handling #31

Closed johnmay closed 4 months ago

johnmay commented 4 months ago

Testing on ChEMBL I see 2.4 million identical InChI's and runs ~9 seconds faster (M1 Pro). Overall I mainly was wincing and some of the lookups inside nested ternaries (e.g. ichitaut.c)

main:
INCHI_EXE/bin/Linux/inchi-1 /data/chembl_34.sdf -AuxNone -NoWarnings -NoLabel
264.73s user 24.96s system 99% cpu 4:51.59 total
PR:
INCHI_EXE/bin/Linux/inchi-1 /data/chembl_34.sdf -AuxNone -NoWarnings -NoLabel
257.91s user 23.88s system 99% cpu 4:42.53 total

I may have left some unused variables when I removed the lazy if (!en){ } initialisations but hopefully got them all.

johnmay commented 4 months ago

I've made some more changes to where the element arrays get setup in various places (mainly strutil.c ion_pair handling). I also found some bugs from when the thread safety was added, nothing major but some had an out of bounds checks which wouldn't have been obvious to the static analysis.

I'll add these separately as this patch is nice and easy to understand.

djb-rwth commented 4 months ago

Hi @johnmay, Thank you for these useful suggestions. These changes make sense as using constants instead of calling a function can indeed make a notable increase in performance, especially whilst processing large datasets.

The only suggestion from my side is to add these changes to the rwth branch instead of main directly as I need to add a file which should fix a rather serious bug; afterwards everything shall be merged into the main branch.