TUCAN-nest / TUCAN

A molecular identifier and descriptor for all domains of chemistry.
https://tucan-nest.github.io
GNU General Public License v3.0

Evaluate if parts of IUPAC InChI canonicalization can be re-used #11

Closed: schatzsc closed this issue 3 years ago

schatzsc commented 3 years ago

Not sure where to leave this, so I opened another issue:

Here is the InChI code from the GitHub repository of Richard Apodaca (Metamolecular Inc., La Jolla, https://metamolecular.com/):

https://github.com/metamolecular/inchi

The source code is also available outside GitHub here:

https://www.inchi-trust.org/downloads

Not sure where to even start, but this file alone shows the whole nonsense with the standard valences:

https://github.com/metamolecular/inchi/blob/master/INCHI_BASE/src/util.c

Just look at this file: it is one of 50+ source files, and by itself it is already 4-5x as long as your entire current code.
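For contrast, the basic idea behind standard valences fits in a few lines of C. This is a minimal sketch, not a port of util.c: the table `STD_VALENCES` and the function `implicit_hydrogens` are hypothetical, cover only a handful of uncharged main-group elements, and apply the common "smallest standard valence that fits" rule for filling in implicit hydrogens. Charges, radicals, metals and other special cases are ignored, and those are presumably where much of the length of util.c comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, deliberately tiny table of standard valences for a few
   uncharged main-group elements. Each list is ascending and 0-terminated. */
typedef struct {
    const char *element;
    int valences[4];
} StdValence;

static const StdValence STD_VALENCES[] = {
    {"C", {4, 0, 0, 0}},
    {"N", {3, 5, 0, 0}},
    {"O", {2, 0, 0, 0}},
    {"P", {3, 5, 0, 0}},
    {"S", {2, 4, 6, 0}},
};

/* Number of implicit hydrogens for an uncharged, non-radical atom whose
   explicit bond orders sum to bond_order_sum: pick the smallest standard
   valence that is >= bond_order_sum and subtract. Returns -1 if the element
   is not in the table or if no listed valence is large enough. */
int implicit_hydrogens(const char *element, int bond_order_sum)
{
    assert(element != NULL && bond_order_sum >= 0);
    size_t count = sizeof STD_VALENCES / sizeof STD_VALENCES[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(STD_VALENCES[i].element, element) != 0)
            continue;
        for (int k = 0; STD_VALENCES[i].valences[k] != 0; k++) {
            if (STD_VALENCES[i].valences[k] >= bond_order_sum)
                return STD_VALENCES[i].valences[k] - bond_order_sum;
        }
        return -1; /* exceeds every listed valence */
    }
    return -1; /* element not in the table */
}
```

With this table, a carbon with one single bond gets 3 implicit hydrogens, and a nitrogen whose bond orders sum to 4 is moved up to valence 5 and gets 1; whether such a result is chemically sensible or should be flagged as an error is left open here.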

schatzsc commented 3 years ago

And look at what is needed just to call what I assume is the serialization routine in ichimain.h:

int SortAndPrintINChI(
    struct tagCANON_GLOBALS *pCG,
    INCHI_IOSTREAM *out_file,
    INCHI_IOS_STRING *strbuf,
    INCHI_IOSTREAM *log_file,
    INPUT_PARMS *ip,
    ORIG_ATOM_DATA *orig_inp_data,
    ORIG_ATOM_DATA *prep_inp_data,
    COMP_ATOM_DATA composite_norm_data[INCHI_NUM][TAUT_NUM + 1],
    ORIG_STRUCT *pOrigStruct,
    int num_components[INCHI_NUM],
    int num_non_taut[INCHI_NUM],
    int num_taut[INCHI_NUM],
    INCHI_MODE bTautFlags[INCHI_NUM],
    INCHI_MODE bTautFlagsDone[INCHI_NUM],
    NORM_CANON_FLAGS *pncFlags,
    long num_inp,
    PINChI2 *pINChI[INCHI_NUM],
    PINChI_Aux2 *pINChI_Aux[INCHI_NUM],
    int *pSortPrintINChIFlags,
    unsigned char save_opt_bits );

schatzsc commented 3 years ago

Judging from the filename, I guess this is the canonicalization routine file:

https://github.com/metamolecular/inchi/blob/master/INCHI_BASE/src/ichicano.c

2729 lines of code (OK, to be fair, that includes comments and empty lines)

Plus another 6000+ lines in

https://github.com/metamolecular/inchi/blob/master/INCHI_BASE/src/ichican2.c
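To put those line counts in perspective, the core idea of canonicalization can be written down briefly if efficiency is ignored. The sketch below is neither InChI's algorithm nor TUCAN's: it is a brute-force canonical form that tries every atom ordering of a tiny hydrogen-suppressed graph and keeps the lexicographically smallest serialization. The `Mol` struct, `canonical_string` and the string format are hypothetical. Because it visits n! orderings it is only usable for a handful of atoms; avoiding that blow-up, and handling stereo, tautomers and charges, is presumably much of what ichicano.c and ichican2.c spend their lines on.

```c
#include <assert.h>
#include <string.h>

#define MAX_ATOMS 8   /* brute force visits n! orderings: keep n tiny */
#define BUF_LEN 128   /* enough for 8 atoms with symbols of <= 3 chars */

/* Hypothetical hydrogen-suppressed molecule: element symbols (no ',' or
   '|') plus a symmetric matrix of bond orders (0 = no bond, 1-9 otherwise). */
typedef struct {
    int n;
    const char *element[MAX_ATOMS];
    int bond[MAX_ATOMS][MAX_ATOMS];
} Mol;

/* Serialize under an ordering, where perm[i] is the original index of the
   atom placed at position i: "El,El,...,|" followed by the upper triangle
   of the reordered bond matrix, one digit per atom pair. */
static void serialize(const Mol *m, const int *perm, char *out)
{
    size_t len = 0;
    for (int i = 0; i < m->n; i++) {
        size_t k = strlen(m->element[perm[i]]);
        memcpy(out + len, m->element[perm[i]], k);
        len += k;
        out[len++] = ',';
    }
    out[len++] = '|';
    for (int i = 0; i < m->n; i++)
        for (int j = i + 1; j < m->n; j++)
            out[len++] = (char)('0' + m->bond[perm[i]][perm[j]]);
    out[len] = '\0';
}

/* Depth-first enumeration of all orderings; keeps the smallest string. */
static void search(const Mol *m, int *perm, int *used, int depth,
                   char *best, char *buf)
{
    if (depth == m->n) {
        serialize(m, perm, buf);
        if (best[0] == '\0' || strcmp(buf, best) < 0)
            strcpy(best, buf);
        return;
    }
    for (int a = 0; a < m->n; a++) {
        if (used[a])
            continue;
        used[a] = 1;
        perm[depth] = a;
        search(m, perm, used, depth + 1, best, buf);
        used[a] = 0;
    }
}

/* Canonical string: the lexicographically smallest serialization over all
   atom orderings. Isomorphic inputs give identical strings, and since the
   serialization can be decoded back into a labeled graph, non-isomorphic
   inputs give different ones. best must hold at least BUF_LEN chars. */
void canonical_string(const Mol *m, char *best)
{
    int perm[MAX_ATOMS];
    int used[MAX_ATOMS] = {0};
    char buf[BUF_LEN];
    assert(m->n >= 0 && m->n <= MAX_ATOMS);
    best[0] = '\0';
    search(m, perm, used, 0, best, buf);
}
```

For ethanol entered as C-C-O and as O-C-C, both inputs give `C,C,O,|101`, while dimethyl ether (C-O-C) gives `C,C,O,|011`.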

schatzsc commented 3 years ago

I suggest closing this issue, since we are much too far from the InChI code and methodology. Do you agree?