csv input case "name_col" error

Daehun-Bae commented 3 months ago

https://github.com/Shualdon/QupKake/blob/f7a294e91929e4e4f0a7100ad5cf5a64a34a7d4f/qupkake/cli.py#L103C1-L104C1

When I use qupkake without "name_col", all SMILES get the same name, like this: "molecule_RangeIndex(start=1, stop=281, step=1)". I think it should be fixed.
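A minimal sketch of how a name like this can arise, assuming the default name is built by formatting the whole pandas index instead of each row label. This is not QupKake's actual code; the `smiles` column and the three example molecules are hypothetical.

```python
import pandas as pd

# Hypothetical three-row input standing in for a CSV without a name column.
df = pd.DataFrame({"smiles": ["CCO", "CC(=O)O", "c1ccccc1O"]})
df.index = pd.RangeIndex(start=1, stop=len(df) + 1)

# Formatting the index object as a whole gives one string shared by all rows.
shared_name = f"molecule_{df.index}"

# Formatting each index label separately gives one distinct name per row.
per_row_names = [f"molecule_{i}" for i in df.index]

print(shared_name)
print(per_row_names)
```

The first print shows the `RangeIndex(...)` text embedded in the name, as in the report above. The second shows one name per molecule, which is probably the intended default.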

Also, I tried qupkake on the Novartis benchmark, and it takes too long. When I run "qupkake file data/novartis_qupkake_pka.sdf -t -m 8", it takes 1 hour. My server has the same spec as in your paper: "3.85 GHz AMD EPYC 9374F CPU with 32 cores".

Thank you.

ghutchis commented 3 months ago

> I think it should be fixed.

Please feel free to submit a pull request.

> "qupkake file data/novartis_qupkake_pka.sdf -t -m 8" codes, It takes an 1 hour....

I guess it depends on your view of "too long". We're pretty clear in the paper that the time-limiting step is mostly running xtb / crest calculations.

Daehun-Bae commented 3 months ago

Thank you for your reply and interesting research.

I'll submit a pull request as soon as possible.

I think it is still fast. However, Figure 4 gives the average compute time per molecule across the 280 molecules in the Novartis test set with 8 CPU cores as 0.67 s per molecule, whereas my run takes 12.8 s per molecule.
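The per-molecule figure can be checked from the numbers in this thread. A rough sketch, assuming the 1 hour wall time is exact (it is a rounded value) and that it covers all 280 molecules:

```python
# Figures quoted in this thread; the 1 hour wall time is a rounded value.
n_molecules = 280
wall_time_s = 1 * 60 * 60
paper_s_per_molecule = 0.67

# Average wall time per molecule and its ratio to the reported figure.
observed_s_per_molecule = wall_time_s / n_molecules
slowdown = observed_s_per_molecule / paper_s_per_molecule

print(f"{observed_s_per_molecule:.1f} s per molecule")
print(f"{slowdown:.1f}x the reported time")
```

This gives a value close to the 12.8 s quoted above, roughly 19 times the 0.67 s per molecule reported in Figure 4.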

csv input case "name_col" error #8