CalebBell / chemicals

chemicals: Chemical database of Chemical Engineering Design Library (ChEDL)
MIT License
186 stars 36 forks source link

identifiers.py #18

Closed yoelcortes closed 4 years ago

yoelcortes commented 4 years ago

Thanks for working on this and asking for my comments. Here are my comments at first glace:

Here is one additional small change:

# Original
def search_pubchem(self, pubchem, autoload=True):
    if type(pubchem) != int:
        pubchem = int(pubchem)
    return self._search_autoload(pubchem, self.pubchem_index, autoload=autoload)

# This is slightly faster and concise
def search_pubchem(self, pubchem, autoload=True):
    return self._search_autoload(int(pubchem), self.pubchem_index, autoload=autoload)

I hope these comments are helpful and I didn't misinterpret any code, Thanks!

CalebBell commented 4 years ago

Hi Yoel,

Thank you for the comprehensive feedback. I have incorporated it into the file, hopefully making it quite a bit better. The changes are:

1) The design of the database is now - load all the small databases; load the elements. If a search fails, load the large database; load the small databases; load the elements. Search again. This makes sense to me because the small databases are, well, small. 2) We seem to have different experiences with name conflicts. For me, my intention has always been for the small databases to supersede the values in the big database I can't really edit. If you have a conflict, please let me know. 3) I renamed autoload_next to finished_loading. 4) The element search is necessary to handle searches like "nitrogen" and 1 correctly. I cleaned it up and added some comments. 5) I implemented a fast cache size limit and did the wrapper function like you suggested - a clear improvement.

Thank you for the feedback and I hope you can find the issue you had with ions so we can fix it.

Sincerely, Caleb