bjodah / chempy

⚗ A package useful for chemistry written in Python
BSD 2-Clause "Simplified" License

Parse edits #179

Open spizwhiz opened 4 years ago

spizwhiz commented 4 years ago

Implemented the changes to parsing.py discussed in #176. Changed the hydrate notation from "." to ":".

Changed the HTML, LaTeX, and Unicode parsing to match.

spizwhiz commented 4 years ago

Well, I think I have the parsing.py and associated test issues sorted. I got stuck on the balancing and reaction modules, and think it would be best if someone more familiar with those could help out. For now, I have the code doing what I need for my purposes but am happy to help finish this up where I can.

bjodah commented 4 years ago

Thank you for working on this. This looks good, unfortunately with one exception: radicals. I had forgotten that I use "." to indicate radicals. A colon there is confusing and would probably be interpreted as a diradical. The only way I see around this issue is to parse "." in a context-sensitive manner, i.e. when it sits between two numbers it is a decimal point, and when it is leading (and in front of a letter?) it is a radical. Perhaps someone interested in seeing this get into master will take over the work here.
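The context-sensitive rule described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not chempy's parser: the helper name `classify_dots` and the label strings are hypothetical, and the real implementation would presumably live in the pyparsing grammar in parsing.py.

```python
def classify_dots(formula):
    """Label each "." in a formula string by its context.

    Hypothetical helper, not part of chempy. Returns a list of
    (index, kind) tuples, one per "." found in the string.
    """
    kinds = []
    for i, ch in enumerate(formula):
        if ch != ".":
            continue
        prev_ch = formula[i - 1] if i > 0 else ""
        next_ch = formula[i + 1] if i + 1 < len(formula) else ""
        if prev_ch.isdigit() and next_ch.isdigit():
            # A dot between two digits is read as a decimal point.
            kinds.append((i, "decimal"))
        elif i == 0 and next_ch.isalpha():
            # A leading dot directly in front of a letter is read as a radical.
            kinds.append((i, "radical"))
        else:
            # Anything else is left for the caller to handle or reject.
            kinds.append((i, "unclassified"))
    return kinds


print(classify_dots(".NO2"))             # leading dot before a letter
print(classify_dots("Ca2.832Fe0.6285"))  # dots between digits
```

Under this rule the dot in "Na2CO3.7H2O" also sits between two digits and would be read as a decimal point, so the sketch only works if hydrates use a different separator, such as the ":" introduced in this pull request.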

jeremyagray commented 4 years ago

I intend to work on this more, handling it generally as you suggest. I have the grammar worked out, at least on paper. I’m glad you mentioned the bit about radicals, as I had not considered it yet and will include it.

I had stopped at this point to think about the data structure of the parsed output. Right now, chempy.util.parsing._get_formula_parser().parseString() returns a list of element and count pairs. However, ions_from_formula() in _aqueous.py indicates that parsing ions out of the formula is a goal, and as currently implemented that would require a second parser that produces ions instead of elements. My thinking is to tie all of this together in parsing.py with the parser and helper functions, and to use a dict or class to hold the original string, the composition, the state, the charge, etc. The same structure could record whether the compound is ionic and store the lists of ions, complexes, etc. This would make some things easy, such as naming ionic compounds or removing spectator ions from a reaction, and hopefully would not make anything difficult. Thoughts?
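As a rough sketch of the kind of container proposed above, a dataclass could hold the fields mentioned. Everything here is an assumption for illustration: the class name `ParsedFormula`, its field names, the use of element symbols as composition keys, and the rule that a compound counts as ionic when constituent ions were found. The objects are built by hand; no parsing is done.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ParsedFormula:
    """Hypothetical container for what a formula parser extracts."""

    original: str                    # the formula string as given
    composition: Dict[str, float]    # element symbol -> count
    charge: int = 0                  # net charge of the species
    state: Optional[str] = None      # e.g. "aq" or "s"; None if not given
    ions: List["ParsedFormula"] = field(default_factory=list)

    @property
    def is_ionic(self) -> bool:
        # Assumption: ionic means the parser found constituent ions.
        return bool(self.ions)


# Hand-built example; a real parser would fill these fields in.
na = ParsedFormula("Na+", {"Na": 1}, charge=1)
cl = ParsedFormula("Cl-", {"Cl": 1}, charge=-1)
nacl = ParsedFormula("NaCl(aq)", {"Na": 1, "Cl": 1}, state="aq", ions=[na, cl])

print(nacl.is_ionic, [ion.original for ion in nacl.ions])
```

With the ions stored alongside the composition, tasks such as naming an ionic compound or dropping spectator ions from a reaction could work from one object instead of re-parsing the string.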

