Focus on molecular systems

mjclarke94 commented 3 years ago

Atomic and charged systems do not have the molecule-ID property, but the majority of the tools this code rely on this being present. A work around to allow atomic and charge type data to be parsed would be useful.

Possibly a flag which just declares all atoms to be part of one large molecule if the property is not present?

mike5603 commented 3 years ago

Can you give an example of a property you would like to calculate so I can look into it?

mjclarke94 commented 3 years ago

Ideally conductivities (NEK and GBK), with ion pair lifetime and rdf both also being useful.

From a look over the code it seems that it is the COM calculations where the property matters and in most other cases it is then just fed forward to other calcs. I bodged it by using the atom-id as the mol flag, but I'm sure this causes far more problems than it solves!

mike5603 commented 3 years ago

One of the major reasons that this code is organized that way is to save on memory. Python can use a lot of memory so saving the velocities of all atoms over all timesteps would probably be too much memory for most computers/servers. By calculating the center of mass and saving that instead, it saves a ton of memory. Also, since a lot of the calculations are O(n^2), it can save a lot on run time as well

Honestly, I see no problem in using atom-id as the mol flag for atomistic and charged systems, as long as the system has a reasonable number of atoms. How big are your systems?

mjclarke94 commented 3 years ago

That's interesting, I was wondering why it was being calculated. Thanks for the explanation!

Sure, I was just wondering whether that approach could be implemented as a --id-as-mol flag rather than needing to modify trajectory files? I've done something similar but I'm sure it would break things elsewhere! Looking at the COM code, I think it would currently result in the centre of mass of each individual atom being calculated (which would just be...its position), and then afterwards the centre of all of those would be calculated. Obviously that is something which for large systems would be a lot of wasted additional calcs.

I have around 10-20k atoms in a typical system with around 1000 frames. (dt = 1fs, dump every 1000, trange = 10 ns)

mike5603 commented 3 years ago

Here is a quick workaround for this I think will work, but I will leave this issue open for if someone wants to create a better solution.

Within src, there are four files calcCond.py, calcCOM.py, calcDielectricConstant.py and distSearch.py. Within each of these files, there is a getcolumns function with the following line

molcol = inline.index('mol')

replace the 'mol' with something like 'id' or whichever column you would like to use as the molecule descriptor.

This is obviously a workaround and you are correct that you will still be doing unnecessary calculations, but I think most of the time in calcCOM is spent on reading the files and it is an O(n) operation, so I believe that these calculations shouldn't have much affect on the runtime of the calculation.

mjclarke94 commented 3 years ago

Sounds good. Thanks for the help!

MaginnGroup / PyLAT

Focus on molecular systems #10