Closed RainierBarrett closed 3 years ago
Couple of things:
Per the discussion today lets try using an analytical approach.
Can you move all of the PDI code into its own funciton that we can call within the init
function? I think this will keep things nice and clean, and also, I imagine there are other places (e.g. planckton) we can use this PDI sampling code, so keeping it portable would be useful.
So far it looks like its working well. I did get some errors when passing a value for pdi
and M_n
while leaving M_w = None
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-16-4563c3631d4d> in <module>
1 test_system = System(molecule="PEEK", para_weight=0.60,
2 density=1, n_compounds=20, sample_pdi=True, M_n=10., pdi=2,
----> 3 forcefield=None)
<ipython-input-15-e52777e0d53c> in __init__(self, molecule, para_weight, density, n_compounds, polymer_lengths, forcefield, remove_hydrogens, epsilon, sample_pdi, pdi, M_n, M_w)
215 self.pdi = M_w / M_n
216 mass_mean = M_n
--> 217 mass_var = M_n * (M_w - M_n) # from solving integrated gaussians
218 # now sample from our distribution
219 print(f'sampling {n_compounds} molecular weights...')
TypeError: unsupported operand type(s) for -: 'NoneType' and 'float'
I think it was due to mass_var
calling for M_w
when it was equal to None
. When M_w
was solved for, the value was assigned to self.M_w
, so M_w
as passed was still equal to None
I was able to fix it by making sure self.M_w
, self.M_n
and self.pdi
are always defined and updating everything afterwards to use the self.
attributes.
I think it will be helpful to make these values as System class attributes. When using signac in the uli-flow repo, we can dump a bunch of the system information for each job into that job's .json file. We'll include the values of M_n, M_w and pdi in that info dump.
Here is what I changed:
if sample_pdi:
if isinstance(n_compounds, list):
raise TypeError('n_compounds should be an integer when sample_pdi is True.')
pdi_arg_sum = sum([x is not None for x in [pdi, M_n, M_w]])
assert pdi_arg_sum >= 2, 'At least two of [pdi, M_n, M_w] must be given.'
if pdi_arg_sum == 3:
#special case, make sure that pdi = M_w / M_n
assert pdi - (M_w/M_n) < self.epsilon, 'PDI value does not match M_n and M_w values.'
else:
# need to recover one of M_w or M_n or pdi
if M_n is None:
self.M_n = M_w / pdi
self.M_w = M_w
self.pdi = pdi
if M_w is None:
self.M_w = pdi * M_n
self.M_n = M_n
self.pdi = pdi
if pdi is None:
self.pdi = M_w / M_n
self.M_w = M_w
self.M_n = M_n
mass_mean = self.M_n
mass_var = self.M_n * (self.M_w - self.M_n) # from solving integrated gaussians
# now sample from our distribution
print(f'sampling {n_compounds} molecular weights...')
# TODO: make sure we don't sample any negative weights
samples = np.round(np.random.normal(loc=mass_mean,
scale=np.sqrt(mass_var),
size=n_compounds)
).astype(int)
# get all unique lengths in increasing order
self.polymer_lengths = sorted(list(set(samples)))
# get count of each length
self.n_compounds = [list(samples).count(x) for x in self.polymer_lengths]
print(f'polymer_lengths: {self.polymer_lengths}, n_compounds: {self.n_compounds}')
That seems to work with specifying any combination of M_w, M_n, and pdi.
Other than that, everything looks good except for getting negative polymer lengths. We'll wait to merge in these changes since negative polymer lengths does break the system initialization.
Couple of things:
self.M_n
, but its never actually being set as a class attribute when its passed in by the user. I offered a fix/suggestion in that comment.<ipython-input-12-774b9e97a45b> in _recover_mass_dist(self, distribution)
272 elif distribution.lower() == 'weibull':
273 # get the shape parameter
--> 274 a = scipy.optimize.root(self._weibull_k_expression, args=(self.Mn, self.Mw), x0=1.)
275 recovered_k = a['x']
276 # get the scale parameter
AttributeError: 'System' object has no attribute 'Mn'
sample_pdi = True
and n_compounds
is a list, lets just pull out the value from the list. The important thing is that we only have a single number for n_compounds when sample_pdi = True.Before:
if isinstance(n_compounds, list):
raise TypeError('n_compounds should be an integer when sample_pdi is True.')
Instead, lets do something like this:
if isinstance(n_compounds int):
self.n_compounds = n_compounds
elif isinstance(n_compounds, list) and len(n_compounds) == 1:
self.n_compounds = n_compounds[0]
elif isinstance(n_compounds, list) and len(n_compounds) != 1:
raise TypeError('n_compounds should be of length 1 when sample_pdi is True.')
simulate.py
which is what will ultimately be used in this package.Sorry, guess I mistakenly grabbed the .svg files. I guess in practice we don't need to have the .csv files committed and available, but then it might make more sense to just move those cells to a scratchwork book or something, so anyone can run the main notebook without issue. What do you think? In the meantime I'll get these other points sorted.
Sorry, guess I mistakenly grabbed the .svg files. I guess in practice we don't need to have the .csv files committed and available, but then it might make more sense to just move those cells to a scratchwork book or something, so anyone can run the main notebook without issue. What do you think? In the meantime I'll get these other points sorted.
Yeah, I think moving a lot of the pdi sampling work that generating the plots and csv files to a separate notebook would be a good idea. Having all of that will be helpful if we decide to go into detail about it when writing a paper or if we have to present about it. Also, I wonder if this PDI sampling code could eventually be implemented into mBuild...maybe as a plugin.
Okay, those commits should address your points of concern. However, before I merge this in, now that I'm running this I'm getting an error from Foyer saying there's multiple atom types being given. Sampling of the molar masses seems to have worked, but the next step is dying. Can you replicate this or is something in my install going haywire?
polymer_lengths: [19, 30, 31, 45, 62, 63, 64, 65, 68, 82], n_compounds: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
---------------------------------------------------------------------------
FoyerError Traceback (most recent call last)
<ipython-input-65-7ad801da21e3> in <module>
1 test_system = System(molecule="PEEK", para_weight=0.60,
2 density=1, n_compounds=10, sample_pdi=True, M_n=50., M_w=55.,
----> 3 forcefield='gaff')
<ipython-input-64-5b9435b53feb> in __init__(self, molecule, para_weight, density, n_compounds, polymer_lengths, forcefield, remove_hydrogens, epsilon, sample_pdi, pdi, M_n, M_w)
237 self.system_mb = self._pack() # mBuild object before applying FF
238 if self.forcefield:
--> 239 self.system_pmd = self._type_system() # parmed object after applying FF
240
241
<ipython-input-64-5b9435b53feb> in _type_system(self)
273 forcefield = foyer.Forcefield(name='oplsaa')
274
--> 275 typed_system = forcefield.apply(self.system_mb)
276 if self.remove_hydrogens: # not sure how to do this with Parmed yet
277 removed_hydrogen_count = 0 # subtract from self.mass
~/foyer/foyer/forcefield.py in apply(self, structure, references_file, use_residue_map, assert_bond_params, assert_angle_params, assert_dihedral_params, assert_improper_params, combining_rule, verbose, *args, **kwargs)
571 structure = structure.to_parmed(**kwargs)
572
--> 573 typemap = self.run_atomtyping(structure, use_residue_map=use_residue_map, **kwargs)
574
575 self._apply_typemap(structure, typemap)
~/foyer/foyer/forcefield.py in run_atomtyping(self, structure, use_residue_map, **kwargs)
609 if structure.residues[res_id].name not in residue_map.keys():
610 tmp_res = _structure_from_residue(res, structure)
--> 611 typemap = find_atomtypes(tmp_res, forcefield=self)
612 residue_map[res.name] = typemap
613
~/foyer/foyer/atomtyper.py in find_atomtypes(structure, forcefield, max_iter)
63
64 _iterate_rules(rules, structure, typemap, max_iter=max_iter)
---> 65 _resolve_atomtypes(structure, typemap)
66
67 return typemap
~/foyer/foyer/atomtyper.py in _resolve_atomtypes(structure, typemap)
131 elif len(atomtype) > 1:
132 raise FoyerError("Found multiple types for atom {} ({}): {}.".format(
--> 133 atom_id, atoms[atom_id].atomic_number, atomtype))
134 else:
135 raise FoyerError("Found no types for atom {} ({}).".format(
FoyerError: Found multiple types for atom 3631 (6): ['cv', 'ce'].
There are still a few bugs. I think I mentioned them in an earlier comment, but we need to make sure the class attributes for Mn, Mw, and pdi are always being set. This is because the code as written calls the class attributes in the _recover_mass_dist
function.
For example:
a = scipy.optimize.root(self._weibull_k_expression, args=(self.Mn, self.Mw), x0=1.)
This calls for self.Mn
; however, self.Mn
is only being defined when its being solved for:
if pdi_arg_sum == 3:
#special case, make sure that pdi = Mw / Mn
assert pdi - (Mw/Mn) < self.epsilon, 'PDI value does not match Mn and Mw values.'
else:
# need to recover one of Mw or Mn or pdi
if Mn is None:
self.Mn = Mw / pdi
if Mw is None:
self.Mw = pdi * Mn
if pdi is None:
self.pdi = Mw / Mn # from here
So, if I pass in values for Mn
and pdi
then the script solves for and defines self.Mw
, but never defines the variables self.Mn
and self.pdi
Similarly, if I were to pass in values for all 3 of Mw
, Mn
and pdi
, then none of the self.Mn
self.Mw
and self.pdi
are defined.
Here is what needs to be added (you'll notice my changes under each of the If statements):
if pdi_arg_sum == 3:
#special case, make sure that pdi = Mw / Mn
assert pdi - (Mw/Mn) < self.epsilon, 'PDI value does not match Mn and Mw values.'
self.pdi = pdi
self.Mn = Mn
self.Mw = Mw
else:
# need to recover one of Mw or Mn or pdi
if Mn is None:
self.Mn = Mw / pdi
self.Mw = Mw
self.pdi = pdi
if Mw is None:
self.Mw = pdi * Mn
self.Mn = Mn
self.pdi = pdi
if pdi is None:
self.pdi = Mw / Mn # from here
self.Mn = Mn
self.Mw = Mw
Also, once this is working we'll have to add these changes to simulate.py
since that is what is actually being used. The notebook is kind of a scratch pad and a temporary way of testing any changes until we get proper unit tests, but simulate.py
is what is being used when submitting jobs on fry.
I'm going to close this, see PR #25.
There were just a few conflicts because of recent PRs made..for the sake of speeding this up I created my own branch, copied over the simulate.py
file from this PR and fixed the conflicts. Sorry for being a bit janky but just wanted to move things along...also, we need some unit tests as soon as possible!
This PR adds support for sampling from a molecular weight distribution given any two of [PDI, M_n, M_w], with some input validation. We analytically solved for the value of sigma in a gaussian distribution centered at M_n by integrating the continuous form of M_w. Right now, this does not exactly match the numerical examples used. Not sure why that is, but it is very close.