MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org

writing massive VMD selection macros #1849

Open tylerjereddy opened 6 years ago

tylerjereddy commented 6 years ago

The VMD selection macro writing docs are pretty straightforward, but I have noticed that when I use massive selections, they do not appear to be handled correctly.

If I dump the output to the console, I can see the last index value that VMD registers in the Representations window when the huge string / macro is loaded, and it is nowhere near the last index value in the selection macro (this is with the GUI singlewords selection, as the docs suggest).

I tried to parse the point at which loading in these massive selection strings fails with this:

input_files = ['outer_leaflet_frame_1400_sel_replicate_1.vmd', 'inner_leaflet_frame_1400_sel_replicate_1.vmd']
# last index value that VMD registered in the Representations window, per file
max_obs_vals = ['16533', '944625']

for input_file, max_val in zip(input_files, max_obs_vals):
    total_chars = 0
    with open(input_file) as infile:
        for line in infile:
            if max_val not in line:
                total_chars += len(line)
            else:
                # count only the characters that precede the last registered index
                total_chars += len(line.split(max_val)[0])
                break
    print("total_chars:", total_chars)

And I get:

total_chars: 8415
total_chars: 8373

Even allowing for my script not being quite perfect at counting the relevant characters, the proximity of those two values may not be a coincidence.

This would likely require more debugging. If there's no obvious fix for the load-in string length cap, I suspect we'd just end up raising a Warning of some sort (since it is perhaps more of a VMD issue), and this may be isolated to the GUI window load-in situation only. Or maybe even just add a documentation note about the possible limitation.
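
As a rough illustration of the Warning idea, here is a minimal sketch in plain Python. The function name `macro_fits_assumed_limit` and the 8000-character threshold are hypothetical: the threshold is only a guess loosely based on the ~8400-character counts above, not a documented VMD limit, and nothing here is existing MDAnalysis API.

```python
import warnings

# ASSUMPTION: hypothetical threshold, loosely based on the ~8400 characters
# observed above. It is not a documented VMD limit.
ASSUMED_MAX_MACRO_CHARS = 8000


def macro_fits_assumed_limit(selection_string, limit=ASSUMED_MAX_MACRO_CHARS):
    """Return True if the selection string is within the assumed limit.

    Emits a UserWarning and returns False when the string is longer.
    """
    if len(selection_string) > limit:
        warnings.warn(
            f"selection string has {len(selection_string)} characters "
            f"(assumed limit: {limit}); the VMD Representations window "
            "may truncate it",
            UserWarning,
            stacklevel=2,
        )
        return False
    return True
```

A writer could call such a check on each macro string before writing it out; whether a warning or a documentation note is the better fit would depend on what the actual cap turns out to be.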

I also note that:

set sel [atomselect top mdanalysis001]
$sel num

gives the correct value -- so the issue appears to primarily be related to the Representations window handling of the massive string loaded in the manner instructed by the MDA docs.

Here are some qualitative comparisons of the same atomgroup loaded via a MDA-written .gro file vs. MDA-written vmd selection macro on a large vesicle:

selection macro:

[image: outer_leaflet_macro]

written coord file:

[image: outer_leaflet]

orbeckst commented 6 years ago

Can you inquire on the VMD list if there are known limitations to the selection macros?

It would be nice if we could compact the selection macros, e.g., transform "index 1 2 3 4 5 6 7 8 9 10" to "index 1 to 10" or, if you write from a ResidueGroup, do a "resid 1 2 3" selection instead of the AtomGroup "index 1 2 ...".

orbeckst commented 5 years ago

There should be an algorithm to "compact selections", especially for hierarchical systems that can be represented as a tree: If you have selected all leaves (atoms) on a branch, then you can just select the branch (residue). Similarly, if you selected all residues then you could just select the segment. If our topology is represented as a tree graph, then this might be pretty straightforward.
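
The atom-to-residue step of that idea could be sketched roughly as follows, in plain Python without any MDAnalysis calls. The function names and the `residue_atoms` mapping (resid to atom indices) are hypothetical stand-ins for the topology, and the sketch assumes resids are unique in the system; if they repeat across segments, a "resid" selection would match too many atoms.

```python
def compact_by_residue(selected_atoms, residue_atoms):
    """Split a selection into fully selected residues and leftover atoms.

    selected_atoms: iterable of atom indices in the selection
    residue_atoms:  dict mapping resid -> iterable of that residue's atom
                    indices (hypothetical stand-in for the topology tree)
    """
    selected = set(selected_atoms)
    full_resids = []
    covered = set()
    for resid, atoms in residue_atoms.items():
        atoms = set(atoms)
        # a branch is selected only if every one of its leaves is selected
        if atoms and atoms <= selected:
            full_resids.append(resid)
            covered |= atoms
    return sorted(full_resids), sorted(selected - covered)


def to_vmd_selection(full_resids, leftover_atoms):
    """Build a selection string; ASSUMES resids are unique in the system."""
    parts = []
    if full_resids:
        parts.append("resid " + " ".join(map(str, full_resids)))
    if leftover_atoms:
        parts.append("index " + " ".join(map(str, leftover_atoms)))
    return " or ".join(f"({part})" for part in parts)


residue_atoms = {1: [0, 1, 2], 2: [3, 4, 5], 3: [6, 7]}
resids, leftover = compact_by_residue([0, 1, 2, 3, 6, 7], residue_atoms)
print(to_vmd_selection(resids, leftover))  # (resid 1 3) or (index 3)
```

The same subset test could be repeated one level up (residues to segments) to select whole segments when all of their residues are selected.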

At a more fine-grained level, replacing sequences such as "1 2 3 ... 1000" with "1 to 1000" will also help tremendously.
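
A minimal sketch of that run-length compaction, assuming integer atom indices and VMD's "a to b" range syntax. The function name `compact_indices` is hypothetical and this is not existing MDAnalysis code.

```python
def compact_indices(indices):
    """Collapse runs of consecutive indices into VMD-style 'a to b' ranges."""
    values = sorted(set(indices))
    if not values:
        raise ValueError("cannot build a selection from an empty index list")

    def fmt(start, end):
        # a run of length one is written as a bare index
        return str(start) if start == end else f"{start} to {end}"

    tokens = []
    start = prev = values[0]
    for value in values[1:]:
        if value == prev + 1:
            prev = value  # still inside the current run
        else:
            tokens.append(fmt(start, prev))
            start = prev = value
    tokens.append(fmt(start, prev))
    return "index " + " ".join(tokens)


print(compact_indices([1, 2, 3, 4, 5, 6, 8, 9, 10]))  # index 1 to 6 8 to 10
print(compact_indices(range(1, 1001)))                # index 1 to 1000
```

For a mostly contiguous selection, the string length then grows with the number of gaps rather than with the number of atoms.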