Closed nathanmlim closed 8 years ago
Maybe run a bit of benchmarking?
Also, one fix could be to make the handling of corrupt xvgs be optional, or possible to be bypassed... In some applications speed will be important.
I'm not sure how best to handle this, since this feature is currently gromacs specific which would make adding something like a feature flag...maybe a bit unnecessary? Could be the best way to handle this is to generalize the corruption check and then add a feature flag for it.
Hannes has been arguing that we ought to have a way to pass parser-specific options (he has a proposal for how to do it). This is another good argument for that. Generalizing the corruption check to other data formats would be a pain.
I'd fix the backup issue at this point, and then do the benchmarking to see how much of a speed hit this is versus sensible alternatives. We can leave the "making it optional" aspect for later.
On Wed, Feb 24, 2016 at 4:39 PM, Nathan Lim notifications@github.com wrote:
I'm not sure how best to handle this, since this feature is currently gromacs specific which would make adding something like a feature flag...maybe a bit unnecessary? Could be the best way to handle this is to generalize the corruption check and then add a feature flag for it.
— Reply to this email directly or view it on GitHub https://github.com/MobleyLab/alchemical-analysis/issues/81#issuecomment-188535633 .
David Mobley dmobley@gmail.com 949-385-2436
The addition of checking for corrupt xvgs slows down the gromacs parser quite a bit which may or may not be related to the use of numpy.genfromtxt? It might be better/faster to iterate over the lines and check for lines with different numbers of elements.
In the gromacs parser, prior to the call removeCorruptLines(), I had it make a backup of the xvgs in a separate folder. This is not always totally necessary in the case where there are no corrupt lines in the xvgs. Instead, it should only make backups of offending xvgs.