Corrupt xvgs check slows down gromacs parser and makes unnecessary backups

nathanmlim commented 8 years ago

The addition of checking for corrupt xvgs slows down the gromacs parser quite a bit which may or may not be related to the use of numpy.genfromtxt? It might be better/faster to iterate over the lines and check for lines with different numbers of elements.

In the gromacs parser, prior to the call removeCorruptLines(), I had it make a backup of the xvgs in a separate folder. This is not always totally necessary in the case where there are no corrupt lines in the xvgs. Instead, it should only make backups of offending xvgs.

davidlmobley commented 8 years ago

Maybe run a bit of benchmarking?

Also, one fix could be to make the handling of corrupt xvgs be optional, or possible to be bypassed... In some applications speed will be important.

nathanmlim commented 8 years ago

I'm not sure how best to handle this, since this feature is currently gromacs specific which would make adding something like a feature flag...maybe a bit unnecessary? Could be the best way to handle this is to generalize the corruption check and then add a feature flag for it.

davidlmobley commented 8 years ago

Hannes has been arguing that we ought to have a way to pass parser-specific options (he has a proposal for how to do it). This is another good argument for that. Generalizing the corruption check to other data formats would be a pain.

I'd fix the backup issue at this point, and then do the benchmarking to see how much of a speed hit this is versus sensible alternatives. We can leave the "making it optional" aspect for later.

On Wed, Feb 24, 2016 at 4:39 PM, Nathan Lim notifications@github.com wrote:

I'm not sure how best to handle this, since this feature is currently gromacs specific which would make adding something like a feature flag...maybe a bit unnecessary? Could be the best way to handle this is to generalize the corruption check and then add a feature flag for it.

— Reply to this email directly or view it on GitHub https://github.com/MobleyLab/alchemical-analysis/issues/81#issuecomment-188535633 .

David Mobley dmobley@gmail.com 949-385-2436

MobleyLab / alchemical-analysis

Corrupt xvgs check slows down gromacs parser and makes unnecessary backups #81