NMRLipids / Databank


Should we require all provided trajectories to have a starting timestep of zero? #195

Open · comcon1 opened this issue 2 months ago

comcon1 commented 2 months ago

Currently, when we go through a trajectory with MDAnalysis, we apply TIMELEFTOUT counting from time 0. But a trajectory can have a starting time of, say, 120 ns, in which case TIMELEFTOUT = 50 actually skips nothing. Is that the intended behavior? Should we then check that TIMELEFTOUT is larger than the trajectory's starting timestep?
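For illustration, a simplified sketch of the pattern in question (the file names and the `system` dictionary below are made up, and the loop is a schematic of what the analysis scripts do, not their exact code):

```python
import MDAnalysis as mda

# Hypothetical values and file names; in the databank these come from the
# system's info dictionary and the downloaded simulation files.
system = {'TIMELEFTOUT': 50}                  # ns to discard from the beginning
u = mda.Universe("system.tpr", "traj.xtc")

EQtime = float(system['TIMELEFTOUT']) * 1000  # ns -> ps

for ts in u.trajectory:
    # ts.time is the absolute frame time in ps. If the trajectory starts at
    # 120 ns (120000 ps), no frame has ts.time < 50000 ps, so nothing is skipped.
    if ts.time < EQtime:
        continue
    print(ts.frame, ts.time)                  # frames that enter the analysis
```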

I suggest simply checking inside the AddFile script that the starting time is zero. That would at least fix future trajectories.
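A minimal sketch of such a check (the function below and where it would hook into the AddFile script are assumptions, not existing databank code):

```python
import MDAnalysis as mda

def assert_trajectory_starts_at_zero(topology, trajectory, tolerance_ps=1.0):
    """Reject a contributed trajectory whose first frame is not at (approximately) t = 0."""
    u = mda.Universe(topology, trajectory)
    t0 = u.trajectory[0].time   # time of the first frame, in ps
    if abs(t0) > tolerance_ps:
        raise ValueError(
            f"Trajectory starts at {t0:.1f} ps instead of 0 ps; "
            "please supply a trajectory whose time starts from zero."
        )
```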

ohsOllila commented 2 months ago

In which code exactly does this problem occur?

comcon1 commented 2 months ago

I saw it in the scattering code, but I suppose it could be present in many places. It is not exactly a problem, it is more of an uncertainty in the behavior.

batukav commented 2 months ago

I think we can fix this issue by making use of the PREEQTIME. It looks like we're actually not using this variable.

For any trajectory, the first timestep in the analysis should be PREEQTIME + TIMELEFTOUT

In calc_FormFactors.py (actually in all calculation scripts) https://github.com/NMRLipids/Databank/blob/b2a656c2d0c06bbd1b2f8189f4833e2734114a1d/Scripts/AnalyzeDatabank/calc_FormFactors.py#L171

setting

`EQtime = float(system['TIMELEFTOUT'] + system['PREEQTIME']) * 1000` should do the trick. We'll need to rerun the entire databank, though.

comcon1 commented 2 months ago

PREEQTIME cannot be the starting point of the trajectory. I can run the equilibration for 300 ns first and then start a new MD run from zero; in that case I would still set PREEQTIME to 300 to inform the user that I equilibrated for a long time. There are three possibilities:

1. we ask users to always start the trajectory from 0;
2. we ask users to always start the trajectory from PREEQTIME;
3. we ignore the starting time of the trajectory and count TIMELEFTOUT from the trajectory's starting time.

That is the choice, I suppose.

batukav commented 2 months ago

Your example makes sense. I'd prefer your third suggestion: we could set the EQ time to (first_time_stamp + TIMELEFTOUT).
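A sketch of what that could look like (variable names are illustrative and the loop is simplified, not the exact databank code):

```python
import MDAnalysis as mda

system = {'TIMELEFTOUT': 50}                  # ns; hypothetical value
u = mda.Universe("system.tpr", "traj.xtc")    # hypothetical file names

first_time_stamp = u.trajectory[0].time       # time of the first frame, in ps
EQtime = first_time_stamp + float(system['TIMELEFTOUT']) * 1000   # ps

for ts in u.trajectory:
    # the cutoff is now counted from the actual start of the trajectory,
    # so a trajectory starting at 120 ns still has its first 50 ns discarded
    if ts.time < EQtime:
        continue
    print(ts.frame, ts.time)
```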


comcon1 commented 2 months ago

Ok, let's do the third one. I like it better also. You can assign it to me.

comcon1 commented 1 month ago

I found a problem: some systems cannot be recomputed. For example, many united-atom systems contain definitions that buildH does not know. They were probably added to buildH locally but never synchronized with the buildH GitHub repository. There may be other problems as well, so I'm now quite skeptical about the possibility of recomputing the entire databank. Perhaps I can resolve this issue without recomputing?

batukav commented 1 month ago

Could you please post the output for some of the united-atom systems that you couldn't recompute? As a test, can you recompute the same united-atom system with the code you haven't modified?

comcon1 commented 1 month ago

> Could you please post the output for some of the united-atom systems that you couldn't recompute? As a test, can you recompute the same united-atom system with the code you haven't modified?

I answered, but then I found my problem and deleted the last post. I had accidentally broken the path to the lipid_json_buildh folder, which contains the dictionaries locally in NMRlipids, and that is why I ran into these problems. Now everything seems to be OK, so that was misreporting on my side.

batukav commented 1 month ago

Okay, that's a relief. So you're done with implementing the change and are now just recomputing the systems with the new code, right?

comcon1 commented 3 weeks ago

Fixing this issue is not the problem; recomputing the databank is. I would like to do that after I'm finished with my current code optimization and testing. Once I'm ready to start recomputing, I will take care of this issue.

Also, I would like to convince everyone to separate the Data into a GitHub submodule, so that the code and the data have independent commit histories. I talked about this with @ohsOllila, @fsuarezleston and @markussmiettinen when Samuli was visiting us in Bergen this summer. Note that recomputing the databank will touch a huge number of files. For example, almost all functions now write JSON in a more human-readable format (not on one line), so even with the same numbers, the JSON files will change. It is probably a good time to split up the repository exactly at the point of recomputing.

It is probably better for me to fix this issue after I write the tests, so that the fix is covered by tests, and then just wait until everything is ready for the recomputing...
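On the JSON point above, the change is purely in formatting; a hypothetical example (made-up keys and values, assuming the results are written with `json.dump`):

```python
import json

result = {"area_per_lipid": 0.636, "std": 0.011}   # made-up numbers

# old style: the whole dictionary on a single line
with open("result_oneline.json", "w") as f:
    json.dump(result, f)

# new style: indented, human-readable output; the same numbers, but the
# file contents differ, so recomputing produces diffs even without new data
with open("result_readable.json", "w") as f:
    json.dump(result, f, indent=2)
```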

batukav commented 3 weeks ago

It makes a lot of sense to separate data and scripts. I'll create a separate issue to discuss this.