Scripts and mini tools for MDAnalysis

lilyminium commented 4 years ago

Is your feature request related to a problem? Please describe. It is likely that many users are just using MDAnalysis to convert trajectory formats (comment), and learning how to use MDAnalysis just for that is overkill.

Describe the solution you'd like Mini scripts and tools would be nice.

Describe alternatives you've considered MDAnalysis could just point people to relevant tutorials / other tools like mdconvert.

Additional context Other ideas:

replacement for gmx make_ndx
script to make VMD macros

orbeckst commented 4 years ago

New repository/package! Create a repo with examples. You should have permission to create a new one in the org.

Basically anything like this should be its own project to decouple it from the (slow) MDAnalysis development workflow and the glacial CI. Also, it's invigorating to build a new project up, free from the chains of CI and strict code review ;-). (That said, testing is still an excellent idea...)

orbeckst commented 4 years ago

Btw, in the days when we were on SVN on GoogleCode, we also had an "apps" subdirectory where things like RotamerConvolveMD lived. The decision was made to make all "applications" separate repos.

Perhaps create a MolSSI cookie-cutter-cms template for MDAnalysis applications?

orbeckst commented 4 years ago

For VMD macros: This is what happens when you use the VMD selection exporter.

orbeckst commented 4 years ago

Would this be a simple GSoC project? It would mostly require someone to come up with ideas and then set up the repo and write scripts and docs. It would be a self-contained project. If we want to list it as a project we should have a few more ideas for specific scripts.

There's also @mnmelo 's mdreader which also provides a framework for script writing, including multicore parallelization.

cc @IAlibay @richardjgowers

orbeckst commented 4 years ago

I added the label as a potential GSoC project but I am not 100% sold so I haven't added anything to the project page. If anyone has strong feelings either way, feel free to remove the label or add it to the project page!

PicoCentauri commented 4 years ago

Currently we are developing an analysis library for the command line based on the AnalysisBase class which builds a CLI interface on the fly.

So I would be happy to help on this project and it would be nice to have some command line scripts based on MDAnalysis.

orbeckst commented 4 years ago

@PicoCentauri cool! (Btw, is MAICoS listed already under https://www.mdanalysis.org/pages/used-by/ ? If not, feel free to open a PR in https://github.com/MDAnalysis/MDAnalysis.github.io to add it!)

If you were interested to mentor on such a project then please add a project description to https://github.com/MDAnalysis/mdanalysis/wiki/Project-Ideas-2020 and also add your GH handle to the project as a mentor.

Once we have projects, we will add all mentors to https://github.com/MDAnalysis/mdanalysis/wiki/Google-Summer-Of-Code.

PicoCentauri commented 4 years ago

@PicoCentauri cool! (Btw, is MAICoS listed already under https://www.mdanalysis.org/pages/used-by/ ? If not, feel free to open a PR in https://github.com/MDAnalysis/MDAnalysis.github.io to add it!)

@orbeckst Thanks 😊. I will open a PR

If you were interested to mentor on such a project then please add a project description to https://github.com/MDAnalysis/mdanalysis/wiki/Project-Ideas-2020 and also add your GH handle to the project as a mentor.

I think mentoring such a project would be nice and a good extension.

richardjgowers commented 4 years ago

I think a CLI tool would be a cool gsoc project. Initially just doing something like mdconvert (or babel) but also putting in wrapping/unwrapping and selections. I think there'll have to be some care in what we wrap, I wouldn't want to expose a good tool badly (because of limitations of CLI).

@lilyminium we did try and have a cookbook repo at some point, which would be small scripts for doing common tasks, that hopefully can be used as building blocks for larger things. Maybe this needs the dust blowing off it.

pr4k commented 4 years ago

Hi Everyone, I came across this in Gsoc - 2020 organizations, and I want to contribute in this issue, can anyone guide me to the starters or bare essentials, @PicoCentauri @orbeckst @fiona-naughton

PicoCentauri commented 4 years ago

Hi Everyone, I came across this in Gsoc - 2020 organizations, and I want to contribute in this issue, can anyone guide me to the starters or bare essentials,

@pr4k: cool that you want to get involved in the project. I would suggest starting with a simple script that reads a trajectory in a certain format and converts it into another one. The user should be able to give the start, the end, and the step for writing. It would also be nice to have an option to write only a certain subset of the atoms in the trajectory. For a start I suggest the MDAnalysis Tutorial, mainly sections 6 and 7.

Hey , I would like to build this command line application, using the existing functions in MDAnalysis Package , I would like to use this https://click.palletsprojects.com/en/7.x/ Python package to do build this, would like to show you people a demo by this week. Cheers

@Sravanksk: Why not simply use argparse? I quickly looked into the click package and it looks comfortable but this is another dependency and I'm not sure if this is necessary.
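
A minimal sketch of such a converter script, using only argparse for the command line. The function names (build_parser, convert, main) and the option names (--start, --stop, --step, --select) are assumptions made for this illustration; the MDAnalysis calls (Universe, select_atoms, Writer) are the standard ones, but the script has not been tested against any particular file format.

```python
import argparse


def build_parser():
    """Build the command-line parser for the (hypothetical) converter script."""
    parser = argparse.ArgumentParser(
        description="Convert a trajectory to another format with MDAnalysis."
    )
    parser.add_argument("topology", help="topology file, e.g. a PDB or GRO file")
    parser.add_argument("trajectory", help="input trajectory, e.g. an XTC or DCD file")
    parser.add_argument("output", help="output file; the extension selects the format")
    parser.add_argument("--start", type=int, default=None, help="first frame to write")
    parser.add_argument("--stop", type=int, default=None, help="frame to stop before")
    parser.add_argument("--step", type=int, default=None, help="write every n-th frame")
    parser.add_argument("--select", default="all", help="MDAnalysis selection string")
    return parser


def convert(args):
    """Write the selected atoms of the selected frames to args.output."""
    import MDAnalysis as mda  # imported here so that --help works without it

    universe = mda.Universe(args.topology, args.trajectory)
    atoms = universe.select_atoms(args.select)
    with mda.Writer(args.output, atoms.n_atoms) as writer:
        # Slicing the trajectory applies start, stop and step in one go.
        for _ in universe.trajectory[args.start:args.stop:args.step]:
            writer.write(atoms)


def main(argv=None):
    """Parse argv (defaults to sys.argv[1:]) and run the conversion."""
    convert(build_parser().parse_args(argv))
```

Calling main() at the bottom of a script file would turn this into a command-line tool, e.g. python convert.py top.pdb traj.xtc out.dcd --step 10 --select protein (file names are placeholders).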

orbeckst commented 4 years ago

This has become a GSoC project (so not a starter anymore).

richardjgowers commented 4 years ago

I came across this which is relevant: https://github.com/joaomcteixeira/taurenmd

I've not had time to thoroughly look at it yet.

orbeckst commented 4 years ago

I wasn't aware of @joaomcteixeira 's project – sorry! It seems to be doing a lot of what we put down here. It uses MDAnalysis and mdtraj for various trajectory editing and analysis tasks.

There's a description for how to write new functionality, "clients".

@PicoCentauri would you mind looking a bit more into taurenmd and then make the case for or against continuing with a MDA-specific CLI tool?

joaomcteixeira commented 4 years ago

Hi @orbeckst and @richardjgowers

I have to say I was also not aware of this issue and the proposal of a MDA-specific CLI tool for GSOC. Quite an interesting and motivating coincidence.

I've been working quite intensively with MDAnalysis in recent months, I even participated in a couple of PRs here, #2357 and #2406. As I used MDAnalysis (and others) to analyze my data, I started developing taurenmd as a structured project to automate, at least, the most mainstream operations. And recently I gave it a push to organize the whole repository and present it for publication (just submitted to JOSS).

I had definitely thought of presenting it to you, and to the whole MDAnalysis community, I was looking for the correct moment to do it, I guess the moment just came :smile:

To the best of my knowledge, I have complied with the best CI practices: CI itself, documentation, testing, and deployment. I have also paid attention to readability, modularity and extensibility and, especially, to sensitizing users to properly cite taurenmd's dependencies.

I really look forward to your feedback, indeed! And all discussions towards bringing taurenmd and MDAnalysis closer together.

Cheers,

PicoCentauri commented 4 years ago

Dear @joaomcteixeira and @orbeckst ,

I just took a look into taurenmd. It looks like a really nice package. Well written, modular, with tests and good documentation. After looking into the code I'm not sure if one should really start writing MDA-specific CLI...

The advantages for a MDA CLI are:

probably better maintainability due to the higher number of maintainers
easier start for new users since they don't need two packages with two documentations

Disadvantages:

Probably a lot of duplicated code

Maybe merging taurenmd into MDA could be an option but this is definitely something for the coredevs to discuss.

orbeckst commented 4 years ago

Thanks @PicoCentauri ! I read the JOSS draft https://github.com/joaomcteixeira/taurenmd/blob/master/joss/paper.md and I generally agree with the rationale there. One advantage of taurenmd from a user's perspective is that it is not tied to a single library such as MDAnalysis. (MDAnalysis is becoming more interoperable with the new convertors but that's not the same yet as using mdtraj for one thing and MDAnalysis for another.)

One advantage of a MDA-native CLI would be a way to leverage the common analysis interface to more or less automatically generate scriptlets (at least for the classes that support it). Then the MDA CLI project would be more about implementing a common framework to turn any AnalysisBase-based analysis into a script than focusing on the CLI. This framework could conceivably then be used by tools such as taurenmd as well. One could also write separate Analysis-like classes for trajectory manipulation, i.e., a class that writes a new trajectory. One could even use on-the-fly transformations as a general way to make this work...

I agree with @PicoCentauri that it would be easier for MDA-users if they could just stay within MDA. However, we are unlikely to make MDA CLI part of the standard library. This would be a separate package from a separate repository within the MDAnalysis org. However, it is likely going to be easier to get users to try an MDA CLI when it already comes under the same brand. For taurenmd it might at least be initially an uphill battle to attract users. MDAnalysis already has a certain brand recognition so it would be easier to get users interested.

orbeckst commented 4 years ago

Remarks on JOSS manuscript

Comments to @joaomcteixeira (as I just read your manuscript – not as a referee, though)

I'd like to see a table of the available functionality/subcommands with quick description
I also find pip install taurenmd a bit too glib: good luck installing OpenMM, MDAnalysis and mdtraj on a random inexperienced user's system. If the target audience consists of the users who want to use canned commandline tools then they might not know how to build reasonably complicated python packages with many dependencies. I would encourage conda use.
I would point out that LOOS already has about 100 "single action tools" (check their docs or ask @agrossfield), and Gromacs does something similar (and it can use VMD's molfile plugin).

hhaootian commented 4 years ago

Hello GSOC mentors, I'm interested in this project and my draft of proposal can be found here, any comments would be appreciated!

One advantage of a MDA-native CLI would be a way to leverage the common analysis interface to more or less automatically generate scriptlets (at least for the classes that support it). Then the MDA CLI project would be more about implementing a common framework to turn any AnalysisBase-based analysis into a script than focusing on the CLI.

@orbeckst I don't quite understand the underlying coding work. Will that be designing a framework to write functions for specific AnalysisBase analysis, provide input args, and wrap them into taurenmd? Btw, silly question, where is AnalysisBase class, I didn't find it, and where is AnalysisBase API?...😅

joaomcteixeira commented 4 years ago

Dear @orbeckst and @PicoCentauri, Thanks very much for your positive feedback. I really appreciate your comments!

I will try to address all your points:

I designed taurenmd to operate with any available Python-interfaced library, but mostly pure Python libraries. This design addresses current libraries and facilitates new additions in the future. However, it was not my aim to purposely create a platform that can execute all operations of all MD analysis libraries available. Unavoidably, the taurenmd implementation reflects usage ratios of the different libraries and, at this early stage of development, it reflects mostly my usage.
I do favour the usage of MDAnalysis though. It is the library I use most in my work. Currently, the only implementations in taurenmd that use other libraries are mdtraj for the image_molecules function and OpenMM to read .cif files. The unwrap of MDAnalysis is also implemented.
1. I also favoured MDAnalysis because it is arguably the one being more actively developed, and that is to be acknowledged also.
Except in isolated situations, I do not intend to write pure data reading or analysis routines in taurenmd, instead just use those available in the libraries. If MDAnalysis develops more advanced analysis classes, or interfaces, they could be used directly by taurenmd as a vehicle to make them usable from the command line.
Regarding the number of developers in taurenmd, I can eventually create an organisation for the project instead of hosting it under my username. This would facilitate the creation of a developers community. There is no problem in doing that.
I implemented in taurenmd a citation log system where after each run a .citation file is recorded with all the references that need to be cited. References are also described in the -h menu of each command line. A user who only wants to use MDAnalysis operations, for example, can do so conscientiously. I have also stated in several places in the documentation that users should always also cite the dependencies taurenmd uses.
Merging points 4 and 5, taurenmd can be a door for new users to use MDAnalysis, or for advanced users to get high throughput. taurenmd can also be a way to attract more users to MDAnalysis, I see many questions raised in the users mailing list that could just be easily implemented in taurenmd.
Attracting new users is always difficult, indeed. But, in this case, because taurenmd is a door to the usage of other libraries, those libraries' communities really play a role in disseminating tools like taurenmd. taurenmd could be listed among the tools that use MDAnalysis on the MDAnalysis site, and users could be directed to taurenmd when they raise 'HowToDo' questions already addressed by taurenmd. I have seen many colleagues who do use MD simulations for their research but lack the coding skills to fully operate libraries such as MDAnalysis. taurenmd is a door for them to use such libraries - yet again, almost all implementations are done with MDAnalysis at the current stage. This definitely doesn't prevent or hinder any user from actually scripting with the analysis library itself.
1. MDAnalysis and taurenmd could have a tight collaboration in these lines.
However, I do not foresee taurenmd being completely absorbed by MDAnalysis because of the nature of the former. By definition, it shouldn't be tied to a specific library. This doesn't imply a close collaboration can't be forged. These are my thoughts for now, but these aren't rigid though.

Answering @orbeckst comments on JOSS paper proposal:

First of all, thanks for your input indeed!

Currently that table is available with the taurenmd -h command. But I will also reproduce it in the documentation!
Deciding on the installation approach was not easy at all. I have detailed it to my best in the Installation instructions. It currently goes as follows:
1. To get all taurenmd functionalities users should use Anaconda. I provide the resources to proceed with that installation (environment, commands, etc...). Dependencies are installed via conda, taurenmd itself is installed via pip then.
2. Installing dependencies from PyPI is also possible. The user should use pip3 install taurenmd[all]. This installs taurenmd itself and MDAnalysis and mdtraj. As far as I could test, it always installed properly. OpenMM is not installed with a pure PyPI installation process, because OpenMM is not deployed in PyPI. However, OpenMM is a run time dependency of taurenmd and is only used to read .cif files. I am really looking forward to MDAnalysis implementing a cif reader!
3. By running only pip3 install taurenmd, the MD analysis libraries are not installed! I defined it this way because I assumed many possible taurenmd users already have their own installations of MDAnalysis and others.
4. I addressed all these points in the installation page of the documentation. Thanks very much for your feedback and I really appreciate any additional comments.
Yes. LOOS and GROMACS also have their command-line interfaces. I haven't coded for those because they are not written purely in Python and it was not my aim to provide interfaces for all available libraries; it would also be a duplication of effort in the case of LOOS and GROMACS.

I look forward to more thoughts and discussions,

Cheers!

orbeckst commented 4 years ago

@joaomcteixeira thanks for the insights. Before I forget: please create a PR to add taurenmd to the used-by page on the website, https://github.com/MDAnalysis/MDAnalysis.github.io/issues/125. That might also help with users.

orbeckst commented 4 years ago

@HTian1997 , sorry, we weren't aware of taurenmd when we proposed the MDA CLI project. Right now this is a discussion if it's worthwhile going further with the MDA CLI project or remove it to avoid duplication of effort.

Regarding your question https://github.com/MDAnalysis/mdanalysis/issues/2377#issuecomment-596882680

I don't quite understand the underlying coding work. Will that be designing a framework to write functions for specific AnalysisBase analysis, provide input args, and wrap them into taurenmd?

We don't propose at the moment that a student would work directly on taurenmd as it is not part of the MDAnalysis org.

@PicoCentauri should give his thoughts because he already did something related in maicos for specific functions. I'll share my thoughts in a new comment.

Btw, silly question, where is AnalysisBase class, I didn't find it, and where is AnalysisBase API?...😅

Look into https://www.mdanalysis.org/docs/documentation_pages/analysis/base.html for how to create new analysis classes based on AnalysisBase. Have a look at some of the existing ones (e.g., rms.RMSD, rms.RMSF, contacts.ContactAnalysis, etc).

(There's a long discussion in #719 that led to AnalysisBase... )

orbeckst commented 4 years ago

Thoughts on the MDA CLI project, @MDAnalysis/gsoc-mentors and everyone else in this thread please chime in:

We don't want to duplicate effort. The original proposal for MDA CLI pretty much duplicates taurenmd, which, according to @PicoCentauri 's analysis https://github.com/MDAnalysis/mdanalysis/issues/2377#issuecomment-596408623 is already doing many things right. Thus, I don't want to recommend that a GSoC student work on a project where (1) they will have a hard time catching up to existing work and (2) they spend a lot of time duplicating existing work. That's not a good use of student and mentor time.

Option 1: drop the project

We can decide to withdraw the project.

Option 2: change focus towards interoperability

Recently, we have been making more and more effort to be interoperable (e.g., the new convertors). Perhaps we can make it easier for downstream tool writers such as maicos and taurenmd to make use of our analysis classes.

What I have in mind is a way for external code to automatically discover all analysis classes that have a common user interface so that a user can select and run whichever one they like. This should work similar to how pytest autodetects all tests that it needs to run. In the same way, there should be a way to find all analysis classes in MDAnalysis.analysis and then run any of these classes with user input. In this way, no-one needs to keep a list of the available analysis classes: they are just available and when a new class is added in a new release, it will just show up. This might require additions to how we write our analysis classes, e.g., by adding a new attribute that signifies that the class can be used by a CLI tool. I assume that this approach would leverage base.AnalysisBase as the common interface, with additions.

This GSoC project would then include

Define how analysis classes can be collected. How should this be done? (How do other tools such as pytest do this? Should collection be based on files, metaclasses and registration (like our file formats), importing and analyzing the classes, ....?)
What technical changes are needed? (What needs to be added to existing code? )
Document the API so that it is useful to others – ultimately we want others to use the approach.

Implement a bare-bones CLI tool to demonstrate the functionality.

$ mdacli --help

Simple proof-of-concept CLI tool:

 mdacli <tool> <tool-options>  TOPOLOGY [TRAJECTORY ...]    

 Load system from TOPOLOGY and TRAJECTORY and create the master atomgroup
 using selection --atom-group.

 Execute MDAnalysis.analysis class <tool> with the <tool-options>. If an atomgroup is
 required, use the master atom-group, denoted <AG> below.

$ mdacli --help-analysis
Available analysis:

RMSF    <AG> [--verbose] [--start=None] [--stop=None] [--step=None]
RMSD    <AG> [--reference=?] [--select=all] [--groupselections=None] [--weights=None] [--tol_mass=0.1] [--ref_frame=0]
AlignTraj <AG> REFERENCE [--select=all] [--filename=None] ....
....

See analysis.rms and align.AlignTraj (I don't know how to do all this or what it would look like in detail... for instance, how does one add the reference groups... but part of the project would be figuring out problems like these)
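
As a rough illustration of the collection step, here is one possible approach: walk the subclass tree of a common base class and keep the classes that opt in through an attribute. The AnalysisBase class below is a stand-in defined only for this sketch (it is not the MDAnalysis class), and the cli_enabled attribute and the discover function are hypothetical names.

```python
class AnalysisBase:
    """Stand-in for MDAnalysis.analysis.base.AnalysisBase (not the real class)."""

    cli_enabled = False  # hypothetical opt-in attribute for CLI tools


class RMSF(AnalysisBase):
    cli_enabled = True


class RMSD(AnalysisBase):
    cli_enabled = True


class InternalHelper(AnalysisBase):
    """Not meant for the command line, so it keeps cli_enabled = False."""


def iter_subclasses(cls):
    """Yield all direct and indirect subclasses of cls."""
    for sub in cls.__subclasses__():
        yield sub
        yield from iter_subclasses(sub)


def discover(base=AnalysisBase):
    """Map class names to the classes that opted in to CLI use."""
    return {c.__name__: c for c in iter_subclasses(base) if c.cli_enabled}


print(sorted(discover()))  # ['RMSD', 'RMSF']
```

One caveat with this approach: __subclasses__ only sees classes that have already been defined, so in a real package the analysis submodules would have to be imported first (for example by iterating over them with pkgutil.walk_packages).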

The difficulty level would increase to medium/challenging.

@PicoCentauri and @joaomcteixeira do you have specific opinions?

PicoCentauri commented 4 years ago

A framework for the base.AnalysisBase to let downstream projects detect them and probably create a CLI sounds like a really good idea. This is also where I spend most of the time and effort while writing maicos. @joaomcteixeira probably as well.

If this framework exists it becomes more attractive for users and new developers to write their analysis using the base.AnalysisBase. Regularly, I'm facing the problem in our lab that new students/scientists start writing their analysis scripts facing the same challenges and problems i.e.

How to initialize the universe and loop through frames without copying many lines of code?
How to write a CLI parser to analyze several of their simulations?
How to process and save their trajectories in a clever way?
...

Some of these problems can be easily solved by using the base.AnalysisBase class. However, this class could be even more powerful and helpful to write day to day analysis. An extension of the class is also in the spirit of MDAnalysis. Help people to analyze their simulation data with the least effort.
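
To make the pattern concrete, here is a simplified sketch of the idea behind AnalysisBase: the base class owns the frame loop and subclasses only fill in hooks. FrameAnalysis and MeanX are stand-ins written for this illustration, not MDAnalysis code; only the hook names (_prepare, _single_frame, _conclude) and the run(start, stop, step) call follow what the AnalysisBase documentation describes, and plain lists stand in for trajectory frames.

```python
class FrameAnalysis:
    """Simplified stand-in for the AnalysisBase pattern (not the MDAnalysis class)."""

    def __init__(self, frames):
        self._frames = frames

    def run(self, start=None, stop=None, step=None):
        """Loop over the selected frames and call the three hooks."""
        self._prepare()
        for frame in self._frames[start:stop:step]:
            self._single_frame(frame)
        self._conclude()
        return self

    def _prepare(self):
        """Set up accumulators before the loop (optional hook)."""

    def _single_frame(self, frame):
        """Process one frame; subclasses must implement this."""
        raise NotImplementedError

    def _conclude(self):
        """Finalize results after the loop (optional hook)."""


class MeanX(FrameAnalysis):
    """Average x coordinate over the analysed frames."""

    def _prepare(self):
        self._total = 0.0
        self._count = 0

    def _single_frame(self, frame):
        self._total += sum(frame) / len(frame)
        self._count += 1

    def _conclude(self):
        self.result = self._total / self._count


# Each "frame" is a plain list of x coordinates standing in for a Timestep.
frames = [[0.0, 2.0], [1.0, 3.0], [2.0, 4.0], [3.0, 5.0]]
analysis = MeanX(frames).run(step=2)
print(analysis.result)  # 2.0
```

The point of the design is that the loop, the frame slicing and the bookkeeping live in one place, so a new analysis only has to describe what happens per frame.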

But as @orbeckst sketched the extension of the base.AnalysisBase can be really hard. Especially when it comes to creating a useful CLI framework on the fly.

tl;dr Let's do Option 2, even if it could be challenging!

joaomcteixeira commented 4 years ago

As both of you have said, Option 2 sounds like a great approach to facilitate both the usage of MDAnalysis analysis frameworks and the incorporation of those by third-party projects.

Following your thoughts, I can envision implementing an automatic command-line maker that would just convert the new MDAnalysis analysis frameworks into taurenmd command lines, provided the former are identified.

This would by no means hinder the development of independent command-line interfaces for specific and more complex workflows as usual.

Analysis frameworks usable directly in command lines could be tagged via a decorator pattern that would collect all frameworks definitions. For example, isort project uses this kind of implementation.
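
A minimal sketch of what such tagging could look like. The registry name ANALYSIS_REGISTRY and the decorator name cli_analysis are hypothetical, and the classes are empty placeholders rather than the MDAnalysis ones.

```python
ANALYSIS_REGISTRY = {}  # hypothetical registry: class name -> class


def cli_analysis(cls):
    """Class decorator that registers cls and returns it unchanged."""
    ANALYSIS_REGISTRY[cls.__name__] = cls
    return cls


@cli_analysis
class RMSF:
    """Placeholder for an analysis that should be exposed on the command line."""


@cli_analysis
class AlignTraj:
    """Another placeholder analysis tagged for command-line use."""


class Helper:
    """Undecorated, so it never shows up in the registry."""


print(sorted(ANALYSIS_REGISTRY))  # ['AlignTraj', 'RMSF']
```

Because the decorator returns the class unchanged, tagging does not affect how the class is used in ordinary scripts; a downstream tool only has to import the modules and read the registry.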

At the level of the third-party software, the inspect module could be used to convert framework parameters into argparse arguments automatically. To pipe multiple command-lines, we can pass the outputs (AtomGroups, arrays, etc...) via some sys variable (pickle to disk, memory, or create a master cmd api framework that handles sub cmd apis).
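
A small sketch of the inspect-to-argparse idea, under the assumption that the argument type can be inferred from each default value. The rmsd function is a hypothetical stand-in for an analysis class constructor, and parser_from_signature is a name invented for this example.

```python
import argparse
import inspect


def rmsd(select="all", ref_frame=0, tol_mass=0.1, verbose=False):
    """Hypothetical analysis function standing in for an analysis class."""
    return {"select": select, "ref_frame": ref_frame,
            "tol_mass": tol_mass, "verbose": verbose}


def parser_from_signature(func):
    """Create an ArgumentParser with one option per parameter of func."""
    parser = argparse.ArgumentParser(prog=func.__name__, description=func.__doc__)
    for name, param in inspect.signature(func).parameters.items():
        default = param.default
        if default is inspect.Parameter.empty:
            parser.add_argument(name)  # no default: positional argument
        elif isinstance(default, bool):
            # Giving the flag flips the default (False -> True or True -> False).
            action = "store_false" if default else "store_true"
            parser.add_argument(f"--{name}", action=action, default=default)
        else:
            # Infer the type from the default; fall back to str for None.
            arg_type = str if default is None else type(default)
            parser.add_argument(f"--{name}", type=arg_type, default=default)
    return parser


parser = parser_from_signature(rmsd)
args = parser.parse_args(["--select", "backbone", "--ref_frame", "5", "--verbose"])
result = rmsd(**vars(args))
print(result)
```

Inferring types from defaults is the weak spot of this sketch: a default of None carries no type information, so richer metadata (type hints or parsed docstrings) would probably be needed for real analysis classes.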

orbeckst commented 4 years ago

@PicoCentauri and @joaomcteixeira thank you for your input, this sounds encouraging and gives me confidence to move forward with Option 2.

@PicoCentauri could you please update the Project 2 description and then send an email to the developer list, letting everyone know that the aims and scope of Proposed Project 2 have changed?

EDIT: Please also change the title of this issue and update the text so that the goal becomes clearer.

orbeckst commented 4 years ago

@joaomcteixeira would you be interested in co-mentoring a GSoC student if we selected one to work on this project? (~@joaomcteixeira~ @PicoCentauri @fiona-naughton and I would also be available for co-mentoring but it would clearly be very helpful if you could bring your expertise to the table)

EDIT: added @PicoCentauri instead of @joaomcteixeira – apologies

joaomcteixeira commented 4 years ago

@orbeckst sorry for the late reply, I couldn't look carefully before because of the events these days.

I am pleased to participate as a co-mentor in this project together with you and @fiona-naughton (I think you used my user name twice in your comment by mistake and wanted to mention someone else). I am grateful for your appreciation, indeed! Just let me know about any further details.

hhaootian commented 4 years ago

Hi GSoC mentors @orbeckst @joaomcteixeira @fiona-naughton @PicoCentauri , here is my draft proposal for this command line interface project. Any comment is appreciated! Thanks!

orbeckst commented 4 years ago

I left a few comments on your draft.

On Mar 17, 2020, at 8:36 PM, Hao Tian notifications@github.com wrote:

Hi GSoC mentors @orbeckst https://github.com/orbeckst @joaomcteixeira https://github.com/joaomcteixeira @fiona-naughton https://github.com/fiona-naughton , here is my draft https://docs.google.com/document/d/1ZRCu3mT3_dLrkslWDd6t5MHJmgvfLEkyKrK69lMCVP8/edit?usp=sharing for this command line interface project. Any comment is appreciated! Thanks!

joaomcteixeira commented 4 years ago

Hi @HTian1997 @orbeckst I added some comments to draft. Cheers!

PicoCentauri commented 4 years ago

@HTian1997 thanks for your proposal! I also took a quick look.

orbeckst commented 3 years ago

@joaomcteixeira @PicoCentauri I assigned you to this issue. Once you make mdacli public, close this issue.

germanbarcenas commented 3 years ago

Hello! I was at your seminar earlier today. I heard about mdtraj as something that was being considered and I wanted to explain a little further/support this idea. From this thread, it seems that people use MDA to interconvert trajectory formats, which mdtraj seems built to handle. I am mostly interested in performing some Markov State analysis using http://emma-project.org/latest/ and/or http://msmbuilder.org/3.8.0/. I haven't tried using MSM, but I do know that pyemma uses mdtraj as the base. Just thought I'd boost interest in mdtraj (:

PicoCentauri commented 2 years ago

With the release of mdacli I will close this issue.

@germanbarcenas. A simple converter is now easy to implement. If you would like to give us a hand, get in contact with us via discord or the mailing list.

germanbarcenas commented 2 years ago

Ok I'll put this on my todo list(: I'll be in touch soon,
