Closed lilyminium closed 2 years ago
New repository/package! Create a repo with examples. You should have permission to create a new one in the org.
Basically anything like this should be its own project to decouple it from the (slow) MDAnalysis development workflow and the glacial CI. Also, it's invigorating to build a new project up, free from the chains of CI and strict code review ;-). (That said, testing is still an excellent idea...)
Btw, in the days when we were on SVN on GoogleCode, we also had an "apps" subdirectory where things like RotamerConvolveMD lived. The decision was made to make all "applications" separate repos.
Perhaps create a MolSSI cookie-cutter-cms template for MDAnalysis applications?
For VMD macros: This is what happens when you use the VMD selection exporter.
Would this be a simple GSoC project? It would mostly require someone to come up with ideas and then set up the repo and write scripts and docs. It would be a self-contained project. If we want to list it as a project we should have a few more ideas for specific scripts.
There's also @mnmelo 's mdreader which also provides a framework for script writing, including multicore parallelization.
cc @IAlibay @richardjgowers
I added the label as a potential GSoC project but I am not 100% sold so I haven't added anything to the project page. If anyone has strong feelings either way, feel free to remove the label or add it to the project page!
Currently we are developing an analysis library for the command line based on the AnalysisBase class which builds a CLI interface on the fly.
So I would be happy to help on this project and it would be nice to have some command line scripts based on MDAnalysis.
@PicoCentauri cool! (Btw, is MAICoS listed already under https://www.mdanalysis.org/pages/used-by/ ? If not, feel free to open a PR in https://github.com/MDAnalysis/MDAnalysis.github.io to add it!)
If you were interested to mentor on such a project then please add a project description to https://github.com/MDAnalysis/mdanalysis/wiki/Project-Ideas-2020 and also add your GH handle to the project as a mentor.
Once we have projects, we will add all mentors to https://github.com/MDAnalysis/mdanalysis/wiki/Google-Summer-Of-Code.
@PicoCentauri cool! (Btw, is MAICoS listed already under https://www.mdanalysis.org/pages/used-by/ ? If not, feel free to open a PR in https://github.com/MDAnalysis/MDAnalysis.github.io to add it!)
@orbeckst Thanks 😊. I will open a PR.
If you were interested to mentor on such a project then please add a project description to https://github.com/MDAnalysis/mdanalysis/wiki/Project-Ideas-2020 and also add your GH handle to the project as a mentor.
I think mentoring such a project would be nice and a good extension.
I think a CLI tool would be a cool gsoc project. Initially just doing something like mdconvert (or babel) but also putting in wrapping/unwrapping and selections. I think there'll have to be some care in what we wrap, I wouldn't want to expose a good tool badly (because of limitations of CLI).
@lilyminium we did try and have a cookbook repo at some point, which would be small scripts for doing common tasks, that hopefully can be used as building blocks for larger things. Maybe this needs the dust blowing off it.
Hi Everyone, I came across this in Gsoc - 2020 organizations, and I want to contribute in this issue, can anyone guide me to the starters or bare essentials, @PicoCentauri @orbeckst @fiona-naughton
Hi Everyone, I came across this in Gsoc - 2020 organizations, and I want to contribute in this issue, can anyone guide me to the starters or bare essentials,
@pr4k: cool that you want to get involved in the project. I would suggest starting with a simple script that reads a trajectory in a certain format and converts it into another one. The user should be able to give the start, the end and the step for writing. It would also be nice to have an option to write only a certain subset of the atoms in the trajectory. For a start I suggest the MDAnalysis Tutorial, mainly sections 6 and 7.
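A minimal sketch of such a starter script, assuming `argparse` for the command line and the standard MDAnalysis `Universe`/`Writer` workflow. The script name `mdaconvert` and the option names are made up for illustration (`--stop` is used for the "end" to match MDAnalysis slicing conventions); MDAnalysis is imported lazily so that the parser can be built without it.

```python
import argparse


def build_parser():
    """Build the parser for a hypothetical `mdaconvert` script."""
    parser = argparse.ArgumentParser(
        prog="mdaconvert",  # hypothetical name, not an existing tool
        description="Convert a trajectory to another format with MDAnalysis.",
    )
    parser.add_argument("topology", help="topology file (e.g. PSF, GRO, PDB)")
    parser.add_argument("trajectory", help="input trajectory (e.g. DCD, XTC)")
    parser.add_argument("output", help="output file; format follows the extension")
    parser.add_argument("--start", type=int, default=None, help="first frame to write")
    parser.add_argument("--stop", type=int, default=None, help="stop before this frame")
    parser.add_argument("--step", type=int, default=None, help="write every N-th frame")
    parser.add_argument("--select", default="all", help="MDAnalysis selection string")
    return parser


def convert(args):
    """Write the selected atoms of the selected frames to args.output."""
    import MDAnalysis as mda  # lazy import: only needed when converting

    universe = mda.Universe(args.topology, args.trajectory)
    atoms = universe.select_atoms(args.select)
    with mda.Writer(args.output, n_atoms=atoms.n_atoms) as writer:
        # Slicing the trajectory applies start/stop/step in one go.
        for _ in universe.trajectory[args.start:args.stop:args.step]:
            writer.write(atoms)
    return atoms.n_atoms


def main(argv=None):
    args = build_parser().parse_args(argv)
    n_atoms = convert(args)
    print(f"Wrote {n_atoms} atoms per frame to {args.output}")
```

Calling `main(["top.psf", "traj.dcd", "out.xtc", "--step", "10", "--select", "protein"])` would write every tenth frame of the protein atoms; the file names here are placeholders.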
Hey, I would like to build this command line application using the existing functions in the MDAnalysis package. I would like to use the Python package https://click.palletsprojects.com/en/7.x/ to build this, and would like to show you a demo by this week. Cheers
@Sravanksk: Why not simply use argparse? I quickly looked into the click package and it looks comfortable, but it is another dependency and I'm not sure it is necessary.
This has become a GSoC project (so not a starter anymore).
I came across this which is relevant: https://github.com/joaomcteixeira/taurenmd
I've not had time to thoroughly look at it yet.
I wasn't aware of @joaomcteixeira 's project – sorry! It seems to be doing a lot of what we put down here. It uses MDAnalysis and mdtraj for various trajectory editing and analysis tasks.
There's a description for how to write new functionality, "clients".
@PicoCentauri would you mind looking a bit more into taurenmd and then make the case for or against continuing with a MDA-specific CLI tool?
Hi @orbeckst and @richardjgowers
I have to say I was also not aware of this issue and the proposal of a MDA-specific CLI tool for GSoC. Quite an interesting and motivating coincidence.
I've been working quite intensively with MDAnalysis in recent months, I even did a couple of participations in PR here #2357 and #2406. As I used MDAnalysis (and others) to analyze my data, I started developing taurenmd as a structured project to automatize, at least, the most mainstream operations. And recently I gave it a push to organize the whole repository and present it for publication (just submitted to JOSS).
I had definitely thought to present it to you, and to the whole MDAnalysis community. I was looking for the correct moment to do it, and I guess the moment just came :smile:
I have complied, to the best of my knowledge, with the best CI practices: CI itself, documentation, testing, and deployment. I also paid attention to readability, modularity and extensibility and, especially, to sensitizing users to properly cite taurenmd's dependencies.
I really look forward to your feedback, and to all discussions towards bringing taurenmd and MDAnalysis closer together.
Cheers,
Dear @joaomcteixeira and @orbeckst ,
I just took a look into taurenmd. It looks like a really nice package: well written, modular, with tests and good documentation. After looking into the code I'm not sure if one should really start writing an MDA-specific CLI...
The advantages for a MDA CLI are:
Disadvantages:
Maybe merging taurenmd into MDA could be an option, but this is definitely something for the core devs to discuss.
Thanks @PicoCentauri ! I read the JOSS draft https://github.com/joaomcteixeira/taurenmd/blob/master/joss/paper.md and I generally agree with the rationale there. One advantage of taurenmd from a user's perspective is that it is not tied to a single library such as MDAnalysis. (MDAnalysis is becoming more interoperable with the new convertors but that's not the same yet as using mdtraj for one thing and MDAnalysis for another.)
One advantage of a MDA-native CLI would be a way to leverage the common analysis interface to more or less automatically generate scriptlets (at least for the classes that support it). Then the MDA CLI project would be more about implementing a common framework to turn any AnalysisBase-based analysis into a script than focusing on the CLI. This framework could conceivably then be used by tools such as taurenmd as well. One could also write separate Analysis-like classes for trajectory manipulation, i.e., a class that writes a new trajectory. One could even use on-the-fly transformations as a general way to make this work...
I agree with @PicoCentauri that it would be easier for MDA-users if they could just stay within MDA. However, we are unlikely to make MDA CLI part of the standard library. This would be a separate package from a separate repository within the MDAnalysis org. However, it is likely going to be easier to get users to try an MDA CLI when it already comes under the same brand. For taurenmd it might, at least initially, be an uphill battle to attract users. MDAnalysis already has a certain brand recognition so it would be easier to get users interested.
Comments to @joaomcteixeira (as I just read your manuscript – not as a referee, though)
pip install taurenmd a bit too glib: good luck installing OpenMM, MDAnalysis and mdtraj on a random inexperienced user's system. If the target audience consists of the users who want to use canned command-line tools then they might not know how to build reasonably complicated Python packages with many dependencies. I would encourage conda use.
Hello GSoC mentors, I'm interested in this project and my draft proposal can be found here. Any comments would be appreciated!
One advantage of a MDA-native CLI would be a way to leverage the common analysis interface to more or less automatically generate scriptlets (at least for the classes that support it). Then the MDA CLI project would be more about implementing a common framework to turn any AnalysisBase-based analysis into a script than focusing on the CLI.
@orbeckst I don't quite understand the underlying coding work. Will that be designing a framework to write functions for specific AnalysisBase analysis, provide input args, and wrap them into taurenmd? Btw, silly question, where is AnalysisBase class, I didn't find it, and where is AnalysisBase API?...😅
Dear @orbeckst and @PicoCentauri, thanks very much for your positive feedback. I really appreciate your comments!
I will try to address all your points:
taurenmd to operate with any available Python-interfaced library, but mostly pure Python libraries. This design addresses current libraries and facilitates new additions in the future. However, it was not my aim to purposely create a platform that can execute all operations of all MD analysis libraries available. Unavoidably, taurenmd implementation reflects usage ratios of the different libraries and, in these early stages of development, it reflects mostly my usage.
taurenmd that use mdtraj are for the image_molecules function, and OpenMM to read .cif files. The unwrap of MDAnalysis is also implemented.
taurenmd, instead just use those available in the libraries. If MDAnalysis develops more advanced analysis classes, or interfaces, they could be used directly by taurenmd as a vehicle to make their usability command-line available.
taurenmd, I can eventually create an organisation for the project instead of hosting it under my username. This would facilitate the creation of a developers community. There is no problem in doing that.
taurenmd a citation log system where after each run a .citation file is recorded with all the references that need to be cited. References are also described in the -h menu of each command line. A user who only wants to use MDAnalysis operations, for example, can do so conscientiously. I have also stated in several places in the documentation that users should always cite the dependencies taurenmd uses as well.
taurenmd can be a door for new users to use MDAnalysis, or for advanced users to get high throughput. taurenmd can also be a way to attract more users to MDAnalysis; I see many questions raised in the users mailing list that could easily be implemented in taurenmd.
taurenmd is a door to the usage of other libraries, those libraries' communities really play a role in disseminating tools like taurenmd. taurenmd could be listed in the tools that use MDAnalysis on the MDAnalysis site, and users could be directed to taurenmd when they raise 'HowToDo' questions already addressed by taurenmd. I have seen many colleagues who do use MD simulations for their research but lack the coding skills to fully operate libraries such as MDAnalysis. taurenmd is a door for them to use such libraries; yet again, almost all implementations are done with MDAnalysis at the current stage. This definitely doesn't prevent or hinder any user from actually scripting with the analysis library itself.
taurenmd could have a tight collaboration in these lines.
taurenmd to be completely absorbed by MDAnalysis because of the nature of the former. By definition, it shouldn't be tied to a specific library. This doesn't imply a close collaboration can't be forged. These are my thoughts for now, but they aren't rigid.
Answering @orbeckst comments on JOSS paper proposal:
First of all, thanks for your input indeed!
taurenmd -h command. But I will also reproduce it in the documentation!
taurenmd functionalities users should use Anaconda. I provide the resources to proceed with that installation (environment, commands, etc...). Dependencies are installed via conda; taurenmd itself is then installed via pip.
PyPI is also possible. The user should use pip3 install taurenmd[all]. This installs taurenmd itself and MDAnalysis and mdtraj. As far as I could test, it always installed properly. OpenMM is not installed with a pure PyPI installation process, because OpenMM is not deployed in PyPI. However, OpenMM is a run time dependency of taurenmd and is only used to read .cif files. I am really looking forward to MDAnalysis implementing a cif reader!
pip3 install taurenmd, the MD analysis libraries are not installed! I defined it this way because I assumed many possible taurenmd users already have their own installations of MDAnalysis and others.
I look forward to more thoughts and discussions,
Cheers!
@joaomcteixeira thanks for the insights. Before I forget: please create a PR to add taurenmd to the used-by page on the website, https://github.com/MDAnalysis/MDAnalysis.github.io/issues/125. That might also help with users.
@HTian1997 , sorry, we weren't aware of taurenmd when we proposed the MDA CLI project. Right now this is a discussion if it's worthwhile going further with the MDA CLI project or remove it to avoid duplication of effort.
Regarding your question https://github.com/MDAnalysis/mdanalysis/issues/2377#issuecomment-596882680
I don't quite understand the underlying coding work. Will that be designing a framework to write functions for specific AnalysisBase analysis, provide input args, and wrap them into taurenmd?
We don't propose at the moment that a student would work directly on taurenmd as it is not part of the MDAnalysis org.
@PicoCentauri should give his thoughts because he already did something related in maicos for specific functions. I'll share my thoughts in a new comment.
Btw, silly question, where is AnalysisBase class, I didn't find it, and where is AnalysisBase API?...😅
Look into https://www.mdanalysis.org/docs/documentation_pages/analysis/base.html for how to create new analysis classes based on AnalysisBase. Have a look at some of the existing ones (e.g., rms.RMSD, rms.RMSF, contacts.ContactAnalysis, etc).
(There's a long discussion in #719 that led to AnalysisBase... )
Thoughts on the MDA CLI project, @MDAnalysis/gsoc-mentors and everyone else in this thread please chime in:
We don't want to duplicate effort. The original proposal for MDA CLI pretty much duplicates taurenmd, which, according to @PicoCentauri 's analysis https://github.com/MDAnalysis/mdanalysis/issues/2377#issuecomment-596408623 is already doing many things right. Thus, I don't want to recommend to a GSoC student to work on a project where (1) they will have a hard time catching up to existing work, (2) they spend a lot of time duplicating existing work. That's not a good use of student and mentor time.
1. We can decide to withdraw the project.
2. Recently, we have been making more and more effort to be interoperable (e.g., the new convertors). Perhaps we can make it easier for downstream tool writers such as maicos and taurenmd to make use of our analysis classes.
What I have in mind is a way for external code to automatically discover all analysis classes that have a common user interface so that a user can select and run whichever one they like. This should work similar to how pytest autodetects all tests that it needs to run. In the same way, there should be a way to find all analysis classes in MDAnalysis.analysis and then run any of these classes with user input. In this way, no-one needs to keep a list of the available analysis classes: they are just available and when a new class is added in a new release, it will just show up. This might require additions to how we write our analysis classes, e.g., by adding a new attribute that signifies that the class can be used by a CLI tool. I assume that this approach would leverage base.AnalysisBase as the common interface, with additions.
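The discovery idea can be sketched with the standard-library `inspect` module. This is only an illustration under stated assumptions: `AnalysisBase` below is a stand-in class rather than the real `MDAnalysis.analysis.base.AnalysisBase`, and the throwaway module `fake_analysis` replaces `MDAnalysis.analysis` so that the example runs without MDAnalysis installed.

```python
import inspect
import types


class AnalysisBase:
    """Stand-in for the real base class (assumption for this sketch)."""

    def run(self):
        return self


class RMSF(AnalysisBase):
    pass


class RMSD(AnalysisBase):
    pass


class Helper:
    """Not an analysis class; discovery should skip it."""


def discover_analyses(module, base):
    """Return {name: class} for every strict subclass of `base` in `module`."""
    found = {}
    for name, obj in inspect.getmembers(module, inspect.isclass):
        if issubclass(obj, base) and obj is not base and not name.startswith("_"):
            found[name] = obj
    return found


# Build a throwaway module that plays the role of an analysis package.
fake_analysis = types.ModuleType("fake_analysis")
for cls in (AnalysisBase, RMSF, RMSD, Helper):
    setattr(fake_analysis, cls.__name__, cls)

available = discover_analyses(fake_analysis, AnalysisBase)
print(sorted(available))  # → ['RMSD', 'RMSF']
```

For a real package one would presumably also have to walk and import the submodules first (for example with `pkgutil.walk_packages` and `importlib.import_module`) before inspecting them; that part is left out here.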
This GSoC project would then include
pytest do this? Should collection be based on files, metaclasses and registration (like our file formats), importing and analyzing the classes, ...?)
Implement a bare-bones CLI tool to demonstrate the functionality.
$ mdacli --help
Simple proof-of-concept CLI tool:
mdacli <tool> <tool-options> TOPOLOGY [TRAJECTORY ...]
Load system from TOPOLOGY and TRAJECTORY and create the master atomgroup
using selection --atom-group.
Execute MDAnalysis.analysis class <tool> with the <tool-options>. If an atomgroup is
required, use the master atom-group, denoted <AG> below.
$ mdacli --help-analysis
Available analysis:
RMSF <AG> [--verbose] [--start=None] [--stop=None] [--step=None]
RMSD <AG> [--reference=?] [--select=all] [--groupselections=None] [--weights=None] [--tol_mass=0.1] [--ref_frame=0]
AlignTraj <AG> REFERENCE [--select=all] [--filename=None] ....
....
See analysis.rms and align.AlignTraj (I don't know how to do all this or what it would look like in detail... for instance, how does one add the reference groups... but part of the project would be figuring out problems like these)
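One possible shape for the `mdacli <tool> <tool-options> TOPOLOGY [TRAJECTORY ...]` layout is `argparse` sub-commands. The following is a sketch only and not the implementation of any existing tool: the program name `mdacli-sketch` and the hand-written `TOOLS` table are assumptions, and a real tool would derive the options from the analysis classes instead.

```python
import argparse

# Hypothetical option table: tool name -> {option: (type, default)}.
TOOLS = {
    "RMSF": {"start": (int, None), "stop": (int, None), "step": (int, None)},
    "RMSD": {"select": (str, "all"), "ref_frame": (int, 0)},
}


def build_cli(tools):
    """Create one sub-command per tool, each taking TOPOLOGY [TRAJECTORY ...]."""
    parser = argparse.ArgumentParser(prog="mdacli-sketch")
    parser.add_argument("--atom-group", default="all",
                        help="selection string for the master atom group")
    subparsers = parser.add_subparsers(dest="tool", required=True)
    for tool_name, options in tools.items():
        tool_parser = subparsers.add_parser(tool_name)
        for option, (option_type, default) in options.items():
            tool_parser.add_argument(f"--{option}", type=option_type, default=default)
        tool_parser.add_argument("topology", metavar="TOPOLOGY")
        tool_parser.add_argument("trajectory", metavar="TRAJECTORY", nargs="*")
    return parser


args = build_cli(TOOLS).parse_args(
    ["--atom-group", "protein", "RMSF", "--step", "10", "top.psf", "a.dcd", "b.dcd"]
)
print(args.tool, args.atom_group, args.step, args.topology, args.trajectory)
```

The sketch leaves the open questions from the comment untouched, for instance how reference groups would be passed on the command line.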
The difficulty level would increase to medium/challenging.
@PicoCentauri and @joaomcteixeira do you have specific opinions?
A framework for the base.AnalysisBase to let downstream projects detect them and probably create a CLI sounds like a really good idea. This is also where I spend most of the time and effort while writing maicos. @joaomcteixeira probably as well.
If this framework exists it becomes more attractive for users and new developers to write their analysis using the base.AnalysisBase. Regularly, I'm facing the problem in our lab that new students/scientists start writing their analysis scripts facing the same challenges and problems i.e.
Some of these problems can be easily solved by using the base.AnalysisBase class. However, this class could be even more powerful and helpful for writing day-to-day analysis. An extension of the class is also in the spirit of MDAnalysis: help people analyze their simulation data with the least effort.
But as @orbeckst sketched, the extension of the base.AnalysisBase can be really hard, especially when it comes to creating a useful CLI framework on the fly.
tl;dr Let's do Option 2, even if it could be challenging!
As both of you have said, Option 2 sounds like a great approach to facilitate both the usage of MDAnalysis analysis frameworks and the incorporation of those by third-party projects.
Following your thoughts, I can envision implementing an automatic command-line maker that would just convert the new MDAnalysis analysis frameworks into taurenmd command lines, provided the former are identified.
This would by no means hinder the development of independent command-line interfaces for specific and more complex workflows as usual.
Analysis frameworks usable directly in command lines could be tagged via a decorator pattern that would collect all framework definitions. For example, the isort project uses this kind of implementation.
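A minimal sketch of such a tagging decorator, assuming a module-level registry dictionary. The names `CLI_REGISTRY`, `cli_tool`, `Density` and `InternalHelper` are hypothetical and do not exist in MDAnalysis or taurenmd.

```python
# Registry filled as a side effect of decorating classes (hypothetical names).
CLI_REGISTRY = {}


def cli_tool(cls):
    """Class decorator: mark `cls` as usable from the command line."""
    CLI_REGISTRY[cls.__name__] = cls
    return cls  # the class itself is returned unchanged


@cli_tool
class Density:
    """Example analysis that opts in to CLI exposure."""


class InternalHelper:
    """Not decorated, so it never shows up in the registry."""


print(sorted(CLI_REGISTRY))  # → ['Density']
```

Compared with automatic discovery, this makes exposure opt-in: a class only becomes a command line once its author has tagged it.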
At the level of the third-party software, the inspect module could be used to convert framework parameters into argparse arguments automatically. To pipe multiple command lines, we can pass the outputs (AtomGroups, arrays, etc...) via some sys variable (pickle to disk, memory, or create a master cmd api framework that handles sub cmd apis).
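The `inspect`-to-`argparse` conversion could look roughly like this. It is a sketch under simplifying assumptions: `ExampleAnalysis` and its parameters are invented for illustration, argument types are guessed from the default values, and `*args`/`**kwargs` are ignored.

```python
import argparse
import inspect


class ExampleAnalysis:
    """Hypothetical analysis class; the parameters are made up."""

    def __init__(self, reference, select="all", start=0, step=1, verbose=False):
        self.reference = reference
        self.select = select
        self.start = start
        self.step = step
        self.verbose = verbose


def parser_from_signature(cls):
    """Turn the constructor parameters of `cls` into argparse arguments."""
    parser = argparse.ArgumentParser(prog=cls.__name__)
    for name, param in inspect.signature(cls).parameters.items():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # *args / **kwargs cannot be mapped automatically
        if param.default is inspect.Parameter.empty:
            parser.add_argument(name)  # required -> positional argument
        elif isinstance(param.default, bool):
            # Boolean defaults become flags that flip the default.
            action = "store_false" if param.default else "store_true"
            parser.add_argument(f"--{name}", action=action)
        else:
            # Guess the type from the default; None falls back to str.
            arg_type = str if param.default is None else type(param.default)
            parser.add_argument(f"--{name}", type=arg_type, default=param.default)
    return parser


parser = parser_from_signature(ExampleAnalysis)
args = parser.parse_args(["ref.pdb", "--select", "protein", "--step", "5", "--verbose"])
analysis = ExampleAnalysis(**vars(args))
print(analysis.select, analysis.step, analysis.verbose)
```

Guessing types from defaults breaks down for parameters whose default is `None` but which expect, say, an AtomGroup; handling such cases is the hard part the comment alludes to.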
@PicoCentauri and @joaomcteixeira thank you for your input, this sounds encouraging and gives me confidence to move forward with Option 2.
@PicoCentauri could you please update the Project 2 description and then send an email to the developer list, letting everyone know that the aims and scope of Proposed Project 2 have changed?
EDIT: Please also change the title of this issue and update the text so that the goal becomes clearer.
@joaomcteixeira would you be interested in co-mentoring a GSoC student if we selected one to work on this project? (~@joaomcteixeira~ @PicoCentauri @fiona-naughton and I would also be available for co-mentoring but it would clearly be very helpful if you could bring your expertise to the table)
EDIT: added @PicoCentauri instead of @joaomcteixeira – apologies
@orbeckst sorry for the late reply, I couldn't look carefully before because of the events these days.
I am pleased to participate as a co-mentor in this project together with you and @fiona-naughton (I think you used my user name twice in your comment by mistake and wanted to mention someone else). I am grateful for your appreciation, indeed! Just let me know about any further details.
Hi GSoC mentors @orbeckst @joaomcteixeira @fiona-naughton @PicoCentauri , here is my draft proposal for this command line interface project. Any comment is appreciated! Thanks!
I left a few comments on your draft.
On Mar 17, 2020, at 8:36 PM, Hao Tian notifications@github.com wrote:
Hi GSoC mentors @orbeckst https://github.com/orbeckst @joaomcteixeira https://github.com/joaomcteixeira @fiona-naughton https://github.com/fiona-naughton , here is my draft https://docs.google.com/document/d/1ZRCu3mT3_dLrkslWDd6t5MHJmgvfLEkyKrK69lMCVP8/edit?usp=sharing for this command line interface project. Any comment is appreciated! Thanks!
Hi @HTian1997 @orbeckst I added some comments to draft. Cheers!
@HTian1997 thanks for your proposal! I also took a quick look.
@joaomcteixeira @PicoCentauri I assigned you to this issue. Once you make mdacli public, close this issue.
Hello! I was at your seminar earlier today. I heard about mdtraj as something that was being considered and I wanted to explain a little further/support this idea. From this thread, it seems that people use MDA to interconvert trajectory formats, which mdtraj seems built to handle. I am mostly interested in performing some Markov State analysis using http://emma-project.org/latest/ and/or http://msmbuilder.org/3.8.0/. I haven't tried using MSM, but I do know that pyemma uses mdtraj as the base. Just thought I'd boost interest in mdtraj (:
With the release of mdacli I will close this issue.
@germanbarcenas: a simple converter is now easy to implement. If you would like to give us a hand, get in contact with us via Discord or the mailing list.
Ok I'll put this on my todo list(: I'll be in touch soon,
Is your feature request related to a problem? Please describe. It is likely that many users are just using MDAnalysis to convert trajectory formats (comment), and learning how to use MDAnalysis just for that is overkill.
Describe the solution you'd like Mini scripts and tools would be nice.
Describe alternatives you've considered MDAnalysis could just point people to relevant tutorials / other tools like mdconvert.
Additional context Other ideas: