Hello,
Those are all good comments! On the first point, I believe the field strength is kept to one decimal to make parameter adjustment easier. For instance, if you wanted to set the intensities of residues for only B1 = 25.0, you can just make the heading [I0, 25.0] or something like that. More decimals make the parameter matching a little more cumbersome and don't significantly affect the fit, since inhomogeneity is taken into account anyway. I could be wrong on this.
The second point. I like it!
Third point. I think everyone would like something like this, but it's perhaps tedious to get it to work nicely in a general way (formatting, colors, how to treat different Trlx values, what to do if there's missing data for one field and not another, etc.). I'm sure it can be done, but I think it could potentially be a bit of a headache.
Last point. I'd never thought about this issue. Have you run a test to see whether the same seed is used? My feeling is that different processors would produce different random numbers, but I really have no idea. It wouldn't be difficult to, say, increment the seed depending on what already exists, like we do for the 'chemex-dump' directories (see the sketch below). But that could run into its own conflicts on a large scale too, I'm sure.
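For concreteness, the "increment depending on what already exists" idea could look something like this (a hypothetical sketch; the actual directory-naming code in ChemEx may differ):

```python
import os

def next_available(base='chemex-dump'):
    """Return 'base', or 'base.N' for the smallest N not yet on disk."""
    if not os.path.exists(base):
        return base
    n = 1
    while os.path.exists('{}.{}'.format(base, n)):
        n += 1
    return '{}.{}'.format(base, n)

# The same counter could then double as a per-job offset added to the seed.
```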
Those are my thoughts, anyways.
Alex
hi Alex,
thanks for your quick answer - in response to that:
1st: using only one decimal for the field strength is totally fine and I agree that this will not affect the fitting. What I am suggesting is only to change how the B1 field is written to the output (*.dat) files. It's done in the data_point.py file(s) as follows: output.append('{b1frq:6.1e}'.format(**self.par)). So if I have, for example, 26.4 Hz it will be written as 2.6e+01 -- you actually lose the decimal value.
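To illustrate, the difference boils down to the format specification (a sketch; the exact field name and width used in data_point.py may differ):

```python
# Current behaviour: exponent notation keeps only one significant decimal.
'{b1frq:6.1e}'.format(b1frq=26.4)  # -> '2.6e+01', the .4 is lost

# Possible fix: fixed-point notation preserves the decimal.
'{b1frq:6.1f}'.format(b1frq=26.4)  # -> '  26.4'
```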
3rd: I think for the CEST data it shouldn't be too complicated unless you're going to simultaneously fit data at different temperatures, but I do see your point! I am totally happy to make my own "final" plots afterwards.
4th: the use of the "random" module in parallel processes is widely discussed on the internet; I think there are specific libraries for this if you really want it properly implemented. As far as I know you cannot get the seed value back, so it's difficult to test. My understanding from reading the documentation is that the clock is used to get a seed if you don't specify one yourself. Perhaps switching to SystemRandom might already help a bit - this uses "sources provided by the operating system", probably including the time. But it seems to me that an acceptable "poor man's version" (i.e., without resorting to special libraries) would be to allow a command-line argument to input the seed and set it just before the "for loop" in line 100 of main.py. That would make sure that independent jobs start from different positions, and for the individual MC runs within a job I think it's fair to assume the numbers will be random (enough).
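Something along these lines is what I have in mind (a rough sketch; the option name and where it hooks into main.py are placeholders, not existing chemex code):

```python
import argparse
import random

parser = argparse.ArgumentParser()
parser.add_argument('--seed', type=int, default=None,
                    help='seed for the Monte Carlo random number generator')
args = parser.parse_args()

# Seed once, just before the Monte Carlo loop: jobs launched with different
# --seed values then follow different random streams, while omitting the
# option keeps the current (time-based) behaviour.
random.seed(args.seed)
```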
Hi Renee and Alex!
Renee, thanks for all the good suggestions!
hi Guillaume and Alex,
I am interested in constraining the fitting parameters (with "lmfit") and see that work towards this is being done in the branch "feature/from_point_to_profile". I started poking around there a little to get a better understanding of the code. It seems to work pretty well and gives results consistent with earlier ChemEx versions, at least for the few simple examples I tested. Is it still work in progress, or do you consider this branch "tested & approved"?
When I started following the code from main.py, I noticed some issues in parsing.py, in particular when using the positional argument "info". As a novice with git, I fiddled around a bit and created a branch trying to fix this (not very important) issue - see https://github.com/reneeotten/chemex/tree/ro_fix_info [currently only for cest.iph.2st]
Anyway, I just wanted to let you know I am interested in contributing to the project and would be happy to discuss and/or fix stuff while going through the code-base, testing and using it, and trying to understand it. In addition, I could implement some of the suggestions in this thread as well. But before doing so, I would appreciate your input on the current status of this branch (e.g., do you have an updated local branch?) and on whether there are specific areas where you could use help.
Have a nice weekend! Renee
Hello,
Sorry for the slow response. The 'from_point_to_profile' branch is work that Guillaume is mostly doing to improve performance and, as you say, to allow constraints and relations between parameters. From talking with him, it seems it's coming along well, but I don't know how "approved" it is at the moment.
As far as contributing goes, I think we'd all be happy to have more eyes on the project! As you can imagine, we all get pretty busy with our "day jobs" and can't contribute as much as we'd hope. I'll let Guillaume chime in about any systems or conventions in the code. And I forget how becoming a contributor works - whether you need to be invited or can simply request access. If the latter, I think the answer is "go right ahead!"
hi Guillaume (and others),
thanks for pushing updates to the new branch, it looks very nice! I started going through the code a little bit and, as I said earlier, am interested in helping out to get this to a production-ready, stable version. How would you assess the current state (i.e., what parts are tested and work properly, and what still needs work)?
I am not completely sure how you want to go about discussing changes to the code and collaborating on it, but I guess pull requests are intended for this. I'll go ahead and open two pull requests soon-ish for some rather trivial things; we can see how that goes and take it from there.
Renee
Thanks Renee for your message! I'm glad you like the new version of the program and really appreciate your help making it even better. I also think that pull requests are the way to go for discussing and submitting changes to the code. I still need to familiarise myself with the process, though.
The development of the new version first aimed at improving the performance of the code, but later focused on simplifying it as well as adding new features (like constraints and relations of parameters). I believe enough has now been achieved, both in terms of code optimization and new features, to justify a release. However, as you pointed out, a fair bit of polishing is first required.
1. "chemex info": This is currently broken (as mentioned in one of your Pull Request) and I am still debating how this should be fixed. As exchange model and experiment type are now dissociated, it is not clear to me how a set of fit parameters for a specific experiment can be provided... It is probably better to use the docstring of each experiment file (like 'experiments/cest/profiles/ip.py') like a 'readme' file including a full description of the experiment and mention of what are the experimental parameters the user should provide for the fit (something like a template of the 'experiment.cfg' file, where each parameter is described). There's probably no need to implement any 'fancy' code to automatically grab these parameters from the code itself as it was the case in previous versions.
2. "shifts" experiments: Even these do work, they don't follow the new 'coding style' where 'profiles' and exchange model are dissociated. Needs a general way to compute the shift regardless of the employed model.
3. Installation: The new version is developed for Python 3 and imports 'lmfit', neither of which is widely used yet. It's therefore probably a good idea to provide easy-to-follow installation steps. I am more and more convinced that (ana)conda is the way to go for this purpose, as environments with specific versions of Python and libraries can easily be set up. I've already added the 'environment.yml' file, which allows one to create an environment with all requirements for chemex (a sketch of such a file appears below the list). It would also be nice to add chemex to the Anaconda server, so that it can be installed by just typing 'conda install chemex'. On the same note, 'chemex' should be added to PyPI (the Python Package Index). All of that should be done once a stable version is available.
4. Code review and bug hunting: This one is self-explanatory. The more eyes on the code and the more people using this version, the better. Feedback of any kind, from bug reports to suggestions on how to improve the code, is more than welcome.
5. Long term goals (not necessary for the next release...):
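Regarding point 1, the docstring idea could be as simple as something like this (a hypothetical sketch; the module path and function name are placeholders, not existing chemex API):

```python
import importlib

def print_info(experiment):
    """Print an experiment module's docstring as the 'chemex info' output,
    e.g. experiment='cest.profiles.ip'."""
    module = importlib.import_module('chemex.experiments.' + experiment)
    print(module.__doc__)
```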
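Regarding point 3, a conda environment.yml is just a short YAML file; a minimal sketch of what it might contain (the actual file in the repository is authoritative):

```yaml
name: chemex
dependencies:
  - python=3
  - lmfit
  # ...plus whatever other libraries chemex imports (numpy, matplotlib, etc.)
```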
I probably forgot a lot of things that should be done. Please do not hesitate to add anything to this list that you think is relevant.
hi Guillaume, I certainly agree with all the points you mention! It will take me a bit of time to go through the code and figure stuff out, but once I am done with that (and hopefully get some things done along the way) we can see which parts have the highest priority and how to tackle them.
Thanks for merging the PRs! I think it would be good to commit small(ish) pieces of code, write clear commit messages, and push often to the repository. Before pushing you can always do a "rebase" to combine commits/messages and keep a reasonably clean history. Of course, once it's pushed you shouldn't rewrite history anymore :)
For example, there are now 3 messages relating to the merge of a single PR, which seems a bit much and doesn't improve readability. It seems you changed a few things in my PR and then merged it, correct? I am sure there is a best practice for how to do this, but again, it's all part of the learning curve!
Hi Renee, small commits with clear descriptions are definitely the way to go. I can't agree more with you on this point. I also realize there is much room for improvement on my side ;)
Merging PR19 was kind of messy. Part of it came from the fact that I first merged PR20, which introduced a conflict with PR19 because 'chemex/chemex.py' got modified in the process. When that was fixed, my IDE (PyCharm) decided to automatically reformat all the new code, which did not play well with the merging... As you said, it's all part of the learning curve!
Regarding your previous comment about "chemex info": in the short term it's probably easier to just add all the information to the docstring of each experiment.
However, I think it would be good to do some additional coding to extract that information from the code (it seems to me that it's all present in the "create_default_params" function in the files under chemex/bases). The reason is that I think it would be helpful to introduce a positional argument "init" (or similar) that creates a default directory structure and input files; in that case we would need the same information to write, for example, the parameter and experiment files. (A rough sketch follows.)
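To make the idea concrete, such an "init" command might do something like the following (a rough sketch; all directory and file names here are placeholders, not established ChemEx conventions):

```python
import os

def init_project(directory='example'):
    """Create a default directory structure with template input files."""
    os.makedirs(os.path.join(directory, 'experiments'), exist_ok=True)
    os.makedirs(os.path.join(directory, 'parameters'), exist_ok=True)
    template = ('[experiment]\n'
                '# template generated from the defaults in the code;\n'
                '# one line per parameter, with a short description\n')
    with open(os.path.join(directory, 'experiments', 'experiment.cfg'), 'w') as out:
        out.write(template)
```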
Lots of these things are still relevant today, but I'd rather keep the issues more focused, each dealing with a single point.
hi Guillaume and co.,
I have been using ChemEx for a while now (from the "develop" branch) and it's very nice - thanks a lot for all the work! I have a few comments/suggestions. I can certainly do some of it myself, but first I wanted to check what you think and, if I'm going to add/change code, in which branch I should do so...
Thanks! Renee