cran-task-views / ctv

CRAN Task View Initiative

CRAN Task View Revival: Phylogenetics #31

Closed: willgearty closed this issue 1 year ago

willgearty commented 2 years ago

@bomeara and I are interested in reviving the Phylogenetics task view. I believe there was talk of combining the "Phylogenetics" and "Genetics" task views, but now it sounds like there is going to be an "Omics" task view, so "Phylogenetics" should still be a valid task view, distinct from that one (and the two could link to each other).

We've put together a draft of the new updated version here: https://github.com/bomeara/PhylogeneticsTaskView/blob/main/Phylogenetics.md. It currently covers both packages that facilitate the import and handling of phylogenetic trees along with other packages that perform phylogenetic analyses (both inference and comparative analyses). We've updated the text to have better formatting than the old task view and also include many new packages that were not previously included (including those in specific fields of research).

Given that this used to be a task view, I'm not sure I need to put too much more here, but we're happy to chat more about its revival!

tuxette commented 2 years ago

Thank you for this proposal! I don't think that I'll be able to review it before August because of the summer break, but I'll definitely send the information to the people who have shown an interest in building an Omics TV to avoid overlap (it seems that there is room for complementary task views in the field). Also, if possible, we generally encourage broadening the pool of maintainers to include more diverse feedback: do you have ideas of people (not among your close scientific collaborators) whom you could include in addition to the two of you?

willgearty commented 2 years ago

Thanks @tuxette! I can reach out to some folks to see if they are interested in being maintainers.

rociojoo commented 2 years ago

Same here! Thank you for the proposal. Between work and vacations, I won't be able to review the proposal carefully until some time in September. It looks interesting and I agree with @tuxette that we'll need to make sure that it could work with Omics. Thanks @tuxette for checking with them.

tuxette commented 2 years ago

Hello @willgearty! Thanks again. I've read your proposal and I have a few suggestions to improve it:

  1. While this is a task view mainly devoted to CRAN packages, I think that you should probably not completely ignore Bioconductor and should at least link to this view: https://bioconductor.org/packages/release/BiocViews.html#___Phylogenetics (which might not be complete; I myself use phyloseq a lot and it is not listed there, maybe because it is more oriented towards metagenomics than phylogenetics, but you already have a section in your TV devoted to microbial communities, for instance). Please note that ggtree (which you list from its GitHub repository) is actually on Bioconductor (and should be cited with r bioc(...
  2. Regarding your "Note" at the end of the TV, I would not include personal views in a TV (but I might be wrong: @zeileis , what do you think?).
  3. Packages only available on GitHub have to be cited with r github(... and not with a direct link.
  4. At the beginning of your TV, you list tools to manipulate trees, and I think that you should mention that the dendrogram structure could also do the job (since you're citing other tools that use this structure, it is worth mentioning it explicitly for newcomers).
  5. I am a bit worried about the Genetics subsection because it is very small. I am under the impression that the TV should either be "Genetics and Phylogenetics" or avoid citing packages like adhoc entirely, since (unless I am mistaken) adhoc is not really related to phylogenetics.
  6. More importantly, I found the overall structure of the TV very hard to follow. This is because you wrote very long sections of text (see "Diversification analysis", for instance), while most TVs use bullet lists to break a topic into small sets of similar packages. In addition, some packages are repeated constantly within the TV (ape is cited approximately 20 times, I think), and while I understand that this package is very central (and rightfully cited as core), I think that it would be better to have a section dedicated to packages dealing very broadly with phylogenetics than to have it repeated so many times. @zeileis @rociojoo: what do you think?
  7. Similarly to my question about Genetics, I am wondering if this TV should cover metagenomics more broadly. This is an open question.
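For readers unfamiliar with task view markup, here is a minimal sketch of how CRAN, Bioconductor, and GitHub-only packages are typically cited in a task view's Markdown source. It assumes the `pkg()`, `bioc()`, `github()`, and `view()` helpers that the ctv package provides for task view files; the repository `someuser/somepkg` is a hypothetical placeholder and the entry wording is illustrative only:

```markdown
- `r pkg("ape", priority = "core")` provides core functions for reading, writing, and manipulating phylogenetic trees.
- `r bioc("ggtree")` plots and annotates trees with ggplot2 (hosted on Bioconductor rather than CRAN).
- `r github("someuser/somepkg")` is a hypothetical package available only on GitHub.
- For genomic data more broadly, see the `r view("Omics")` task view.
```

Using the helpers instead of raw links lets the ctv tooling recognize each package and render consistent links.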

Finally, I am tagging @emmanuelparadis (not sure that will work) because it might be useful to have his feedback or to invite him to participate in this TV.

willgearty commented 2 years ago

Thanks, @tuxette! I'll work on all of those great suggestions as soon as possible. Regarding Genetics, I think we'd prefer to completely remove the Genetics section and leave that for the Omics task view rather than beef it up in our task view.

In the meantime, I look forward to any other comments anyone else has!

zeileis commented 2 years ago

Thanks Nathalie @tuxette and Will @willgearty! Two comments for the questions from Nathalie:

tuxette commented 2 years ago

@privefl might also want to add comments or contribute directly to this TV?

privefl commented 2 years ago


I do not really have any expertise in this particular sub-field, sorry.

rociojoo commented 2 years ago

Hi all, @willgearty I have a few additional questions/remarks about the TV:

willgearty commented 2 years ago

Thanks for all of your comments! I'm now working on a new draft of the TV...

jzeyl commented 2 years ago

Hello, I'm a former grad student and postdoc in comparative physiology and anatomy. I just wanted to say that I think this page would be a useful resource for those working in this field. It would have saved me a decent amount of googling. I like your heading breakdown, and since several of the packages perform multiple tasks, I see the challenge of avoiding repetition while keeping the heading organization. FYI, I noticed there is now a ggtreeExtra package on Bioconductor that extends ggtree for circular phylogenies. All the best

willgearty commented 2 years ago

Hi all,

I've tried to address all of your comments in our new draft.

Thanks again, Will

tuxette commented 2 years ago

Hi Will @willgearty! Thank you for the update. I'll check it before the end of the week (if I can). Regarding the number of contributors, that does not seem like too many to me (as long as you've agreed on a way to collaborate).

zeileis commented 2 years ago

Thanks for the work, Will @willgearty et al., this is very much appreciated!

I agree with Nathalie @tuxette regarding the number of co-maintainers: If you have agreed on a "modus operandi" and who does what, you should be fine. I also wouldn't be surprised if some of the currently listed maintainers eventually turn out to not be active in the future. If so, I would encourage you to streamline the list of co-maintainers and just mention others in some acknowledgments in the task view. But let's see what happens: If everybody becomes active, that would be even better!

Regarding the structure: Personally, I would use one fewer hierarchy level. I would probably start with ## Overview, then ## Scope, and subsequently continue with ## Working with trees in R, etc. Within ## Scope, you could start out with ### Core packages and then continue with something like ### Tasks or ### Package categories. But that's just my 2 cents; maybe someone else has a better idea.

Regarding the contents and level of detail I cannot contribute much. It looks reasonable to me but I have no experience in the field so I leave the feedback on this to others...

tuxette commented 2 years ago

Hi Will @willgearty ! I've checked the TV and I think that it has improved a lot. I still have a few additional comments / suggestions:

  1. This is more a question than a mandatory request, but you did not account for my remark on dendrogram (August 11th, 4th point): did it not seem relevant to you?
  2. About your second point (core packages cited multiple times), my suggestion would be, since you have a core package section (which is nice), to describe the different features of these packages there in more detail and to remove them from the rest of the text (ape is still cited 23 times, which I think is too much).
  3. About your last questions (details on methods), I think that the current level of detail is adequate; I would just add explicit citations of the references listed at the end of your TV next to the relevant packages (whenever possible). Apart from that, I have no other comments. This is a nice TV and I'll probably be one of its frequent users.

rociojoo commented 1 year ago

Hi @willgearty, this looks great! My only minor comment is that rdryad is not a package made for phylogenetics. I'm not an expert, so I wonder whether Dryad is the most commonly used data repository in phylogenetics.

willgearty commented 1 year ago

Thank you all!

@zeileis, I've modified the hierarchy structure as you suggested, thanks! @tuxette, I had previously included dendrogram with phylogram, but it now has its own bullet point. @rociojoo, you are correct; I have removed rdryad.

@tuxette I've also moved a lot of the details about the core packages up into the core section. However, I'm still worried about moving all of the functions of the core packages here because a) as you've noted, that's a lot for ape which would make for a very long paragraph or list of very unrelated functions, and b) I imagine some users will go to a section to find a particular package for a function they have in mind, rather than reading everything in the core packages section first. Thoughts?

tuxette commented 1 year ago

@willgearty: sorry for my late answer... I understand your point about core packages. I don't really know what's best... @zeileis @rociojoo: any suggestions? Also, if you can wait a bit longer, I've been reviewing packages for an Omics TV recently and I might have a few more suggestions for the Phylogenetics TV. I'll try to collect them before the end of the day.

zeileis commented 1 year ago

Regarding the core packages: I think that it's not necessary to list every detail of these packages in one place. I like the idea of having a dedicated section for core packages where an overview is given and the most important functionalities are discussed.

And then it's fine to list the finer details in other sections of the task view. For some packages that means that they are listed quite often but that just reflects their breadth, I guess. So that's fine with me.

tuxette commented 1 year ago

@willgearty As promised, here are a few other suggestions (they might be totally irrelevant: I have not looked into them in detail, I just noticed that they could be more related to phylogenetics than to a general Omics TV, so feel free to leave them out if you feel they don't fit):

aphylo babette beastier beautier CALANGO coil SMITIDvisu tracerer treeducken

I might have a couple more later, but that should not hold up the publication of the TV when it is ready (I can suggest them after submission by creating a specific issue for that in your TV repository).

rociojoo commented 1 year ago

I agree with @zeileis about core packages. I think everything's great, and the only thing left is to address @tuxette's package suggestions.

willgearty commented 1 year ago

Thanks for the suggestions @tuxette and @zeileis and @rociojoo for your feedback! I've added several of those packages to the CTV draft.

I think SMITIDvisu (and the related SMITIDstruct) might be more relevant for the Epidemiology TV?

tuxette commented 1 year ago

@willgearty Thanks for your work. You might be right about SMITIDvisu. I'll suggest it to the maintainers, along with other packages that we also found related to the topic.

For me, your TV seems ready to publish (provided at least two other editors agree).

zeileis commented 1 year ago

+1

rsbivand commented 1 year ago

+1, with a question - could the great reference section items be furnished with DOI links (possibly at next revision)?

zeileis commented 1 year ago

Good idea. Note that you can use code like r doi("10.../...") for this.

willgearty commented 1 year ago

DOIs added, thanks @rsbivand and @zeileis!

zeileis commented 1 year ago

Thanks, then we can try to merge the recent changes in

https://github.com/bomeara/PhylogeneticsTaskView/blob/main/Phylogenetics.md

with the old

https://github.com/cran-task-views/Phylogenetics/blob/main/Phylogenetics.md

Dirk @eddelbuettel, can you help with this?

The cran-task-views/Phylogenetics file is connected to the pre-history from R-Forge. The bomeara/PhylogeneticsTaskView file started out from that plus some edits. If possible, it would be great to add the roughly 20 commits from the bomeara repository to the cran-task-views repository (including authors, time stamps, etc.).

eddelbuettel commented 1 year ago

Achim @Zeileis: I do not know of tools to merge two disjoint git repositories. They may exist, I just do not know.

What I can help with (and have helped with) is to bootstrap a new git repo from an existing svn repo. This here seems different.

One thing that comes to mind is to 'export' the 'about 20 commits' as patches which could then be re-applied and committed (of course with different timestamps etc).

zeileis commented 1 year ago

OK, thanks for the explanation. Re-applying the commits with different timestamps would then still be a good solution, I think.

Could you have a stab at this?

eddelbuettel commented 1 year ago

No, sorry, I am extra snowed in with work on various fronts and don't have spare cycles. It "should" just be a few minutes of shell scripting (famous last words) to extract the desired list of commit SHA-1s, then loop over them to extract a patch file for each (via a call to git diff with the proper options), and then loop again to reapply them. Alas, no time to play for me.

zeileis commented 1 year ago

I've done this semi-manually now in this way:

https://github.com/cran-task-views/Phylogenetics/commits/main/Phylogenetics.md

I think that this should be good enough to track what was done by whom.

zeileis commented 1 year ago

Will @willgearty, I think that the resurrected repository should be essentially ready:

https://github.com/cran-task-views/Phylogenetics

I have abbreviated the title to just "Phylogenetics" because the "package in R" seems to be redundant in a CRAN task view.

Could you please have a look whether everything is in order?

Also, I have invited you as "Admin" for this repository. After accepting you can add/invite all co-maintainers with a "Maintain" role. (And just double checking: You want everybody in the "Maintainer:" field with full access to the repository? Or should some of them be only acknowledged as contributors in the introduction of the task view text?)

willgearty commented 1 year ago

Everything looks good to me, thanks @zeileis! I will go ahead and invite the co-maintainers when I get a chance. At the moment, I think all co-maintainers can be treated as equal, and we can shift some to contributors in the future if need be.

zeileis commented 1 year ago

OK, fine for me!

Let me know when everybody is on board in the new repository. Then we can make the CRAN release. We should probably hold off on this until the end of this week or early next week. Today we just released and announced the MixedModels task view.

willgearty commented 1 year ago

We've got a couple of stragglers who still haven't accepted their GitHub invites, but otherwise I think we're good to go whenever you think it'd be good to release, @zeileis!

zeileis commented 1 year ago

Great, I'll do the CRAN release tomorrow and announce it on Twitter. Do you happen to know who of your co-maintainers is on Twitter and what their handles are?

willgearty commented 1 year ago

@willgearty, @omearabrian, @gaballench, @hlapp, @jakeberv, @JonNations1, @ecomorph, @PalaeoSmith, and @n8_upham

zeileis commented 1 year ago

Will, thanks for this. I was just preparing the release when I noticed that I hadn't run check_ctv_packages() yet. This turns up a number of issues:

ctv::check_ctv_packages("Phylogenetics.md")
## $`Packages in info but not in packagelist`
## character(0)
## 
## $`Packages in packagelist but not in info`
## character(0)
## 
## $`Packages in packagelist but not on CRAN`
## [1] "dendrogram" "RevGadget" 
## 
## $`Packages in packagelist but archived on CRAN`
## [1] "HyPhy"     "iteRates"  "kdetrees"  "pastis"    "phyloland" "Rphylip"  
## [7] "TreePar"  

The archived packages were all archived on CRAN earlier this year (mostly April to June), so it is fair to assume that their maintainers probably won't resurrect them soon. Hence, I suggest the following:

The two unavailable packages can probably be resolved like this:

willgearty commented 1 year ago

Should be fixed now, thanks for bringing this all to my attention!

zeileis commented 1 year ago

Yay, thanks for the quick response. The task view is now online at

https://CRAN.R-project.org/view=Phylogenetics

and announced on Twitter at

https://twitter.com/AchimZeileis/status/1585231152926773248

Thank you all for your work on this!