Closed roblanf closed 7 years ago
I vote for option 1 as well.
The python / cython code for tigger is available in a separate repository here: https://github.com/brettc/tigger in any case. This would be a good starting place for future Tiger/python projects.
A comment though: I think the reason to remove it is that the entropy stuff works better. I don't think a good reason to remove it is because: "users find compiling C hard". We already distribute compiled C code (raxml and phyml). If we didn't have the entropy solution, we'd simply have to distribute some more compiled c code (which, of course, would make our lives more difficult).
I should add that compiling tigger is, in fact, very easy. You just need to right dependencies installed. The hard bit is making sure you have all the right dependencies installed (ie. the reason Anaconda is so good).
Cheers, Brett
On 19 May 2016, 9:55 AM +1200, roblanfnotifications@github.com, wrote:
Hey@brettc(https://github.com/brettc)@pbfrandsen(https://github.com/pbfrandsen)@wrightaprilm(https://github.com/wrightaprilm),
Here's something to figure out. Right now we have THREE implementations of TIGER, and this is confusing. They are:
Paul's TIGER code in C Brett's TIGGER code in C Paul's pure python TIGER code
Right now these can all be called from the commandline with options like--kmeans tiger`or--kmeans fast_tiger``. It's kinda confusing. My question is, what is the best thing for users here? We don't officially support TIGER for DNA or protein any more (entropy is just as good, and ~infinitely quicker), but TIGER might still be useful for morphology datasets, which are much smaller.
Here are two options, let me know which you prefer, or if there are others you think are better.
Option 1
Delete all C code, retain Paul's pure python implementation, and only have TIGER work with morphology (these datasets are small, and pure python is OK here). If users call --kmeans tiger from the commandline with DNA or protein, they get back an error.
Pros: v. simple for users, and for us to maintain. Cons: users don't get to do TIGER rates on DNA/Protein any more (do they care?)
Option 2
Retain one of the C implementations (in which case Brett/Paul need to be willing to maintain it if there are bugs, which of course there aren't, and also answer inevitable installation/compilation questions on the forum), and use that for DNA/Protein, but pure python for morphology (so that we avoid problems in the most common use case).
My vote is forOption 1. Because it's the simplest, and doesn't require users to ever compile C code, which is I think a step too far for most users, and will just put people off. We can keep the code in place for running fastTIGER and TIGGER, but just remove the options to do so, and mentions of them in the docs.
Thoughts?
R
— You are receiving this because you were mentioned. Reply to this email directly orview it on GitHub(https://github.com/brettc/partitionfinder/issues/103)
Fair enough.
To continue the argument, I would say that the compiled C code we distribute right now is code that is regularly compiled by 1000's of people on lots of systems, and has dedicated support forums not run by us. My concern is that our C code has not (yet) been widely used, and we'd be doing all the support. Which is something I'd bet none of us are super excited about.
R
On 19 May 2016 at 08:35, Brett Calcott notifications@github.com wrote:
I vote for option 1 as well.
The python / cython code for tigger is available in a separate repository here: https://github.com/brettc/tigger in any case. This would be a good starting place for future Tiger/python projects.
A comment though: I think the reason to remove it is that the entropy stuff works better. I don't think a good reason to remove it is because: "users find compiling C hard". We already distribute compiled C code (raxml and phyml). If we didn't have the entropy solution, we'd simply have to distribute some more compiled c code (which, of course, would make our lives more difficult).
I should add that compiling tigger is, in fact, very easy. You just need to right dependencies installed. The hard bit is making sure you have all the right dependencies installed (ie. the reason Anaconda is so good).
Cheers, Brett
On 19 May 2016, 9:55 AM +1200, roblanfnotifications@github.com, wrote:
Hey@brettc( https://github.com/brettc)@pbfrandsen(https://github.com/pbfrandsen)@wrightaprilm(https://github.com/wrightaprilm ),
Here's something to figure out. Right now we have THREE implementations of TIGER, and this is confusing. They are:
Paul's TIGER code in C Brett's TIGGER code in C Paul's pure python TIGER code
Right now these can all be called from the commandline with options like--kmeans tiger`or--kmeans fast_tiger``. It's kinda confusing. My question is, what is the best thing for users here? We don't officially support TIGER for DNA or protein any more (entropy is just as good, and ~infinitely quicker), but TIGER might still be useful for morphology datasets, which are much smaller.
Here are two options, let me know which you prefer, or if there are others you think are better.
Option 1
Delete all C code, retain Paul's pure python implementation, and only have TIGER work with morphology (these datasets are small, and pure python is OK here). If users call --kmeans tiger from the commandline with DNA or protein, they get back an error.
Pros: v. simple for users, and for us to maintain. Cons: users don't get to do TIGER rates on DNA/Protein any more (do they care?)
Option 2
Retain one of the C implementations (in which case Brett/Paul need to be willing to maintain it if there are bugs, which of course there aren't, and also answer inevitable installation/compilation questions on the forum), and use that for DNA/Protein, but pure python for morphology (so that we avoid problems in the most common use case).
My vote is forOption 1. Because it's the simplest, and doesn't require users to ever compile C code, which is I think a step too far for most users, and will just put people off. We can keep the code in place for running fastTIGER and TIGGER, but just remove the options to do so, and mentions of them in the docs.
Thoughts?
R
— You are receiving this because you were mentioned. Reply to this email directly orview it on GitHub( https://github.com/brettc/partitionfinder/issues/103)
— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/brettc/partitionfinder/issues/103#issuecomment-220178658
Rob Lanfear School of Biological Sciences, Macquarie University, Sydney
phone: +61 (0)2 9850 8204
www.robertlanfear.com
Yep!
On 19/05/2016, at 11:22 AM, roblanf notifications@github.com wrote:
Fair enough.
To continue the argument, I would say that the compiled C code we distribute right now is code that is regularly compiled by 1000's of people on lots of systems, and has dedicated support forums not run by us. My concern is that our C code has not (yet) been widely used, and we'd be doing all the support. Which is something I'd bet none of us are super excited about.
R
On 19 May 2016 at 08:35, Brett Calcott notifications@github.com wrote:
I vote for option 1 as well.
The python / cython code for tigger is available in a separate repository here: https://github.com/brettc/tigger in any case. This would be a good starting place for future Tiger/python projects.
A comment though: I think the reason to remove it is that the entropy stuff works better. I don't think a good reason to remove it is because: "users find compiling C hard". We already distribute compiled C code (raxml and phyml). If we didn't have the entropy solution, we'd simply have to distribute some more compiled c code (which, of course, would make our lives more difficult).
I should add that compiling tigger is, in fact, very easy. You just need to right dependencies installed. The hard bit is making sure you have all the right dependencies installed (ie. the reason Anaconda is so good).
Cheers, Brett
On 19 May 2016, 9:55 AM +1200, roblanfnotifications@github.com, wrote:
Hey@brettc( https://github.com/brettc)@pbfrandsen(https://github.com/pbfrandsen)@wrightaprilm(https://github.com/wrightaprilm ),
Here's something to figure out. Right now we have THREE implementations of TIGER, and this is confusing. They are:
Paul's TIGER code in C Brett's TIGGER code in C Paul's pure python TIGER code
Right now these can all be called from the commandline with options like--kmeans tiger`or--kmeans fast_tiger``. It's kinda confusing. My question is, what is the best thing for users here? We don't officially support TIGER for DNA or protein any more (entropy is just as good, and ~infinitely quicker), but TIGER might still be useful for morphology datasets, which are much smaller.
Here are two options, let me know which you prefer, or if there are others you think are better.
Option 1
Delete all C code, retain Paul's pure python implementation, and only have TIGER work with morphology (these datasets are small, and pure python is OK here). If users call --kmeans tiger from the commandline with DNA or protein, they get back an error.
Pros: v. simple for users, and for us to maintain. Cons: users don't get to do TIGER rates on DNA/Protein any more (do they care?)
Option 2
Retain one of the C implementations (in which case Brett/Paul need to be willing to maintain it if there are bugs, which of course there aren't, and also answer inevitable installation/compilation questions on the forum), and use that for DNA/Protein, but pure python for morphology (so that we avoid problems in the most common use case).
My vote is forOption 1. Because it's the simplest, and doesn't require users to ever compile C code, which is I think a step too far for most users, and will just put people off. We can keep the code in place for running fastTIGER and TIGGER, but just remove the options to do so, and mentions of them in the docs.
Thoughts?
R
— You are receiving this because you were mentioned. Reply to this email directly orview it on GitHub( https://github.com/brettc/partitionfinder/issues/103)
— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/brettc/partitionfinder/issues/103#issuecomment-220178658
Rob Lanfear School of Biological Sciences, Macquarie University, Sydney
phone: +61 (0)2 9850 8204
www.robertlanfear.com — You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/brettc/partitionfinder/issues/103#issuecomment-220186877
Option 1. Whatever streamlines maintenance and support.
I think we definitely need to simplify things with the commands (maybe just --tiger when using morphology?). I would vote to leave the code in for people that really want to use it, but not offer any support.
Paul
On Thu, May 19, 2016 at 11:46 AM April Wright notifications@github.com wrote:
Option 1. Whatever streamlines maintenance and support.
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/brettc/partitionfinder/issues/103#issuecomment-220366457
I think Paul made a good suggestion. Some people might want to use the code, including Paul.
Best Christoph
Am 20.05.2016 um 14:53 schrieb Paul Frandsen notifications@github.com:
I think we definitely need to simplify things with the commands (maybe just --tiger when using morphology?). I would vote to leave the code in for people that really want to use it, but not offer any support.
Paul
On Thu, May 19, 2016 at 11:46 AM April Wright notifications@github.com wrote:
Option 1. Whatever streamlines maintenance and support.
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/brettc/partitionfinder/issues/103#issuecomment-220366457
— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub
Dr. Christoph Mayer Email: c.mayer.zfmk@uni-bonn.de Tel.: +49 (0)228 9122 403
Zoologisches Forschungsmuseum Alexander Koenig
Stiftung des öffentlichen Rechts; Direktor: Prof. J. W. Wägele Sitz: Bonn
Hey @pbfrandsen do you have your pure python TIGER code running and tested yet? If so, can you pull in changes to master, then submit a pull request to master?
Once we have it in there and working, I'll clean up the C code issues as follows:
Hi Rob,
It's all implemented and running fine. Just need to write a couple of tests. I'm in Panama and taking off on a boat tomorrow, but I'll do my best to write a few tests this week.
Paul
sent from my mobile phone, apologies for any typos On May 31, 2016 8:46 PM, "roblanf" notifications@github.com wrote:
Hey @pbfrandsen https://github.com/pbfrandsen do you have your pure python TIGER code running and tested yet? If so, can you pull in changes to master, then submit a pull request to master?
Once we have it in there and working, I'll clean up the C code issues as follows:
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/brettc/partitionfinder/issues/103#issuecomment-222870692, or mute the thread https://github.com/notifications/unsubscribe/ABvScfBY5ZmrzhDqggzeGk2DLZG3tm6Nks5qHOQZgaJpZM4IhvHP .
Done.
Entropy is now the only option for DNA and protein - TIGER is just too slow, especially since lots of people have datasets with of the order of ~1m sites these days. Also Entropy seems to just work better.
Morphology can use entropy (default) or TIGER ('--kmeans tiger' at the command line).
Hey @brettc @pbfrandsen @wrightaprilm,
Here's something to figure out. Right now we have THREE implementations of TIGER, and this is confusing. They are:
Right now these can all be called from the commandline with options like
--kmeans tiger
or--kmeans fast_tiger
. It's kinda confusing. My question is, what is the best thing for users here? We don't officially support TIGER for DNA or protein any more (entropy is just as good, and ~infinitely quicker), but TIGER might still be useful for morphology datasets, which are much smaller.Here are two options, let me know which you prefer, or if there are others you think are better.
Option 1
Delete all C code, retain Paul's pure python implementation, and only have TIGER work with morphology (these datasets are small, and pure python is OK here). If users call --kmeans tiger from the commandline with DNA or protein, they get back an error.
Pros: v. simple for users, and for us to maintain. Cons: users don't get to do TIGER rates on DNA/Protein any more (do they care?)
Option 2
Retain one of the C implementations (in which case Brett/Paul need to be willing to maintain it if there are bugs, which of course there aren't, and also answer inevitable installation/compilation questions on the forum), and use that for DNA/Protein, but pure python for morphology (so that we avoid problems in the most common use case).
My vote is for Option 1. Because it's the simplest, and doesn't require users to ever compile C code, which is I think a step too far for most users, and will just put people off. We can keep the code in place for running fastTIGER and TIGGER, but just remove the options to do so, and mentions of them in the docs.
Thoughts?
R