Open JDHarlingLee opened 3 years ago
Hi Josh, I am really glad to hear you're finding this tool useful for your own work and that you're using the scripts provided. Can I suggest you make a new branch for the changes you have made and submit a pull request so I can incorporate them into the script? Always good to contribute to open-source work and make it better for others 🙂 Gal
Hi Dr Horesh,
Firstly, thanks for developing this tool, it’s a really neat way of categorising the accessory genome, and very useful and usable.
When first trying out the script, I noticed that I was getting a very large number of “rare, lineage-specific” genes. I found that the majority of these were genes that occurred only once in my dataset (“orphan” genes from single genomes). I know that this is heavily dependent on the quality of the data, and in most instances it would be best to just remove these genes from the presence/absence file before running the classification. But for users who forget (like me!), or who are interested in such genes, perhaps it would be useful to have a flag or optional additional category of some sort to catch them?
For my own data I’ve done a rough implementation of adding an “orphan” category to the classification script, which can be used to either report or discard these genes. It’s fairly crude at the moment, but if it would be of any interest to you, let me know and I’ll happily send you a copy, or link it here.
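To illustrate the idea only (this is not the change made to the classification script), here is a minimal Python sketch of how orphan genes could be separated out before classification. It assumes the presence/absence data has already been loaded into a nested dictionary of gene to genome to 0/1; the names `find_orphans`, `split_orphans`, `max_genomes` and `discard` are hypothetical and do not come from the tool.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory form of a presence/absence table:
# gene name -> {genome name -> 1 if present, 0 if absent}
Presence = Dict[str, Dict[str, int]]


def find_orphans(presence: Presence, max_genomes: int = 1) -> List[str]:
    """Return genes found in at least one but at most `max_genomes` genomes."""
    orphans = []
    for gene, row in presence.items():
        n_present = sum(1 for value in row.values() if value)
        if 0 < n_present <= max_genomes:
            orphans.append(gene)
    return orphans


def split_orphans(presence: Presence, discard: bool = False) -> Tuple[Presence, List[str]]:
    """Separate orphan genes from the rest.

    With discard=False the orphans are only reported and the table is
    returned unchanged; with discard=True they are removed from it.
    """
    orphans = find_orphans(presence)
    if not discard:
        return presence, orphans
    orphan_set = set(orphans)
    kept = {gene: row for gene, row in presence.items() if gene not in orphan_set}
    return kept, orphans


if __name__ == "__main__":
    # Toy data, made up for illustration.
    example = {
        "geneA": {"genome1": 1, "genome2": 1, "genome3": 0},
        "geneB": {"genome1": 0, "genome2": 1, "genome3": 0},
        "geneC": {"genome1": 1, "genome2": 1, "genome3": 1},
    }
    kept, orphans = split_orphans(example, discard=True)
    print(orphans)       # ['geneB']
    print(sorted(kept))  # ['geneA', 'geneC']
```

The `discard` switch mirrors the two uses mentioned above: reporting orphans as their own category, or dropping them before the remaining genes are classified.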
Thanks again for developing such a useful tool, it has been very helpful already! Josh