Open hezhaobin opened 4 years ago
@lindseyfaye @rsmoak one thing that I have been curious about is how fast the C-terminal region with repeats evolves among orthologs. In both of your protein families, I think the MEME results point to fast divergence in the low-complexity region, while the homology was largely established based on the N-terminal region. Can we make this observation more solid, perhaps by analyzing the different repeat units (try XSTREAM as well)? Also, once we have the repeat units, can we analyze their S/T content? Once we have some results from the case studies, perhaps we can extend the analysis to the entire dataset, depending on its complexity.
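As a starting point for the S/T content question, here is a minimal sketch in Python. It assumes the repeat units have already been extracted (for example from the XSTREAM or MEME output) as plain amino-acid strings. The function name `st_fraction` and the example sequences are hypothetical and are not taken from the repository.

```python
def st_fraction(seq: str) -> float:
    """Return the fraction of residues in seq that are serine (S) or threonine (T).

    Gap characters ('-') are ignored so that aligned repeat units can be
    passed in directly. An empty sequence returns 0.0.
    """
    residues = seq.upper().replace("-", "")
    if not residues:
        return 0.0
    st_count = residues.count("S") + residues.count("T")
    return st_count / len(residues)


# Hypothetical repeat units, for illustration only (not real data)
repeat_units = ["SSTTAPSS", "TTSAPVSE", "GAPLVKDE"]

for unit in repeat_units:
    print(unit, round(st_fraction(unit), 3))
```

Comparing this fraction per repeat unit across orthologs would be one simple way to see whether the S/T enrichment is conserved even when the repeat sequences themselves diverge.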
@lindseyfaye can you take a look at the README in the `case-study/data` folder and fill in the information about how you inferred the species tree? I moved your tree and reconciliation results to `case-study/output/Gene-tree/mega-lfs` since the tree was reconstructed in MEGA (to distinguish it from the RAxML analyses I am conducting). Can you add a README file in that folder and document the process by which you performed reconstruction and reconciliation?
@rsmoak can you help run `02-case-studies/output/homolog-properties/XP_028889033_homologs.fasta` through TANGO? If you can explain what information you extract from the output and which script you use to do that, I can then properly document the results on my side. Thanks!
@hezhaobin I've posted the TANGO outputs at `02-case-studies/output/homolog-properties/raw-output`, and the R code I use for processing is `01-global-adhesin-prediction/script/R TANGO_summaries.Rmd`. The results are in a .zip file; let me know if you need something else. I have also tested the functions myself, but haven't tried them on a different computer. I'm sure you can figure out any issues, but let me know if you want anything from me on that side.
awesome, thanks! I'll check it out.
-- Sincerely yours Bin
@lindseyfaye @rsmoak I have created a new subfolder to contain all analyses for the two protein families you have worked on. I suggest you start organizing the data and analyses in this folder: protein sequences from BLAST, for example, stay in `data`, while any outputs from online programs belong to `output`. My general rule for the use of `script` and `analysis` is that if I wrote a program that is called on the command line, be it bash, R or python, I leave it in the `script` folder. I use the `analysis` folder to put exploratory analyses such as R markdown and IPython notebooks. You can figure out your preferred way, as long as the data, scripts and results are easy to find and well documented.