Case studies discussion

hezhaobin commented 4 years ago

@lindseyfaye @rsmoak I have created a new subfolder to contain all analyses for the two protein families you have worked on. I suggest you start organizing the data and analyses in this folder -- protein sequences from BLAST, for example, stay in data, while any outputs from online programs belong in output. My general rule for script versus analysis is that any program I wrote to be called on the command line, be it bash, R or python, goes in the script folder. I use the analysis folder for exploratory analyses such as R Markdown and IPython notebooks. You can figure out your preferred way, as long as the data, scripts and results are easy to find and well documented.
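For illustration only, the layout described above could be set up with a few lines of Python. This is a minimal sketch, not the script used in the repository: the function name `make_layout` and the placeholder `README.md` in each folder are assumptions, and only the four folder names (data, output, script, analysis) come from the comment itself.

```python
from pathlib import Path

# Folder names taken from the comment above; everything else is hypothetical.
SUBFOLDERS = ("data", "output", "script", "analysis")


def make_layout(root):
    """Create the four subfolders under `root`, each with a stub README.

    Existing folders and README files are left untouched.
    Returns the list of folder paths in the order of SUBFOLDERS.
    """
    root = Path(root)
    created = []
    for name in SUBFOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        readme = folder / "README.md"  # assumed file name, not from the thread
        if not readme.exists():
            readme.write_text(
                f"# {name}\n\nDescribe what is in this folder and how it was produced.\n"
            )
        created.append(folder)
    return created
```

Calling `make_layout("path/to/subfolder")` is safe to repeat, since it never overwrites an existing README.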

hezhaobin commented 4 years ago

@lindseyfaye @rsmoak one thing that I have been curious about is how fast the C-terminal region with repeats evolves among orthologs. In both of your protein families, I think the MEME results point to fast divergence in the low complexity region, while the homology was largely established based on the N-terminal region. Can we make this observation more solid, perhaps by analyzing the different repeat units (try XSTREAM as well)? Also, once we have the repeat units, can we analyze the S/T content in them? Once we have some results from the case studies, perhaps we can extend the analysis to the entire dataset, depending on its complexity.
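As a starting point for the S/T question, the serine/threonine fraction of each repeat unit can be computed directly from its sequence. The sketch below assumes the repeat units have already been extracted (by XSTREAM or any other tool) into a dictionary of name to amino acid string; the function names and the input format are hypothetical, and no particular tool's output format is parsed here.

```python
def st_fraction(seq):
    """Return the fraction of residues in `seq` that are Ser (S) or Thr (T).

    Gap characters ('-') and stop symbols ('*') are dropped before counting.
    An empty sequence returns 0.0.
    """
    seq = seq.upper().replace("-", "").replace("*", "")
    if not seq:
        return 0.0
    return (seq.count("S") + seq.count("T")) / len(seq)


def st_by_repeat(repeat_units):
    """Map each repeat unit name to its S/T fraction.

    `repeat_units` is assumed to be a dict of {name: amino acid sequence}.
    """
    return {name: st_fraction(seq) for name, seq in repeat_units.items()}
```

For example, `st_by_repeat({"unit1": "STAA", "unit2": "SSTT"})` gives 0.5 for the first toy unit and 1.0 for the second. The same function applied to the N-terminal region would give a baseline for comparison.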

hezhaobin commented 4 years ago

@lindseyfaye can you take a look at the README in the case-study/data folder and fill in the information about how you inferred the species tree? I moved your tree and reconciliation results to case-study/output/Gene-tree/mega-lfs since the tree was reconstructed in MEGA (to distinguish it from the RAxML analyses I am conducting). Can you add a README file in that folder and document how you performed the reconstruction and reconciliation?

hezhaobin commented 4 years ago

@rsmoak can you help run 02-case-studies/output/homolog-properties/XP_028889033_homologs.fasta through TANGO? If you can explain what information you extract from the output and which script you use to do that, I can then properly document the results on my side. Thanks!

rsmoak commented 4 years ago

@hezhaobin I've posted TANGO outputs at 02-case-studies/output/homolog-properties/raw-output and the R code I use for processing is 01-global-adhesin-prediction/script/R%20TANGO_summaries.Rmd

The results are in a .zip file - let me know if you need something else. I have tested the functions myself, but haven't tried them on a different computer. I'm sure you can figure out any issues, but let me know if you want anything from me on that side.

hezhaobin commented 4 years ago

awesome, thanks! I'll check it out.


binhe-lab / C037-Cand-auris-adhesin

Case studies discussion #3