Closed hezhaobin closed 4 years ago
I can go first, my preference would be for Thursday morning. Does 10 work for everyone?
10 Central time on Thursday 05/28/2020 works for me
My lab meeting is Thursdays at 11:15, so 10 could work, but I'll have to leave promptly. -Jan
Shall we start at 9:30 central time? If 10 works better for all of us, we can also do that. I think one hour is a good chunk of time to discuss. -- Bin
@lindseyfaye can you share the slides you presented today here? @rsmoak, want to see your tips on MEME and MAST.
@lindseyfaye , can you add the alignment to the 02/data folder?
Here's the link to my slides: https://docs.google.com/presentation/d/1fRR33Fl5jp104yPFbkp1ihP3Lok_CGk2DdpLRQIME0I/edit?usp=sharing
Got it!
On Tue, Jun 2, 2020, 1:53 PM rsmoak notifications@github.com wrote:
The link to my slides: https://docs.google.com/presentation/d/1sFFlWXh1zrPW9jqEdcN1JLJZEnng9gKyJ4GyMPON9Y0/edit?usp=sharing
@janfassler you mentioned a paper about beta-aggregation in 8 C. glabrata adhesins. Can you post the link here?
A brief summary of today's discussion (feel free to amend!)
I also looked back at @lindseyfaye 's presentation from last week, and noted a few interesting points:
Lastly, summary for what I'll do next:
For what it's worth, the bacterial structure hit to Lindsey's protein (the ligand-binding domain of the 2180-amino-acid Lactobacillus reuteri Lr70902) made me think about lateral gene transfer, so I did several rounds of BLASTP this morning using the first 300 amino acids, the first 600, and the full length of Lindsey's protein against the unrestricted RefSeq database. I came up with many fungal hits (including S. cer) but no bacterial hits. To confirm that my results weren't curtailed by a hitlist threshold, I repeated the process with a taxonomic restriction to bacteria or to Lactobacillaceae but still came up empty. So despite the similarity in structure (Lindsey - can you confirm that this was done by threading?), there is no strong sequence relationship between bacterial LRRP beta solenoid type adhesins and this C. auris protein.
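The N-terminal query fragments described above could be prepared along these lines. This is a minimal sketch, not the commands actually used: the helper name `prefix_queries`, the record ID `query`, and the toy sequence are placeholders, and the resulting FASTA text would still have to be submitted to BLASTP separately.

```python
def prefix_queries(seq_id, sequence, lengths=(300, 600), width=60):
    """Return FASTA text holding N-terminal prefixes plus the full sequence."""
    records = []
    for n in lengths:
        # Skip prefixes that would simply repeat the full-length query.
        if n < len(sequence):
            records.append((f"{seq_id}_first{n}", sequence[:n]))
    records.append((f"{seq_id}_full", sequence))

    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        # Wrap sequence lines at a fixed width for readability.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy 750-residue sequence standing in for the real protein (placeholder).
toy_sequence = "MKT" * 250
fasta = prefix_queries("query", toy_sequence)
```

Running the three queries separately makes it easier to see whether a hit comes from the N-terminal domain or from the rest of the protein.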
This is the reference with examples of adhesins with amyloid character (TANGO): Ramsook CB, Tan C, Garcia MC, et al. Yeast cell adhesion molecules have functional amyloid-forming sequences. Eukaryot Cell. 2010;9(3):393-404. doi:10.1128/EC.00068-09
- Ca HWP/RBT
- Ca EAP1
- Ca EPE1
- Ca ALS
- Sc FLO1
- Sc MUC1/FLO11
- Sc AGA1/FIG2
- NOT: Sc SAG1
About the pangenome - it looks like you are very close with OrthoMCL output: https://kbase.us/applist/apps/PangenomeOrthomcl/build_pangenome_with_orthomcl/release
Oh, that's a really interesting and useful result! @Snyder, Lindsey F lindsey-f-snyder@uiowa.edu you should include it in your results.
Hi everyone, my department just notified me about a mandatory lab reopening meeting on Friday at 1 pm. That may interfere with our scheduled meeting. Can we move ours either just before (noon) or sometime 2 or later? Thanks!
No problem. Let's say 2:30-3:30, will that work for everyone? -- Bin
2 pm or later is perfect for me. Qualifying exams are that morning and I may be involved in proctoring them.
Perfect, thanks!
That works for me.
I'm sure this is an incomplete list. Please reply to add your notes! Also, the slides for today can be accessed here. All previous slides are in this folder. Lastly, here is the interactive tool I built. I plan to update it so the user can restrict the output to the FungalRV > 0.511 or FaaPred subsets.
@janfassler to your point about not prematurely ruling out enzymes, I was looking up whether ADH1 could have anything to do with cell adhesion and came upon this publication:
Klotz SA, Pendrak ML, Hein RC. 2001. Antibodies to alpha5beta1 and alpha(v)beta3 integrins react with Candida albicans alcohol dehydrogenase. Microbiology (Reading, Engl.) 147:3159–3164. PMID:11700367
"Abstract It has been hypothesized that Candida albicans possesses integrin-like receptors on its cell surface. This is because C. albicans binds numerous fluid-phase extracellular matrix (ECM) proteins on its cell surface and adheres to the same ECM proteins when immobilized. In addition, numerous antibodies to human integrins (receptors for ECM proteins) bind to the fungal cell surface and in so doing inhibit the binding of the respective proteins. To demonstrate the presence of such a cell surface integrin, a cDNA library of C. albicans yeast cells was screened with polyclonal antiserum to the human fibronectin receptor (alpha5beta1 integrin). Clones isolated by this screening technique also reacted specifically to antiserum against the human vitronectin receptor (alpha(v)beta3 integrin). DNA sequence analysis of the cloned insert predicted a 350 aa protein (37 kDa). This predicted protein showed 75% homology at the nucleotide sequence level to alcohol dehydrogenase (ADH) of Saccharomyces cerevisiae. In vitro transcription/translation of the cloned inserts yielded a 37 kDa protein that was immunoprecipitated with antibodies to the alpha5beta1 and alpha(v)beta3 integrins and an antibody to a C. albicans fibronectin receptor. These antibodies and an mAb to the human vitronectin receptor demonstrated an antigen of ~37 kDa present in the cell-wall preparations of C. albicans and in spent growth medium. All four antibodies reacted with authentic ADH. The possible significance of these results in relation to C. albicans adherence is discussed."
@lindseyfaye @hezhaobin I'm working on the relational database, and was wondering if you had full results from some of the analyses you've run? Specifically, the numeric FungalRV and FaaPred results for the C. auris and S. cerevisiae queries? Or any other results that we may have filtered for before uploading resultant fasta files?
Yes, the FungalRV results are in 01-global-adhesin-prediction/output/FungalRV/all-fungalrv-results-20200529.txt, and similarly for FaaPred. The former is a table, while the latter is just a list of protein IDs that passed the predictor's default threshold. Let me know if you can't find them.
For relational databases, the one I was planning to use is SQLite, through its R interface, the RSQLite package. It is much easier to set up than MySQL and other similar full-featured SQL environments. Which one are you using?
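To make the idea concrete, here is a rough sketch of how the two result types could sit in SQLite: a score table for FungalRV and an ID-only table for FaaPred. It uses Python's built-in `sqlite3` module rather than RSQLite purely for illustration; the table names, column names, protein IDs, and scores are all made up, and only the 0.511 cutoff comes from this thread.

```python
import sqlite3

# In-memory database; a file path would be used for a persistent one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE protein (
    protein_id TEXT PRIMARY KEY,
    species    TEXT NOT NULL
);
-- FungalRV reports a numeric score per protein.
CREATE TABLE fungalrv (
    protein_id TEXT PRIMARY KEY REFERENCES protein(protein_id),
    score      REAL NOT NULL
);
-- FaaPred output here is only the list of IDs that passed its threshold.
CREATE TABLE faapred (
    protein_id TEXT PRIMARY KEY REFERENCES protein(protein_id)
);
""")

# Placeholder rows, not real results.
conn.executemany("INSERT INTO protein VALUES (?, ?)",
                 [("p1", "C. auris"), ("p2", "C. auris"), ("p3", "S. cerevisiae")])
conn.executemany("INSERT INTO fungalrv VALUES (?, ?)",
                 [("p1", 0.9), ("p2", 0.2), ("p3", -0.4)])
conn.executemany("INSERT INTO faapred VALUES (?)", [("p1",), ("p3",)])


def predicted_by_both(connection, cutoff=0.511):
    """Return IDs with a FungalRV score above the cutoff that FaaPred also lists."""
    rows = connection.execute(
        """
        SELECT p.protein_id
        FROM protein AS p
        JOIN fungalrv AS f ON f.protein_id = p.protein_id
        JOIN faapred  AS a ON a.protein_id = p.protein_id
        WHERE f.score > ?
        ORDER BY p.protein_id
        """,
        (cutoff,),
    ).fetchall()
    return [row[0] for row in rows]


both = predicted_by_both(conn)
```

Keeping the unfiltered FungalRV scores in the table, rather than only the proteins that passed a cutoff, lets the threshold be changed later in the query.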
@hezhaobin I don't see an actual text file; I may just not know how to access it? This is what I get
I downloaded the RSQLite package and am working up a relational database schema right now.
Yes, it's a "soft link" and the content you see points you to the actual text file. It's like a "shortcut" in Windows. So just go down to the local-result-HB folder and you will see the file there.
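In a local clone on Linux or macOS, the link can also be followed programmatically. The snippet below is only an illustration of how a symbolic link resolves to its target; the file names are placeholders created in a temporary directory, not paths from the repository.

```python
import os
import tempfile


def resolve_link(path):
    """Return (is_symlink, real_path) for the given path."""
    return os.path.islink(path), os.path.realpath(path)


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder target file standing in for the real results table.
    target = os.path.join(tmp, "results.txt")
    with open(target, "w") as handle:
        handle.write("placeholder\n")

    # Create a soft link that points at the target.
    link = os.path.join(tmp, "link-to-results.txt")
    os.symlink(target, link)

    is_link, real_path = resolve_link(link)
    demo = (is_link, os.path.basename(real_path))
```

On the GitHub web page the link shows up as a small file whose only content is the target path, which matches what the screenshot appeared to show.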
Notes for yesterday's meeting
- [ ] Finish OrthoMCL v6r1 analysis. [HB]
- [x] Run tango on other sequences in Jan's gene tree. [JF] [RS]
- [x] Run FungalRV locally on the new C. glabrata proteome. [HB]
- [ ] Test Albert's "Maximal" algorithm as a way to get at the repetitive motifs? [HB]
- [x] Test to see if we can programmatically extract the beta-aggregation sequence motifs based on the Tango and XStream results. [RS]
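One way the last item might be approached is to scan per-residue aggregation scores for contiguous high-scoring stretches. The sketch below assumes the scores have already been parsed into one percentage per residue. The 5% cutoff and the minimum length of 5 residues are commonly quoted TANGO conventions but are treated here as adjustable assumptions, `aggregation_stretches` is a hypothetical helper rather than part of TANGO, and the scores in the example are invented.

```python
def aggregation_stretches(sequence, scores, cutoff=5.0, min_len=5):
    """Return (start, end, motif) for runs of residues scoring above the cutoff.

    Positions are 1-based and inclusive. Runs shorter than min_len are dropped.
    """
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores must have the same length")

    stretches = []
    start = None  # 0-based index where the current run began
    for i, score in enumerate(scores):
        if score > cutoff:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                stretches.append((start + 1, i, sequence[start:i]))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_len:
        stretches.append((start + 1, len(scores), sequence[start:]))
    return stretches


# Invented scores for a toy sequence containing the GVVIVTT motif.
toy_seq = "SSAGVVIVTTSSA"
toy_scores = [0, 0, 0, 20, 60, 80, 90, 85, 70, 30, 0, 0, 0]
hits = aggregation_stretches(toy_seq, toy_scores)
```

The extracted motifs could then be compared against the tandem repeat coordinates from XSTREAM to see whether they fall inside repeat units.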
OK, I have solved the problem with the new CBS138 proteome. Turns out the sequence I downloaded back in February was CDS (DNA) sequence. I just downloaded the protein sequence and ran FungalRV locally. The number of sequences with scores above 0 is 162, matching what @rsmoak got from the webapp. Mystery solved. The new version predicted 20 more adhesins (both at 0 and 0.511 cutoffs, meaning that all 20 new predictions have score > 0.511). For details see https://github.com/binhe-lab/C037-Cand-auris-adhesin/tree/master/01-global-adhesin-prediction/script/FungalRV_adhesin_predictor
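A quick sanity check of the kind below might catch a CDS file being passed off as a proteome before a predictor is run. It is a heuristic sketch only: A, C, G, T and N are also valid amino acid letters, so the 0.9 threshold is an assumption, and `looks_like_nucleotide` is a hypothetical helper.

```python
def looks_like_nucleotide(sequence, threshold=0.9):
    """Guess whether a sequence is DNA from the fraction of A/C/G/T/N letters."""
    seq = sequence.upper().replace("*", "")
    if not seq:
        return False
    nucleotide_letters = sum(seq.count(base) for base in "ACGTN")
    return nucleotide_letters / len(seq) >= threshold


# Toy examples: a short coding sequence and a short peptide.
dna_guess = looks_like_nucleotide("ATGGCTAAAGGTTAA")
protein_guess = looks_like_nucleotide("MKTAYIAKQRQISFVK")
```

Applied to the first few records of a FASTA file, a mostly-True result would suggest the file holds nucleotide rather than protein sequences.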
I've posted today's discussion notes under 00-misc-docs/2020-07-02-discussion-zoom.md
A quick update and plan for next meeting:
output/gene-tree. My next step is to analyze the resulting trees for gene gains and losses and to color-code the tips by species. Cheers, Bin
@hezhaobin I can at least update on Monday, even if it isn't long. Let's meet at our usual time. I'll send an invitation.
sounds good! -- Bin
I'm available on Monday and will try to get something together as well. I've been reading about other types of amyloid proteins and at least one paper has asked evolutionary questions concerning the relationship between the repeats and the unique sequences; more or less what we are interested in, so I can talk about that. One problem I've wasted too much time on is that no matter how I run Tango at the command line (batch file, single sequence file, or simple input) I don't get the desired output. It's something else entirely, much less useful. I'm looking for workarounds. -Jan
Jan, can you send me a multi fasta of your sequences? I assume you want an output like the other TANGO file outputs. I'll try and troubleshoot TANGO.
Lindsey has a zoom meeting next Monday from 3-3:45. @janfassler @rsmoak let's start at 3:45. I'll send an invite soon.
I've been working on PCA - using the amino acid data in the Higgs and Attwood book as a test case. I made a markdown file that might work for other situations. @rsmoak can you direct me to the summary file of adhesin data that you have been compiling? I'd like to see what adaptations may be needed for this more complex dataset. Thanks.
@janfassler I haven't updated this with all of the new species I've been working on, but the current summary table can be found at 01-global-adhesin-prediction/output/combined_results.txt. The TANGO and XSTREAM results are heavily simplified in the summary table to agg_seqs (number of aggregation sequences past the TANGO thresholds) and num_tr (number of tandem repeats in the protein), respectively.
Thanks! Probably, the simpler the better, for a first pass!
@rsmoak @janfassler @lindseyfaye I summarized my presentation yesterday: https://github.com/binhe-lab/C037-Cand-auris-adhesin/blob/master/00-misc-docs/2020-07-20-discussion-zoom.md
One interesting new finding: when I summarized the results of the GPI-anchor prediction, I found that while the vast majority of the 110 sequences were predicted to have a GPI anchor, supporting their potential role as adhesins (anchored on the cell wall), only 5/17 homologs in the mysterious S. stipitis were predicted to have one. Because the GPI-anchor prediction is likely to have high sensitivity and specificity, given its relatively simple rule (N-terminal signal peptide and C-terminal GPI signal peptide), the large number of homologs in S. stipitis may actually be involved in some other processes!
@janfassler @rsmoak @lindseyfaye I created a shared presentation file for us to add content -- for next Monday's meeting. Jan, Rachel, if you have most of your slides in powerpoint format, feel free to continue working with that. When you are done, just upload it here: https://drive.google.com/drive/folders/1EdSbLmY5Dzml7BjGU6ROISzPMVDeRVI9?usp=sharing
Also, a quick update on @janfassler 's question regarding whether the N-terminal domain (350 aa) could have homologs in phage/bacteria -- I did a HMMER search with the first 350 aa and restricted the taxonomy to viruses, archaea and eubacteria. The e-value cutoff was 0.01 and no matches were found. I then repeated the search with blastp against the non-redundant protein database restricted to viruses and bacteria. This time I got three significant hits, and both the percent identity and query coverage are respectable. However, I'm now puzzled as to why only P. syringae has this domain, and only in some strains. See below for details (scroll to the bottom): https://github.com/binhe-lab/C037-Cand-auris-adhesin/tree/master/02-case-studies/output/blast
@janfassler I finally checked the e-value cutoff question you raised regarding the identification of XP_028889033 homologs. My conclusion, based on two analyses, is that they are genuine homologs. See the last section of the analysis shown below: https://rpubs.com/emptyhb/649295
@hezhaobin I see what you've done and I agree that the proteins you pulled out likely do have valid N-terminal hyphally regulated domains which is what we agreed to look for and to use in our phylogenetic analyses. As an aside, I do think it's important to remind ourselves that the proteins we identify this way (via BLAST) are homologs of that domain, and not necessarily homologs of the (full-length) query protein. Likewise, the phylogeny is of the domain and not necessarily of any particular adhesin.
I fully agree. Will be careful in describing this result and discussing the evolutionary dynamics of this group of genes sharing this N-terminal domain. More analysis of the clustering of the C-terminus, by looking at the types and distribution of short motifs such as beta-aggregation sequences and MEME identified motifs could shed more light onto the evolutionary history of this group of genes.
discussion notes (my recollection) and todo list here @lindseyfaye https://github.com/binhe-lab/C037-Cand-auris-adhesin/blob/master/00-misc-docs/2020-08-17-discussion.zoom.md
@janfassler can you send me the paper documenting the Ser vs Thr difference, and the mutagenesis analysis of the beta aggregation sequence, if it's not the one Lindsey showed at the end today?
Just to clarify @hezhaobin hypotheses 1 and 2 in your summary are alternatives, yes? They aren't actually meant to be considered together.
This is the paper I had in mind with experimental mutation of a beta-aggregation prone sequence:
Rousseau et al., 2006. Protein aggregation and amyloidosis: confusion of the kinds? Curr. Opin. in Structural Biology.
It seems like a good guess that amino acids 2-7 of the 7-amino-acid sequence GVVIVTT correspond to positions 1 through 6 in the figure.
Another paper that might be helpful in this regard is Ramsook et al., 2010. Yeast cell adhesion molecules have functional amyloid-forming sequences. Eukaryotic Cell.
Table 1 from this paper spells out the beta aggregation sequences in various adhesins in C. albicans and in S. cerevisiae, all very hydrophobic (like our GVVIVTT). The authors comment: "Ile, Thr, and Val residues have aliphatic β-branched side chains that greatly restrict backbone conformation and have high β-strand potential (6). These residues are very hydrophobic, bulky, and have side-chain interactions that stabilize the β-sheets in amyloids. These properties are what we might expect in sequences whose primary purpose is to form amyloids. In contrast, the adhesin sequences had very few aromatic residues, which are the major category of β-aggregation- and amyloid-prone sequences in other proteins. Thus, the β-aggregation-prone sequences in the adhesins are also biased against aromatic residues. We suggest that the unusual composition of the adhesin amyloid sequences leads to the unusually facile amyloid formation that these peptides and proteins display."
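The composition bias described in that passage is easy to check for any candidate motif. Here is a small sketch: the residue groupings (Ile/Thr/Val as β-branched, Phe/Trp/Tyr as aromatic) follow the usual textbook classification, and `motif_composition` is a hypothetical helper name.

```python
BETA_BRANCHED = set("ITV")  # Ile, Thr, Val
AROMATIC = set("FWY")       # Phe, Trp, Tyr


def motif_composition(motif):
    """Return the fraction of beta-branched and aromatic residues in a motif."""
    motif = motif.upper()
    if not motif:
        raise ValueError("motif must not be empty")
    n = len(motif)
    return {
        "beta_branched": sum(aa in BETA_BRANCHED for aa in motif) / n,
        "aromatic": sum(aa in AROMATIC for aa in motif) / n,
    }


# GVVIVTT: six of the seven residues are Ile, Thr or Val, and none are aromatic.
gvvivtt = motif_composition("GVVIVTT")
```

Running this over all TANGO-predicted motifs would show whether the bias toward β-branched residues, and against aromatic ones, holds in the C. auris sequences as well.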
Finally, I'm attaching a few slides with my observations about the differences in the distribution of serine/threonine bias in C. albicans ALS proteins versus Rachel's C. auris adhesin: Serine-Threonine.pptx
Yes, hypotheses 1 and 2 are alternative, mutually exclusive hypotheses.
Got the papers. Will look into them.
@janfassler Jan, you once mentioned a paper that talked about the Serine-rich domain, and that was the motivation for looking at Serine and Threonine content separately. Can you remind me of that paper and the idea behind it?
Hi Bin,
@hezhaobin It was the ALS5 paper with the figure below that caused me to map the serines and threonines individually in Rachel’s protein.
Otoo HN, Lee KG, Qiu W, Lipke PN. Candida albicans Als adhesins have conserved amyloid-forming sequences. Eukaryot Cell. 2008;7(5):776-782. doi:10.1128/EC.00309-07
Below find the following 3 images which can also be seen in the Powerpoint (Serine_Threonine.pptx) that I posted above a few weeks ago.
Image 1: Domain cartoon of Als5 from Otoo paper
Image 2: Although it’s true that the percent serine and threonine are both high in Rachel's protein, the distribution differs, with the C terminus being enriched for threonine and serines scattered throughout
Image 3: Domain cartoon of Rachel's protein
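The kind of per-residue mapping described for Image 2 can be approximated with a sliding window. The sketch below is not how the slides were produced: the window and step sizes are arbitrary assumptions, `windowed_fraction` is a hypothetical helper, and the toy sequence is invented to mimic a serine-rich region followed by a threonine-rich one.

```python
def windowed_fraction(sequence, residue, window=50, step=50):
    """Return (start_position, fraction) of one residue type per window.

    Start positions are 1-based. A sequence shorter than the window is
    treated as a single window.
    """
    seq = sequence.upper()
    if not seq:
        return []
    fractions = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        fractions.append((start + 1, chunk.count(residue) / len(chunk)))
    return fractions


# Toy 100-residue sequence: Ser-rich first half, Thr-rich second half.
toy_protein = "SA" * 25 + "TA" * 25
ser_profile = windowed_fraction(toy_protein, "S")
thr_profile = windowed_fraction(toy_protein, "T")
```

Plotting the two profiles side by side for Als5 and the C. auris protein would give a numeric counterpart to the domain cartoons.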
Thanks Jan! Should have read your previous reply - have to say that I still haven't absorbed the information with respect to beta-aggregation prone vs amyloid forming sequences, and the role of serine, threonine and glycosylation in these contexts. Have you read anything that would suggest a reason for the serine-rich domain immediately after the N-terminal Hyphal_reg_CWP, followed by a moderately Threonine-rich stalk?
@lindseyfaye @rsmoak, let's put together the results we have so far into a PPT format. Perhaps one slide per result + any key method information. I'm currently working on the OrthoMCL results, and plan on including the CATH and Pfam in my global analyses. I will need 5-7 days to get these done. Will either of you be able to put together a 20-30 min presentation for next week? I can do Monday any time, Tuesday afternoon, Wednesday morning or Thursday morning. @janfassler, do you have time constraints?