Closed taylorreiter closed 7 months ago
elizabethmcd commented on 2024-03-20T19:31:07Z ----------------------------------------------------------------
Typo: "times" of peptides to "types" of peptides
elizabethmcd commented on 2024-03-20T19:31:08Z ----------------------------------------------------------------
Typo, "note" to "not shown"
elizabethmcd commented on 2024-03-20T19:31:08Z ----------------------------------------------------------------
I'm not sure I understand the difference between this figure and the faceted one showing 30 or fewer vs. 31 or more. What is that distinction supposed to be?
taylorreiter commented on 2024-03-22T16:52:45Z ----------------------------------------------------------------
Added a comment describing this, thank you!
elizabethmcd commented on 2024-03-20T19:31:09Z ----------------------------------------------------------------
This was just a little confusing and I had to reread it a couple of times to understand the take-home point of these tables. I think it's because there's a bioactivity score within a category and then that score is summed up across different categories?
elizabethmcd commented on 2024-03-20T19:31:10Z ----------------------------------------------------------------
It's interesting that those with a BLAST hit for each of the three prediction tools have a range of bioactivity scores. I still can't tell if this sum is the number of different bioactivity categories the peptide had a hit in, or some quantitative score of how bioactive something is supposed to be?
elizabethmcd commented on 2024-03-20T19:31:11Z ----------------------------------------------------------------
Is this table here a repeat of the one above?
taylorreiter commented on 2024-03-22T17:06:29Z ----------------------------------------------------------------
Yes, the one above is inline markdown, and this is the code that generates the table. I can remove it in one of those locations if it's too confusing.
taylorreiter commented on 2024-03-22T17:34:59Z ----------------------------------------------------------------
Update: I rephrased the whole section above and took out the table, so I'm going to leave it here!
elizabethmcd commented on 2024-03-20T19:31:12Z ----------------------------------------------------------------
Ah ok, no, I think this one shows, per peptide, the different categories of bioactivity that were hit? Maybe I'm still confused by what the above score is then.
elizabethmcd commented on 2024-03-20T19:31:12Z ----------------------------------------------------------------
Might be a pain to go back and add this, but I think above you mentioned that for those that have a BLAST hit it's not quite 100%. Could you add a third category for BLAST hits that for your query peptide are almost identical to what is in the database? And the others are merely "close" hits? This may help distinguish what could be "biologically" real if those almost identical hits are still dispersed continuously like they are above
taylorreiter commented on 2024-03-22T18:00:53Z ----------------------------------------------------------------
This is a really great idea. I tried doing this and the identical hits are still dispersed continuously -- it doesn't look much different than the plot here and the code is super messy so I'm not going to make the change if that's ok.
elizabethmcd commented on 2024-03-20T19:31:13Z ----------------------------------------------------------------
Would these clustering results be more useful if you were trying to cluster these peptides compared to something else? What is the expected result if these were actually closely related or different sets of peptides vs not supposed to be peptides? Should they be clustered closer together or in different subgroups?
taylorreiter commented on 2024-03-22T18:01:51Z ----------------------------------------------------------------
Emily had this idea as well! Like, what if we tried to cluster the entire Peptipedia database and then overlaid our data on that? I'm going to leave these clusters as is for this notebook, but I totally think you guys are on to something and I'm going to keep thinking about the best way to do this.
elizabethmcd commented on 2024-03-20T19:31:14Z ----------------------------------------------------------------
Yeah, something like this, where maybe you have subsets that cluster together that might have the same bioactivity or function? Might be confirmation bias here, but the t-SNE just doesn't seem to be informative?
Thanks @elizabethmcd, I think I addressed all of your comments with changes to the notebook. I especially cleaned up the explanation for what I was doing with the bioactivity data.
You may also notice that all of the graphs have changed. After doing some experiments with the spider mite transcriptome, I saw that we had a ton of sORF false positives. I added a new step in peptigate to try to filter these transcripts before even doing prediction, which has led to a decrease in sORF predictions in human. It isn't perfect, but it was necessary to reduce some noise.
PR checklist
conda environments.
PR description
Stripping files out of the template
At first, I was going to orchestrate these tests with snakemake, but they're simple enough that I'm choosing to document them in READMEs in analysis-specific folders. I stripped out a lot of the extra template stuff because of that.
Running peptigate on the human transcriptome
The first test I did to see if peptigate works was to run it on the human transcriptome. This PR records the commands I ran to retrieve the human data and run peptigate, and it includes a notebook where I analyze the results.
I would love feedback from @borgesadair1 and @ecpierce about missing tests, unclear interpretations, or other things they'd like to see or thoughts they have. All of this is in the notebook.
Note: I haven't updated the general documentation for this repo yet.