RobertsLab / resources

https://robertslab.github.io/resources/

Locating oyster seed proteomics data files needed for manuscript #517

Closed shellywanamaker closed 5 years ago

shellywanamaker commented 5 years ago

We are trying to locate all the proteomics data files we'll need for the C. gigas seed temperature x time series manuscript. As a reminder, this data is from Rhonda's 2016 oyster seed experiment.

@kaitlynrm what is the original data file you started with? Then maybe @sr320 can trace back from there?

@emmats could you advise on which proteomics data files should be uploaded to which repositories? And do you know where those data files might live?

kaitlynrm commented 5 years ago

I started with the proteins and their abundance values, which I believe is an output from Abacus after proteins were assembled via the Trans-Proteomic Pipeline. @sr320 was able to annotate the table I was given with SwissProt/UniProt on MySQL, which I think means there is an associated .fasta file? Do you have that data from MySQL?

emmats commented 5 years ago

Pretty sure Rhonda kept all her files on the local Roberts server. If you can't find them, let me know.

As with all DDA files, they should be uploaded to PRIDE: raw files, pep.xml from Comet search, and fasta search database.

sr320 commented 5 years ago

This should be the raw data: http://owl.fish.washington.edu/phainopepla/C_gigas/2016-12-05/

This is also a good test to see how well https://github.com/RobertsLab/resources/wiki/Data-Management#proteomics-data-management-plan works / is followed.

sr320 commented 5 years ago

Is this what you need?

shellywanamaker commented 5 years ago

@sr320 yes, but I'm also looking for where @kaitlynrm's data file came from. It seems like her data file contains the averages of the technical replicate peptide ADJNSAF values from this data file you generated: https://github.com/sr320/nb-2017/blob/master/C_gigas/data/ABACUS_output021417.tsv, which was also linked in this issue.

I found your posts from 2/13/2017 and 2/15/2017, but I'm unclear how ABACUS_output021417.tsv was made, or why the ADJNSAF values in that file would be different from those in the ABACUS_output.tsv file generated by Sean (and/or Rhonda) here, despite their ABACUS parameters appearing the same as yours. Your source .mzXML files and their source files also appear the same based on file name and size.

@sr320 @emmats @kaitlynrm if this rings any bells or if there is a closed issue somewhere about this, please share here.
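One way to narrow down where two Abacus output tables disagree is to compare them column by column for the proteins they share. The following is only a minimal sketch: the key column name `PROTID`, the column names `S1_NUMSPECADJ` and `S1_ADJNSAF`, and the toy values are assumptions made for illustration, not taken from the actual ABACUS_output files.

```python
import csv
import io
import math


def read_table(handle, key="PROTID"):
    """Read a tab-separated table into {key value: row dict}.

    The key column name is an assumption; adjust it to the real header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[key]: row for row in reader}


def differing_columns(table_a, table_b, tol=1e-9):
    """Return the sorted shared column names whose values differ
    for at least one protein present in both tables."""
    diffs = set()
    for protein in table_a.keys() & table_b.keys():
        row_a, row_b = table_a[protein], table_b[protein]
        for col in row_a.keys() & row_b.keys():
            a, b = row_a[col], row_b[col]
            try:
                # Compare numerically when both cells parse as numbers.
                same = math.isclose(float(a), float(b), rel_tol=tol, abs_tol=tol)
            except (TypeError, ValueError):
                # Fall back to exact string comparison for text cells.
                same = a == b
            if not same:
                diffs.add(col)
    return sorted(diffs)


# Toy stand-ins for two Abacus output files (hypothetical values).
file_a = io.StringIO("PROTID\tS1_NUMSPECADJ\tS1_ADJNSAF\nP1\t10\t0.5\nP2\t4\t0.2\n")
file_b = io.StringIO("PROTID\tS1_NUMSPECADJ\tS1_ADJNSAF\nP1\t10\t0.7\nP2\t4\t0.2\n")

table_a = read_table(file_a)
table_b = read_table(file_b)
print(differing_columns(table_a, table_b))
```

With real files, the two `io.StringIO` objects would be replaced by `open(path, newline="")` handles for the two .tsv files being compared.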

shellywanamaker commented 5 years ago

also NMDS plot is different depending on which ABACUS_output file is used

kaitlynrm commented 5 years ago

Those files that are posted and we discussed are the only ones I know of.

emmats commented 5 years ago

I remember that we couldn't figure out how she made the file. If you are up for it, it could be worth it to remake the file so that you know what you are working with.

sr320 commented 5 years ago

I can help with that. The focus now for Kaitlyn and Shelly should be to complete a methods section that describes the experiment and also, presumably, how the data was generated. We can then follow the methods section to determine accuracy / reproduce.

On Jan 10, 2019, 8:41 AM -0800, emmats wrote:

I remember that we couldn't figure out how she made the file. If you are up for it, it could be worth it to remake the file so that you know what you are working with.

shellywanamaker commented 5 years ago

@sr320 here is the draft paper where @kaitlynrm and I are actively working on the methods section. And here is a more detailed description of the analysis part.

I also figured out what the discrepancies between the files were. Your ABACUS_output021417.tsv and Sean's ABACUS_output.tsv files were the same, but your ABACUS_output021417NSAF.tsv file actually contains NUMSPECADJ values, not NSAF values. And the data file that Kaitlyn and I have been working with contains average NUMSPECADJ values.

I'm going to start a new issue to clarify what Abacus output values we should be using for different analyses.
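The averaging of technical replicates mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure that produced the working data file: the key column `PROTID`, the replicate column names, the sample-to-column mapping, and the toy values are all hypothetical.

```python
import csv
import io
from statistics import mean


def average_replicates(handle, replicate_columns, key="PROTID"):
    """Average technical-replicate columns per sample for every protein.

    replicate_columns maps a sample name to the list of column names
    that hold its technical replicates (a hypothetical mapping here).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    averaged = {}
    for row in reader:
        averaged[row[key]] = {
            sample: mean(float(row[col]) for col in cols)
            for sample, cols in replicate_columns.items()
        }
    return averaged


# Toy table with two samples, each run as two technical replicates.
abacus_like = io.StringIO(
    "PROTID\tS1A_NUMSPECADJ\tS1B_NUMSPECADJ\tS2A_NUMSPECADJ\tS2B_NUMSPECADJ\n"
    "P1\t10\t12\t3\t5\n"
    "P2\t0\t2\t8\t8\n"
)
replicates = {
    "S1": ["S1A_NUMSPECADJ", "S1B_NUMSPECADJ"],
    "S2": ["S2A_NUMSPECADJ", "S2B_NUMSPECADJ"],
}

averages = average_replicates(abacus_like, replicates)
print(averages["P1"])
```

Passing the replicate mapping explicitly, rather than inferring it from column names, keeps it visible which value type (for example NUMSPECADJ versus ADJNSAF) is being averaged.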