RobertsLab / resources

https://robertslab.github.io/resources/

Where did the oyster proteome come from? #301

Status: Closed (yaaminiv closed 6 years ago)

yaaminiv commented 6 years ago

Here's a link to the oyster proteome file I used for the DNR paper: http://owl.fish.washington.edu/halfshell/bu-git-repos/nb-2017/C_gigas/data/Cg_Gigaton_proteins.fa

Where did it come from? I determined that it did not come from Fang et al. (see issue #300: https://github.com/RobertsLab/resources/issues/300).

sr320 commented 6 years ago

What did you use it for?

Provide more info, i.e. output from head, the sequence name format, and the number of sequences.

On Wed, Jun 20, 2018 at 1:40 PM Yaamini Venkataraman <notifications@github.com> wrote:

Here's a link to the oyster proteome file I used for the DNR paper: http://owl.fish.washington.edu/halfshell/bu-git-repos/nb-2017/C_gigas/data/Cg_Gigaton_proteins.fa

Where did it come from? I determined that it did not come from Fang et al. (see this issue https://github.com/RobertsLab/resources/issues/300).


yaaminiv commented 6 years ago

The file was used in the protein digest simulator (see this notebook).

Output from head:

[Image: output of head on the proteome file]

I'm not entirely sure about the sequence name format, but all of the names start with CHOYP.

Number of sequences:

[Image: number of sequences in the proteome file]
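The three details requested above (first header lines, header naming pattern, and sequence count) can be pulled from a FASTA file with a few lines of Python. This is a minimal sketch, not the commands used in the thread. The example records are made up for illustration, the local file path in the final comment is an assumption, and taking the header prefix as the text before the first underscore is a guess at the naming convention.

```python
from collections import Counter


def summarize_fasta(lines, n_head=5):
    """Return (sequence count, first n_head headers, header prefix counts)."""
    # Header lines in FASTA start with '>'.
    headers = [line.rstrip("\n") for line in lines if line.startswith(">")]
    # Prefix = text between '>' and the first underscore (assumed convention).
    prefixes = Counter(h[1:].split("_", 1)[0] for h in headers)
    return len(headers), headers[:n_head], prefixes


# Made-up records for illustration only; not taken from the real file.
example = [
    ">CHOYP_EXAMPLE1.1.1 hypothetical\n",
    "MKTAYIAKQR\n",
    ">CHOYP_EXAMPLE2.1.1 hypothetical\n",
    "MSSLLVG\n",
]

count, head, prefixes = summarize_fasta(example)
print(count, head, dict(prefixes))

# For a real file (local path is an assumption):
# with open("Cg_Gigaton_proteins.fa") as fh:
#     print(summarize_fasta(fh))
```

If every header shares one prefix, `prefixes` will contain a single key, which would back up the observation that all names start with CHOYP.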

sr320 commented 6 years ago

@emmats Why would she have used this proteome (likely derived from the translation of a transcriptome assembled with Trinity), as opposed to the canonical proteome published with the genome?

emmats commented 6 years ago

I have no idea! I think I asked her to ask you which one to use, although I don't remember exactly. I'm pretty sure I wasn't part of that particular decision-making process. If I was, I didn't make note of it and don't remember.

sr320 commented 6 years ago

clue!

This Pacific oyster (Crassostrea gigas) dataset is from Gigaton. It was downloaded 10/13/2016 at 5:16 PM.

!curl http://gigaton.sigenae.org/ngspipelines/data/dcc6581978/analysis/f78867df95/contigs.fasta.transdecoder.pep.gz \
> contigs.fasta.transdecoder.pep.gz

found @ https://github.com/Ellior2/Fish-546-Bioinformatics/blob/master/notebooks/gigas_prot/000-data-upload.ipynb
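One way to test this clue would be to check whether the sequences in the proteome file also appear in the Gigaton download. The sketch below is hypothetical and was not run in this thread: the function names and example records are invented, and stripping a trailing `*` assumes the translated peptides may end with a stop-codon marker.

```python
def read_fasta(lines):
    """Parse FASTA lines into {sequence id: sequence}."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the id.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    # Join wrapped lines; drop a trailing '*' (assumed stop-codon marker).
    return {k: "".join(v).rstrip("*") for k, v in records.items()}


def shared_fraction(query, reference):
    """Fraction of distinct query sequences also present in reference."""
    query_seqs = set(query.values())
    if not query_seqs:
        return 0.0
    return len(query_seqs & set(reference.values())) / len(query_seqs)


# Made-up records for illustration only.
proteome_used = read_fasta([">CHOYP_A.1.1", "MKTAY", "IAKQR", ">CHOYP_B.1.1", "MSSLLVG"])
gigaton = read_fasta([">CHOYP_A.1.1", "MKTAYIAKQR*", ">CHOYP_B.1.1", "MSSLLVG*", ">CHOYP_C.1.1", "MPPQ*"])

fraction = shared_fraction(proteome_used, gigaton)
print(fraction)

# Real files could be opened with open(path) or gzip.open(path, "rt").
```

A fraction near 1.0 would be consistent with the proteome file being derived from the Gigaton peptides, though it would not prove provenance on its own.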

sr320 commented 6 years ago

@yaaminiv do you recall why this is relevant?

yaaminiv commented 6 years ago

@sr320 We wanted to cite the proteome source in the DNR paper.