RobertsLab / resources

https://robertslab.github.io/resources/

Re-run geoduck repro primer design pipeline #974

Closed: shellywanamaker closed this issue 4 years ago

shellywanamaker commented 4 years ago

@kubu4 can you please re-run the primer design pipeline (this time including EMBOSS) so that we have reproducible documentation of the software and settings used, and so we can check the specificity of the primers?

Fasta files are here: http://owl.fish.washington.edu/kaitlyn/202001-geoduck_reproductive_dev_primers/

kubu4 commented 4 years ago

That link doesn't have FastA files, just her outputs from Primer3. I'll work on tracking down the original FastAs, but if you know where they are, please feel free to drop a link in here to save me some time. Thanks! I'll report back if/when I track them down.

kubu4 commented 4 years ago

Think I found what I needed (sequence names and link to FastA) here: https://github.com/RobertsLab/resources/issues/822#issuecomment-572313717

shellywanamaker commented 4 years ago

Great! Were they somewhere here: /home/sam/data/geoduck/transcriptomes/transdecoder_fasta_splits/ ?

kubu4 commented 4 years ago

No. I linked to the comment with their locations. Looks like she (and Steven) used a genes FastA file hosted in the OSF repo.

shellywanamaker commented 4 years ago

Gotcha. Thanks for tracking that down.

kubu4 commented 4 years ago

Alrighty, re-ran the pipeline. Here's a summary table of primer set matches to any sequences in the genes FastA file.

More specifics on how this was run are in the Jupyter Notebook and my Notebook (linked at bottom of post).

Note: The match counts below should be divided by two. This is a consequence of how I counted them (using grep).

SeqID                      PrimerName  Matches
PGEN_.00g025890-vv0.74.a   TIF3s12     2
PGEN_.00g070040-vv0.74.a   APLP        2
PGEN_.00g188130-vv0.74.a   FEN1        2
PGEN_.00g194630-vv0.74.a   ECHD3       2
PGEN_.00g338640-vv0.74.a   NSF         2
PGEN_.00g288180-vv0.74.a   TIF3s4a     4
PGEN_.00g245080-vv0.74.a   TIF3s10     8
PGEN_.00g132030-vv0.74.a   TIF3s8-1    10
PGEN_.00g079690-vv0.74.a   TIF3s7      14
PGEN_.00g088260-vv0.74.a   NFIP1       36
PGEN_.00g224740-vv0.74.a   GLYG        46
PGEN_.00g280110-vv0.74.a   SPTN1       496
PGEN_.00g082590-vv0.74.a   TIF3s5      742
PGEN_.00g287540-vv0.74.a   RPL5        2570
PGEN_.00g132040-vv0.74.a   TIF3s8-2    7800
PGEN_.00g114060-vv0.74.a   GSK3B       8596
PGEN_.00g000750-vv0.74.a   TIF3s6b     15512
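Since the note says the raw counts need halving, here is a small Python sketch that applies that correction to the values copied from the table. The dictionary and function names are illustrative only; nothing here re-runs the pipeline.

```python
# Raw match counts copied from the summary table (grep-based, so each hit was counted twice).
RAW_MATCHES = {
    "TIF3s12": 2, "APLP": 2, "FEN1": 2, "ECHD3": 2, "NSF": 2,
    "TIF3s4a": 4, "TIF3s10": 8, "TIF3s8-1": 10, "TIF3s7": 14,
    "NFIP1": 36, "GLYG": 46, "SPTN1": 496, "TIF3s5": 742,
    "RPL5": 2570, "TIF3s8-2": 7800, "GSK3B": 8596, "TIF3s6b": 15512,
}


def corrected_matches(raw: dict[str, int]) -> dict[str, int]:
    """Halve each raw grep count to get the number of in silico hits per primer set."""
    return {name: count // 2 for name, count in raw.items()}


def single_target_primers(raw: dict[str, int]) -> list[str]:
    """Primer sets with exactly one corrected hit (presumably their intended gene)."""
    return [name for name, hits in corrected_matches(raw).items() if hits == 1]


if __name__ == "__main__":
    # Print primer sets from most to least specific.
    for name, hits in sorted(corrected_matches(RAW_MATCHES).items(), key=lambda kv: kv[1]):
        print(f"{name}\t{hits}")
    print("Single-hit primer sets:", single_target_primers(RAW_MATCHES))
```

With these numbers, APLP has a single hit while NFIP1 has 18, and TIF3s8-1 has 5 (its intended gene plus 4 others).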

Jupyter Notebook:


Notebook:

shellywanamaker commented 4 years ago

@kubu4 awesome! Based on these results for the reproductive development primers, do you suggest going for just APLP since NFIP1 seems to have multiple targets? Do we know what these targets are?

kubu4 commented 4 years ago

do you suggest going for just APLP since NFIP1 seems to have multiple targets?

Yep.

Do we know what these targets are?

Technically, yes. Is that info readily available? Sort of. It would just require some legwork:

  1. Look at the EMBOSS primersearch output file to identify realistic potential qPCR amplicons (e.g., < 300 bp).

  2. Use the sequence IDs of the potential targets to search Panopea-generosa-genes-annotations.tab.
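The two steps above could be scripted roughly as follows. This is a minimal Python sketch, not the procedure that was actually used. It assumes that the primersearch report has a `Primer name ...` line per primer set and, for each amplimer, a `Sequence: <id>` line and an `Amplimer length: N bp` line. It also assumes that Panopea-generosa-genes-annotations.tab is tab-separated with the sequence ID in the first column. The function names and toy input are hypothetical.

```python
import io
import re
from dataclasses import dataclass
from typing import Iterable, Optional, TextIO

# ASSUMPTION: primersearch report lines look like
#   "Primer name NFIP1", "\tSequence: <seq_id>", "\tAmplimer length: 150 bp"
PRIMER_RE = re.compile(r"^Primer name\s+(\S+)")
SEQ_RE = re.compile(r"^\s*Sequence:\s*(\S+)")
LEN_RE = re.compile(r"^\s*Amplimer length:\s*(\d+)\s*bp")


@dataclass
class Amplimer:
    primer: str
    seq_id: str
    length: int


def parse_primersearch(lines: Iterable[str]) -> list[Amplimer]:
    """Collect (primer set, target sequence ID, amplimer length) for every amplimer."""
    amplimers: list[Amplimer] = []
    primer: Optional[str] = None
    seq_id: Optional[str] = None
    for line in lines:
        if m := PRIMER_RE.match(line):
            primer, seq_id = m.group(1), None
        elif m := SEQ_RE.match(line):
            seq_id = m.group(1)
        elif (m := LEN_RE.match(line)) and primer and seq_id:
            amplimers.append(Amplimer(primer, seq_id, int(m.group(1))))
            seq_id = None
    return amplimers


def qpcr_candidates(amplimers: list[Amplimer], primer: str, max_len: int = 300) -> list[Amplimer]:
    """Step 1: keep amplimers for one primer set that are short enough to be qPCR products."""
    return [a for a in amplimers if a.primer == primer and a.length < max_len]


def load_annotations(handle: TextIO) -> dict[str, list[str]]:
    """ASSUMPTION: tab-separated file with the sequence ID in the first column."""
    rows = (line.rstrip("\n").split("\t") for line in handle if line.strip())
    return {row[0]: row[1:] for row in rows}


def annotate(candidates: list[Amplimer], annotations: dict[str, list[str]]) -> list[tuple[str, list[str]]]:
    """Step 2: look up each candidate target's annotation (empty list if not found)."""
    return [(a.seq_id, annotations.get(a.seq_id, [])) for a in candidates]


if __name__ == "__main__":
    # Toy input with hypothetical sequence IDs, not real pipeline output.
    report = (
        "Primer name NFIP1\n"
        "Amplimer 1\n"
        "\tSequence: gene_A\n"
        "\tAmplimer length: 150 bp\n"
        "Amplimer 2\n"
        "\tSequence: gene_B\n"
        "\tAmplimer length: 6500 bp\n"
    )
    hits = qpcr_candidates(parse_primersearch(io.StringIO(report)), "NFIP1")
    annotations = load_annotations(io.StringIO("gene_A\tprotein X (hypothetical)\n"))
    print(annotate(hits, annotations))
```

For real data, pass open file handles for the primersearch output and the annotation file instead of the `io.StringIO` toy inputs. The 300 bp cutoff matches the example threshold in step 1 and can be changed with `max_len`.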

Another thing to keep in mind is that I allowed up to 20% mismatch when checking the primers' specificity in silico. So there's probably some wiggle room to tweak qPCR conditions (e.g., increase the annealing temperature or decrease [Mg²⁺]) to improve primer annealing specificity in vitro. Is it worth the effort? Probably not, unless you're really interested in that particular target.
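To make the 20% figure concrete, here is a rough Python sketch of how many mismatched bases it tolerates per primer. It assumes that the percentage is applied to the primer length and rounded down. How primersearch actually rounds isn't stated here, so treat the numbers as approximate. The primer lengths are illustrative.

```python
import math


def allowed_mismatches(primer_len: int, mismatch_percent: float = 20.0) -> int:
    """Approximate number of mismatched bases tolerated for a primer of a given length.

    ASSUMPTION: mismatch_percent is applied to primer length and rounded down.
    """
    if primer_len <= 0:
        raise ValueError("primer length must be positive")
    return math.floor(primer_len * mismatch_percent / 100)


if __name__ == "__main__":
    for length in (18, 20, 22, 25):
        print(f"{length} nt primer: up to {allowed_mismatches(length)} mismatches")
```

Under this assumption, a 20 nt primer could still report a hit against a sequence with 4 mismatched bases. Hits like that are the kind that stricter annealing conditions may fail to amplify.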

shellywanamaker commented 4 years ago

Gotcha. I think we can just move ahead with APLP, and if we need to go back to the drawing board for some reason, we could revisit NFIP1.

For the expression control, would you say TIF3s8-1 is still the best candidate, since the 4 other potential targets are > 6 kb and won't amplify?

kubu4 commented 4 years ago

Yes, it's still the best candidate because the qPCR works well:

https://github.com/RobertsLab/resources/issues/970#issuecomment-665796790