RobertsLab / resources

https://robertslab.github.io/resources/

Find rDNA Array in oysters #1304

Closed sr320 closed 1 year ago

mgavery commented 3 years ago

Filtered the Roslin gtf for "rRNA". There are 4 chromosomes with rRNA annotations.
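
A minimal sketch of this kind of filter, assuming a standard 9-column, tab-separated GTF in which rRNA records carry the string "rRNA" in the feature or attribute column. The function name and the toy records are hypothetical and are not taken from the Roslin annotation.

```python
from collections import Counter


def rrna_counts_by_chrom(gtf_lines, keyword="rRNA"):
    """Count GTF records mentioning `keyword`, grouped by sequence name (column 1)."""
    counts = Counter()
    for line in gtf_lines:
        # Skip comment/header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        # Column 3 is the feature type, column 9 holds the attributes.
        if keyword in fields[2] or keyword in fields[8]:
            counts[fields[0]] += 1
    return counts


# Toy records (made up for illustration only).
example = [
    'chr1\tsrc\tgene\t100\t220\t.\t+\t.\tgene_id "g1"; gene_biotype "rRNA";',
    'chr1\tsrc\tgene\t500\t900\t.\t+\t.\tgene_id "g2"; gene_biotype "protein_coding";',
    'chr9\tsrc\tgene\t10\t1900\t.\t-\t.\tgene_id "g3"; gene_biotype "rRNA";',
]
counts = rrna_counts_by_chrom(example)
print(dict(counts))
```

With a real file, the lines could be passed in as `open(path)`; the number of keys in the result is the number of sequences carrying rRNA annotations.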

filtered_GTF_Roslin_rRNA.txt rRNA_LOC.txt

mgavery commented 3 years ago

This is not consistent with Xu et al. (2001), which says 1 chromosome, and it is still confusing to me. Most of the rRNA annotations in the gtf are 5S rRNA, and I'm not sure what Xu et al. probed for. I don't think it was 5S. Why the heck is the nomenclature of this stuff so annoying!?!?! 01XurDNAFISHVeliger.pdf

mgavery commented 3 years ago

Ok, I think I'm getting closer. The 5S is separate from the major rDNA array. There is one chromosome (Chr 9) that has annotations of "large", "5.8" and "small" rRNA. I think this is the one. filtered_GTF_Roslin_rRNA_txt_and_Find_rDNA_Array_in_oysters_·_Issue__1304_·_RobertsLab_resources

mgavery commented 2 years ago

Towards getting single copy gene regions: Giles ran BUSCO (mollusca database) on the Roslin genome. One of the outputs is the genomic locations of single copy genes. I've attached the bed file of single copy regions (to use for normalization of copy number) as well as a short summary and the full table from BUSCO. single_copy_busco.mollusca.bed.txt short_summary.txt full_table.tsv.zip

mgavery commented 2 years ago

We have a start then for the location of the mito, rRNA and single copy genes. The next step would be to get ahold of a few WGS datasets for gigas, map to the genome + mito and take a look. Is there small variation in the single copy regions? What does copy # look like for mito and ribo?

mgavery commented 2 years ago

Giles mapped 20 WGS C. gigas samples (from NCBI/SRA) to the Roslin genome, then used bedtools "coverage" to get depth at each bp for the single copy genes identified using BUSCO. One of the 5087 genes was an outlier in terms of coverage, but after filtering out that gene the coverage was pretty normally distributed across the remaining 5086 genes.

This plot shows the mean depth of coverage for each of the 5086 genes per individual (ID is how they were ID'd in SRA). histo_meancovbygene
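
The per-gene averaging described above could be sketched roughly as follows. This assumes per-base output in the layout that `bedtools coverage -d` produces (the original interval columns, followed by the 1-based position within the interval and the depth at that position), tab-separated. The function name and the toy rows are hypothetical.

```python
from collections import defaultdict


def mean_depth_per_feature(lines):
    """Average per-base depth for each interval in per-base coverage output.

    Each line is expected to be: <interval columns...> <position> <depth>.
    The interval columns together are used as the key for that gene/region.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [summed depth, number of bases]
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # need at least chrom, start, end, position, depth
        key = tuple(fields[:-2])
        totals[key][0] += int(fields[-1])
        totals[key][1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


# Toy rows (made up): two intervals with 3 and 2 covered positions.
rows = [
    "chr1\t0\t3\tgeneA\t1\t10",
    "chr1\t0\t3\tgeneA\t2\t12",
    "chr1\t0\t3\tgeneA\t3\t14",
    "chr2\t5\t7\tgeneB\t1\t30",
    "chr2\t5\t7\tgeneB\t2\t34",
]
means = mean_depth_per_feature(rows)
for key, value in means.items():
    print(key[3], value)
```

Because the function only keeps a running sum and count per interval, it can stream through a large per-base file line by line rather than loading it whole.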

mgavery commented 2 years ago

Box plot of the same data: BoxPlot_meancovbygene

mgavery commented 2 years ago

The next step will be to do the same thing for the mitochondrial genes and the ribosomal genes.

Ultimately, the single copy depth (mean across all genes) will be used as the denominator and either the ribo or mito depth will be the numerator. Then we can look at copy number variation of these regions across the 20 individuals.
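
As a rough illustration of that ratio, here is a small sketch. All depth values are invented for the example, and the names `relative_copy_number` and `samples` are hypothetical; the only idea taken from the thread is target depth divided by the mean single copy depth.

```python
from statistics import mean


def relative_copy_number(target_depth, single_copy_gene_depths):
    """Target depth divided by the mean depth across single copy genes."""
    baseline = mean(single_copy_gene_depths)
    if baseline <= 0:
        raise ValueError("single copy baseline depth must be positive")
    return target_depth / baseline


# Invented numbers for one hypothetical individual.
samples = {
    "sample_A": {
        "single_copy": [28.0, 30.0, 32.0],  # mean depth per single copy gene
        "rdna": 600.0,                      # mean depth over the rDNA region
        "mito": 3000.0,                     # mean depth over the mito genome
    },
}

ratios = {}
for name, d in samples.items():
    ratios[name] = {
        "rdna": relative_copy_number(d["rdna"], d["single_copy"]),
        "mito": relative_copy_number(d["mito"], d["single_copy"]),
    }
print(ratios)
```

A ratio near 1 would mean the region is covered about as deeply as a single copy gene; comparing the ratios across individuals is what would show copy number variation.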

sr320 commented 2 years ago

Should we go ahead and do this with the data in NCBI? Why limit it to target regions rather than do everything and normalize to single copy?

mgavery commented 2 years ago

There were 62 million base pairs covered by the 5087 single copy genes. These per-bp depth files are really big. I think it's good to start with our hypothesis-driven question (mito and ribo CNV), but it might be totally useful to look at other regions. One strategy would be to peruse some bam files in a genome browser for places with crazy depth first and target those regions.
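
One possible way to shortlist regions before opening a genome browser is sketched below. This is only a sketch: the windowing, the `fold` threshold and the function names are assumptions for illustration, not something that was run in this thread.

```python
from statistics import median


def window_means(depths, window):
    """Mean depth of consecutive, non-overlapping windows of per-base depths."""
    means = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


def flag_high_windows(depths, window=1000, fold=5.0):
    """Return (window start, mean depth) for windows above fold x the median window mean."""
    means = window_means(depths, window)
    typical = median(means)
    if typical <= 0:
        return []  # no usable baseline
    return [(i * window, m) for i, m in enumerate(means) if m > fold * typical]


# Toy per-base depths (made up): one short stretch with very high coverage.
depths = [10] * 8 + [200] * 2 + [10] * 6
flagged = flag_high_windows(depths, window=2, fold=5.0)
print(flagged)
```

Summarizing depth per window also keeps the output much smaller than the per-bp files, which may help with the file size problem mentioned above.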

mgavery commented 2 years ago

Plot_Zoom

mgavery commented 1 year ago

@sr320, can you provide access to the bam files for the WGBS data? I can run the pipeline on them.

sr320 commented 1 year ago

@yaaminiv can you point Mac to the bam files for your Haws data, which I am presuming is gigas WGBS?

yaaminiv commented 1 year ago

@mgavery Can be found at this link! https://gannet.fish.washington.edu/spartina/project-oyster-oa/Haws/bismark-2/