Jome0169 / MendietaPablo_Annotation_Paper_scripts

Scripts used for the analysis and pipeline of Mendieta et al. 2020

Conservation of Initiation Modification Class in Sorghum #4

Closed Jome0169 closed 3 years ago

Jome0169 commented 3 years ago

5/5/2021

One of the reviewers asked whether there is any evidence of conservation for some of the distal initiation-only modifications in the Sorghum genome. In other words: are these genes exclusive to the Sorghum genome, and more recently shut off in the maize genome?

It's an interesting question, and one I was curious to look into. The analysis steps are fairly simple.

1) Take the sequence underlying the initiation-only modifications I identified in maize and, using BlastN, blast these sequences against the Sorghum genome.
2) Ask: of these sequences, what proportion overlap genes in the Sorghum genome? This would tell us about potential genes that were conserved in Sorghum and lost in maize.
2b) This also requires a control set: an equal number of initiation regions identified in genes in the maize genome.
3) What proportion of these regions found in the Sorghum genome also have the histone modifications for initiation?

NOTE: Step 3 can only really be asked about the leaf modifications, as we don't have matched data sets.

Input BED Files

All following parallel commands will be run on this dataset

❯ ls -a1 *.bed
ear_initiation_overlapping_neither_elongation.bed
leaf_initiation_overlapping_neither_elongation.bed
root_initiation_overlapping_neither_elongation.bed

Generation of Control Datasets

We can't just do this analysis in a vacuum. To make the comparison more meaningful, I'm going to take an equal-sized sample of initiation modifications overlapping genes and compare their level of conservation (by means of counts) with that of the initiation modification regions that do not overlap genes. This might tell us that these initiation-modification-only regions are on a different evolutionary trajectory, and thus may be performing a different function.

Input Control Files:

~/Projects/03.ncRNA_project/03.Figures/Figure2/05.Blastn_sequences
❯ wc -l *intersecting_genes.bed
   20941 ear_initiation.intersecting_genes.bed
   19724 leaf_initiation.intersecting_genes.bed
   23387 root_initiation.intersecting_genes.bed
   64052 total

Generation of Control Dataset: I made a very small bash script to assist with this, called equal_sub_sample.sh, which consists of these simple commands:

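set -euxo pipefail

# Number of regions (lines) in the target BED file, given as the first argument
gen_number=$(wc -l < "${1}")
# Draw that many random lines from the pool BED file, given as the second argument.
# ${gen_number} stays unquoted so that any padding emitted by wc is dropped.
shuf -n ${gen_number} "${2}"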

Command run to generate:

❯ parallel "bash equal_sub_sample.sh {}_initiation_overlapping_neither_elongation.bed {}_initiation.intersecting_genes.bed > {}_initiation.intersecting_genes.control_sub_sample.bed" ::: ear leaf root
++ wc -l
+ gen_number='    2340'
+ shuf -n 2340 ear_initiation.intersecting_genes.bed
++ wc -l
+ gen_number='    2605'
+ shuf -n 2605 leaf_initiation.intersecting_genes.bed
++ wc -l
+ gen_number='    3486'
+ shuf -n 3486 root_initiation.intersecting_genes.bed
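
Because `shuf` draws a fresh random sample on every run, the control set above cannot be regenerated exactly. As a minimal sketch of a repeatable alternative (not the script used for this analysis), the same equal-size draw can be done in Python with a fixed seed. The function name `equal_sub_sample` and the `seed` parameter are hypothetical, chosen here for illustration.

```python
import random


def equal_sub_sample(target_lines, pool_lines, seed=0):
    """Draw len(target_lines) lines from pool_lines without replacement."""
    n = len(target_lines)
    if n > len(pool_lines):
        # Unlike `shuf -n`, which would silently return the whole pool,
        # this sketch refuses to return a smaller-than-requested sample.
        raise ValueError("control pool is smaller than the target set")
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    return rng.sample(pool_lines, n)


if __name__ == "__main__":
    # Toy stand-ins for BED lines; real input would be read from the two files.
    target = ["chr1\t10\t20", "chr1\t50\t60"]
    pool = [f"chr2\t{i}\t{i + 10}" for i in range(0, 100, 10)]
    for line in equal_sub_sample(target, pool, seed=42):
        print(line)
```

Recording the seed alongside the output file would be enough to reproduce the control set later.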

Get underlying Nt Fasta

parallel "bedtools getfasta -fi Zea_mays.AGPv4.dna.toplevel.unwrapped.nocontigs.reorderd.fa -bed {} > {.}.seq.fa" ::: *.bed

I copied these from the local directory ~/Projects/03.ncRNA_project/03.Figures/Figure2/05.Blastn_sequences to the directory /scratch/jpm73279/04.lncRNA/08.comprative_maize_sorghum on sapelo2, where the rest of the analysis takes place.

Blast Against Sorghum Genome

parallel "blastn -db Sorghum_bicolor_var_BTx623.mainGenome.fasta -query {} -outfmt 6 -num_threads 10 -evalue .0001 -max_target_seqs 1 > blast_output/{/.}.blastn_out.txt" ::: isolated_seq/*.fa

Grab the Sequence regions in Sorghum (Blast output format to Bed)

parallel "python fitler_blastn_maize_sog_comp_genBED.py -i {} -o seq_loc_sorghum_bed/{/.}.bed" ::: blast_output/*.txt
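
The filtering script itself is not shown here, so the following is only a rough sketch of what a conversion from BLAST tabular output (`-outfmt 6`, default columns) to BED could look like. It assumes the best hit per query is chosen by bitscore and that the bitscore is written to the BED score column; the actual script may filter differently. The function name `best_hits_to_bed` is hypothetical.

```python
import csv
import io

# Column order of BLAST tabular output (-outfmt 6) with the default fields.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits_to_bed(handle):
    """Return one BED6 row (chrom, start, end, name, score, strand) per query."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        score = float(hit["bitscore"])
        # Assumption: keep only the highest-scoring alignment for each query.
        if hit["qseqid"] not in best or score > best[hit["qseqid"]][0]:
            best[hit["qseqid"]] = (score, hit)
    bed = []
    for qseqid, (score, hit) in best.items():
        sstart, send = int(hit["sstart"]), int(hit["send"])
        # BLAST reports minus-strand hits with sstart > send.
        strand = "+" if sstart <= send else "-"
        # BLAST coordinates are 1-based inclusive; BED is 0-based half-open.
        start = min(sstart, send) - 1
        end = max(sstart, send)
        bed.append((hit["sseqid"], start, end, qseqid, score, strand))
    return bed


if __name__ == "__main__":
    demo = io.StringIO(
        "1:100-200\tChr01\t95.0\t100\t5\t0\t1\t100\t5000\t4901\t1e-40\t150\n"
        "1:100-200\tChr02\t90.0\t80\t8\t0\t1\t80\t300\t379\t1e-20\t90\n"
    )
    for row in best_hits_to_bed(demo):
        print("\t".join(str(field) for field in row))
```

The coordinate shift matters for the later `bedtools intersect` step: writing the 1-based BLAST start directly into a BED file would shorten every region by one base.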

Intersect With a list of Sorghum Genes

parallel "bedtools intersect -a {} -b Sbicolorv5.1.gene.bed -u -wa > {.}.intersecting_sorghum_genes.bed " ::: *.blastn_out.bed
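
The `-u` flag is what makes the line counts in the next step interpretable: each `-a` region is written once if it overlaps at least one gene, no matter how many genes it touches. A small pure-Python sketch of that behaviour, using half-open BED coordinates and a hypothetical function name `intersect_unique`, is below. It is an illustration of the semantics, not a replacement for bedtools.

```python
def intersect_unique(a_intervals, b_intervals):
    """Keep each A interval once if it overlaps any B interval by at least 1 bp."""
    kept = []
    for chrom, start, end in a_intervals:
        overlaps = any(
            chrom == b_chrom and start < b_end and b_start < end
            for b_chrom, b_start, b_end in b_intervals
        )
        if overlaps:
            kept.append((chrom, start, end))
    return kept


if __name__ == "__main__":
    hits = [("Chr01", 100, 200), ("Chr01", 900, 950), ("Chr02", 10, 20)]
    genes = [("Chr01", 150, 400), ("Chr01", 180, 300), ("Chr02", 20, 40)]
    # Only the first hit is kept, and only once, although it overlaps two genes.
    print(intersect_unique(hits, genes))
```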

Count the number intersecting Sorghum genes


  2077 ear_initiation.intersecting_genes.control_sub_sample.seq.blastn_out.intersecting_sorghum_genes.bed
   201 ear_initiation_overlapping_neither_elongation.seq.blastn_out.intersecting_sorghum_genes.bed
  2318 leaf_initiation.intersecting_genes.control_sub_sample.seq.blastn_out.intersecting_sorghum_genes.bed
   178 leaf_initiation_overlapping_neither_elongation.seq.blastn_out.intersecting_sorghum_genes.bed
  3125 root_initiation.intersecting_genes.control_sub_sample.seq.blastn_out.intersecting_sorghum_genes.bed
   287 root_initiation_overlapping_neither_elongation.seq.blastn_out.intersecting_sorghum_genes.bed
  8186 total
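
To turn these counts into the proportions asked about in step 2, each count can be divided by the number of maize regions that went into the BLAST search. The sketch below uses the region totals from the `shuf` trace above (2340, 2605, 3486) as denominators. It assumes every line in a counted file corresponds to a distinct query region, which depends on how the filtering script handled multiple hits.

```python
# Regions per tissue; the initiation-only set and its control sub-sample have
# the same number of lines (values copied from the shuf trace above).
REGIONS = {"ear": 2340, "leaf": 2605, "root": 3486}

# Line counts of the *.intersecting_sorghum_genes.bed files listed above.
GENE_HITS = {
    "ear": {"control": 2077, "initiation_only": 201},
    "leaf": {"control": 2318, "initiation_only": 178},
    "root": {"control": 3125, "initiation_only": 287},
}


def proportions(regions, gene_hits):
    """Fraction of query regions whose Sorghum hit overlaps a Sorghum gene."""
    return {
        tissue: {name: count / regions[tissue] for name, count in sets.items()}
        for tissue, sets in gene_hits.items()
    }


if __name__ == "__main__":
    for tissue, sets in proportions(REGIONS, GENE_HITS).items():
        for name, frac in sets.items():
            print(f"{tissue}\t{name}\t{frac:.3f}")
```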

Call Initiation Only Mod Peaks