One of the reviewers asked whether there is any evidence that some of the distal initiation-only modifications are conserved in the Sorghum genome. Basically asking the question: are these genes that are exclusive to the Sorghum genome and were more recently shut off in the maize genome?
It's an interesting question, and one I was curious to look into. The analysis steps are fairly simple.
1) Take the sequence underlying the initiation-only modifications I identified in maize, and use BLASTN to align these sequences against the Sorghum genome
2) Of these sequences, what proportion overlap genes in the Sorghum genome? This would tell us about potential genes which were conserved in Sorghum and lost in maize
2b) This also requires a control set: an equal number of initiation regions that overlap genes in the maize genome
3) What proportion of these regions found in the Sorghum genome also have the histone modifications for initiation? NOTE: This can only really be asked about the leaf modifications, as we don't have matched data sets.
Input BED Files
All following parallel commands will be run on this dataset
❯ ls -a1 *.bed
ear_initiation_overlapping_neither_elongation.bed
leaf_initiation_overlapping_neither_elongation.bed
root_initiation_overlapping_neither_elongation.bed
Generation of Control Datasets
We can't just do this analysis in a vacuum. To make the comparison more meaningful, I'm going to take an equal-sized sample of initiation modifications that overlap genes and compare their level of conservation (by means of counts) to that of the initiation modification regions that do not overlap genes. A difference might tell us that these initiation-only regions are on a different evolutionary trajectory, and thus may be performing a different function.
5/5/2021
Input Control Files:
Generation of Control Dataset: I made a very small bash script to assist with this, called
equal_sub_sample.sh
which consists of these simple commands:
Command run to generate:
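The contents of equal_sub_sample.sh and the command used to run it are not reproduced in these notes. Purely as an illustration of what "equal subsampling" means here, the Python sketch below draws a random subset of genic initiation regions whose size matches the number of non-genic regions. The function name subsample_bed, the fixed seed, and the example records are all assumptions made for this sketch, not the actual script.

```python
import random


def subsample_bed(lines, n, seed=0):
    """Pick n BED records at random (without replacement), keeping file order.

    Hypothetical helper: `lines` is an iterable of BED lines and `n` is the
    size of the set being matched (e.g. the number of non-genic regions).
    """
    # Ignore blank lines and comment lines.
    records = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    if n > len(records):
        raise ValueError(f"asked for {n} records but only {len(records)} are available")
    rng = random.Random(seed)  # fixed seed so the control set is reproducible
    chosen = sorted(rng.sample(range(len(records)), n))
    return [records[i] for i in chosen]


# Made-up example: 5 genic regions, sampled down to match 3 non-genic regions.
genic = [f"chr1\t{i * 100}\t{i * 100 + 50}\n" for i in range(5)]
control = subsample_bed(genic, 3)
```

Fixing the seed is a design choice that lets the same control set be regenerated later; drawing several control sets with different seeds would show how much the counts vary with the sample.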
Get underlying Nt Fasta
parallel "bedtools getfasta -fi Zea_mays.AGPv4.dna.toplevel.unwrapped.nocontigs.reorderd.fa -bed {} > {.}.seq.fa" ::: *.bed
Copied these over from the local location:
~/Projects/03.ncRNA_project/03.Figures/Figure2/05.Blastn_sequences
to sapelo2, where the further analysis is happening, in the directory: /scratch/jpm73279/04.lncRNA/08.comprative_maize_sorghum
Blast Against Sorghum Genome
parallel "blastn -db Sorghum_bicolor_var_BTx623.mainGenome.fasta -query {} -outfmt 6 -num_threads 10 -evalue .0001 -max_target_seqs 1 > blast_output/{/.}.blastn_out.txt" ::: isolated_seq/*.fa
Grab the Sequence regions in Sorghum (Blast output format to Bed)
parallel "python fitler_blastn_maize_sog_comp_genBED.py -i {} -o seq_loc_sorghum_bed/{/.}.bed" ::: blast_output/*.txt
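The script itself is not shown in these notes, so the following is only a sketch of the coordinate conversion such a step has to perform, not its actual contents (any filtering it applies is not reproduced). It assumes the default 12-column layout of -outfmt 6 (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The function name blast6_to_bed and the example hits are made up for illustration.

```python
def blast6_to_bed(lines):
    """Convert BLAST tabular (-outfmt 6) hits to BED6 lines on the subject genome."""
    bed = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        qseqid, sseqid = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        # Minus-strand hits are reported with sstart > send.
        strand = "+" if sstart <= send else "-"
        # BLAST coordinates are 1-based inclusive; BED is 0-based, half-open.
        start = min(sstart, send) - 1
        end = max(sstart, send)
        # Name column carries the maize query ID; score is a placeholder.
        bed.append(f"{sseqid}\t{start}\t{end}\t{qseqid}\t0\t{strand}")
    return bed


# Made-up hits: one on the plus strand, one on the minus strand.
hits = [
    "chr1:100-200\tChr01\t95.0\t100\t5\t0\t1\t100\t501\t600\t1e-30\t150",
    "chr2:300-400\tChr02\t90.0\t100\t10\t0\t1\t100\t600\t501\t1e-20\t120",
]
bed_lines = blast6_to_bed(hits)
```

The two details that are easy to get wrong are the off-by-one shift of the start coordinate and the swap of sstart and send for minus-strand hits; without them, bedtools intersect would either miss or misplace the Sorghum intervals.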
Intersect With a list of Sorghum Genes
parallel "bedtools intersect -a {} -b Sbicolorv5.1.gene.bed -u -wa > {.}.intersecting_sorghum_genes.bed " ::: *.blastn_out.bed
Count the number of regions intersecting Sorghum genes
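The counting command is not recorded in these notes. A minimal sketch of the tally follows, assuming the intersect output was produced with -u (so each BLAST hit appears at most once, and the line count equals the number of hits overlapping at least one gene). The function names and the example numbers are hypothetical, and which denominator to use (all BLAST hits, or all input maize regions) is left as an explicit choice.

```python
def count_bed_records(lines):
    """Count non-empty, non-comment lines in a BED file's contents."""
    return sum(1 for ln in lines if ln.strip() and not ln.startswith("#"))


def proportion_overlapping(n_overlapping, n_total):
    """Fraction of regions that overlap a Sorghum gene."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_overlapping / n_total


# Made-up contents of an *.intersecting_sorghum_genes.bed file.
example = ["Chr01\t500\t600\n", "\n", "Chr02\t10\t90\n"]
n_hits = count_bed_records(example)
fraction = proportion_overlapping(n_hits, 8)  # 8 is an invented total
```

Running the same tally on the non-genic set and on the genic control set gives the two proportions that the comparison described above rests on.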