fae75933 / BNIF8940


Homework 2 #2

Open fae75933 opened 4 years ago

fae75933 commented 4 years ago

My goal for this homework is to develop my Bash skills and to add more bioinformatics tools to my toolkit.

fae75933 commented 4 years ago

A. CDS: 132, non-CDS: 126
B. https://github.com/fae75933/BNIF8940/blob/master/Homework2
C. Commit 0520f21d456b8d9ef8b314efc671f9fd5033489f

shunhuahan commented 4 years ago

Let me know if you need more time, and we can give you an extension to finish it.

shunhuahan commented 4 years ago

a) Not quite: 132 and 126 are the file sizes of homework2CDS.text and fhomework2nonCDS.text. You need to look at the contents of those files (see the example below). When you open a file, you can see that it contains summary statistics for the CDS subsequences (ecoli_MG1655_CDS.fna) or the non-CDS subsequences (GCF_000005845_fnonCDS.fna). The statistics include the count and percentage of each nucleotide. GC composition usually means the combined percentage of guanine (G) and cytosine (C) among all nucleotides in the region (see https://en.wikipedia.org/wiki/GC-content#:~:text=In%20molecular%20biology%20and%20genetics,)%20or%20cytosine%20(C).). In this case, the total GC composition is 0.5183 (0.2586 + 0.2597) for the CDS region and 0.4314 (0.2173 + 0.2141) for the non-CDS region. I will give you partial credit because you wrote the entire workflow correctly and generated files that give the correct counts and percentages of G and C for those two regions, but make sure you understand my explanation. Let me know if it doesn't make sense.

more homework2CDS.text
#seq    len A   C   G   T   N   cpg
total   4641652 1142742 1180091 1177437 1141382 0   346793
prcnt   1.0     0.2462  0.2542  0.2537  0.2459  0.0000  0.0747

[sh60271@teach-sub1 hw2]$ more fhomework2nonCDS.text
#seq    len A   C   G   T   N   cpg
total   562358  160229  120400  122221  159508  0   28053
prcnt   1.0     0.2849  0.2141  0.2173  0.2836  0.0000  0.0499
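
The GC calculation described above can also be done on the command line. The following is only a minimal sketch: it assumes the summary file has the column layout shown in the header line (`#seq len A C G T N cpg`), so that C and G are the 4th and 5th whitespace-separated fields of the `prcnt` row. The function name `gc_fraction` is made up for illustration, and the sample input is the non-CDS summary quoted in this thread.

```shell
#!/usr/bin/env bash
# Print the combined C + G fraction from the "prcnt" row of a
# nucleotide-composition summary file.
# Assumed column order (taken from the header line in this thread):
#   seq  len  A  C  G  T  N  cpg
gc_fraction() {
    awk '$1 == "prcnt" { printf "%.4f\n", $4 + $5 }' "$1"
}

# Sample input: the non-CDS summary quoted in this thread.
stats_file=$(mktemp)
cat > "$stats_file" <<'EOF'
#seq    len A   C   G   T   N   cpg
total   562358  160229  120400  122221  159508  0   28053
prcnt   1.0     0.2849  0.2141  0.2173  0.2836  0.0000  0.0499
EOF

gc_fraction "$stats_file"   # prints 0.4314
rm -f "$stats_file"
```

Running the same function on the CDS summary file gives the GC composition for the other region in one step, without adding the two percentages by hand.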