Open fae75933 opened 4 years ago
A. CDS- 132 and non-CDS- 126 B. https://github.com/fae75933/BNIF8940/blob/master/Homework2 C. commit 0520f21d456b8d9ef8b314efc671f9fd5033489f
Let me know if you need more time and we can give you an extension to finish it.
a) Not quite. 132 and 126 are the file sizes of homework2CDS.text and fhomework2nonCDS.text. You need to look at the content of those files (see the example below). When you open a file, you can see that it contains the summary statistics for the CDS subsequences (ecoli_MG1655_CDS.fna) or the non-CDS subsequences (GCF_000005845_fnonCDS.fna). The statistics include the count and percentage of each nucleotide. When we talk about GC composition, we usually mean the total percentage of guanine (G) and cytosine (C) out of all the nucleotides in the region (see https://en.wikipedia.org/wiki/GC-content#:~:text=In%20molecular%20biology%20and%20genetics,)%20or%20cytosine%20(C).). In this case the total GC composition is 0.5183 (0.2586 + 0.2597) for the CDS region and 0.4314 (0.2173 + 0.2141) for the non-CDS region. I will give you partial points because you wrote the entire workflow correctly and generated files that provide the correct count and percentage of G and C for both regions, but make sure you understand my explanation. Let me know if it doesn't make sense.
more homework2CDS.text
#seq len A C G T N cpg
total 4641652 1142742 1180091 1177437 1141382 0 346793
prcnt 1.0 0.2462 0.2542 0.2537 0.2459 0.0000 0.0747
[sh60271@teach-sub1 hw2]$ more fhomework2nonCDS.text
#seq len A C G T N cpg
total 562358 160229 120400 122221 159508 0 28053
prcnt 1.0 0.2849 0.2141 0.2173 0.2836 0.0000 0.0499
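As a rough illustration of the GC calculation described above, here is a minimal Python sketch that takes the "total" row of the non-CDS summary and divides the G and C counts by the sequence length. The function name gc_fraction is hypothetical, and the column order (label, len, A, C, G, T, N, cpg) is assumed from the header line shown in the output.

```python
def gc_fraction(total_row: str) -> float:
    """Return (G + C) / length for one 'total' row of the summary table.

    Assumes the column order shown in the header line:
    label, len, A, C, G, T, N, cpg.
    """
    fields = total_row.split()
    length = int(fields[1])   # total number of nucleotides
    c_count = int(fields[3])  # cytosine count
    g_count = int(fields[4])  # guanine count
    return (g_count + c_count) / length


# 'total' row copied from fhomework2nonCDS.text above
non_cds_total = "total 562358 160229 120400 122221 159508 0 28053"

print(round(gc_fraction(non_cds_total), 4))  # 0.4314
```

Adding the two values in the prcnt row (0.2173 + 0.2141) gives the same result; working from the counts only avoids rounding twice.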
My goal for this homework is to build on my Bash skills and to add more bioinformatics tools to my toolkit.