Open Leonidus1995 opened 8 months ago
Question | Points | Notes | Points Possible |
---|---|---|---|
1 | 4 | L50 is the minimum number of the longest contigs required to cover 50% of the total assembly length. | 4 |
2 | 3 | -1: Missing descriptions of the plots themselves. The canu plot shows a circular assembly in a single contig (a straight line that wraps across the top/bottom of the chart; a single contig labelled on the y axis), while the spades plot shows a highly fragmented but correct assembly (a straight line, but many contigs labelled on the y axis). The parallel lines are not due to repeats in this case; they are due to a circular genome where the starting index is different between the two assemblies. | 4 |
3 | 1 | | 1 |
4 | 1 | | 1 |
1). Report the N50 and L50 for both assemblies and state what these values mean. Answer: The N50 is 4665109 bp for the PacBio long-read assembly and 133353 bp for the Illumina short-read assembly. The L50 is 1 for the PacBio long-read assembly and 13 for the Illumina short-read assembly. When the contigs are ordered by size from largest to smallest and their lengths are added up, the N50 is the length of the contig at which the running total reaches 50% of the total assembly length. The L50, on the other hand, is the number of largest contigs required to cover 50% of the total assembly length.
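The definitions in this answer can be checked with a short Python sketch. It is only an illustration of the definitions, not the tool used to produce the reported values; the function name `n50_l50` and the example contig lengths are hypothetical.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths in bp."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    # Walk the contigs from largest to smallest, keeping a running total.
    ordered = sorted(contig_lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison: running total has reached half the assembly.
        if 2 * running >= total:
            # N50 is this contig's length, L50 is how many contigs were used.
            return length, count


# Hypothetical contig lengths (not taken from either assembly).
example = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(example))  # (70, 2)
```

Here the total is 300, and the two largest contigs (80 + 70 = 150) reach half of it, so N50 is 70 and L50 is 2. For an assembly made of a single contig, the function returns that contig's length and an L50 of 1.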
2). Upload .pngs of mummerplots for both assemblies and describe what these plots show. Answer: MUMmer plots visualize the whole-genome alignment of an assembly against a reference genome, which lets us assess the contiguity and completeness of the assembled genome. A highly contiguous assembly with little or no fragmentation usually appears as a straight diagonal line in the plot. Multiple parallel diagonal lines may indicate repetitive elements in the genome.
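The grading note for this question attributes the parallel lines to a circular genome whose starting index differs, rather than to repeats. A toy Python sketch of that effect follows. It is a simplified coordinate model, not how MUMmer computes alignments, and the function name `dotplot_segments` and the lengths used are hypothetical.

```python
def dotplot_segments(genome_length, start_offset):
    """Model a circular genome assembled with a shifted start position.

    Assembly position q corresponds to reference position
    (q + start_offset) % genome_length. Returns the diagonal segments as
    (ref_start, ref_end, asm_start, asm_end) tuples with half-open ranges.
    """
    if not 0 <= start_offset < genome_length:
        raise ValueError("start_offset must be in [0, genome_length)")
    if start_offset == 0:
        # Same start index: one unbroken diagonal.
        return [(0, genome_length, 0, genome_length)]
    split = genome_length - start_offset
    return [
        # First part of the assembly aligns to the end of the reference.
        (start_offset, genome_length, 0, split),
        # The rest wraps around and aligns to the start of the reference.
        (0, start_offset, split, genome_length),
    ]


print(dotplot_segments(10, 3))  # [(3, 10, 0, 7), (0, 3, 7, 10)]
```

With a nonzero offset the single diagonal is split into two parallel segments with the same slope, even though the toy genome contains no repeats.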
3). The URL to the location of the script on GitHub. Answer: https://github.com/Leonidus1995/GENE8940/blob/main/Homework_3.sh
4). The git SHA revision of the script used for this analysis. Answer: 0d5f5c1a80afad63b07b19b18dffef1395b4ec06