Should I add a figure for pseudogenes?

For genomes that had at least species-level representatives in GTDB, the largest source of error was non-coding reads being predicted as coding (Figure @fig:orpheum_fig A). We hypothesized that these reads originated from pseudogenes: such sequences would likely not be annotated as coding in the genomes from which the reads were simulated, but may retain some k-mers contained in the database. To test this hypothesis, we used annotation files produced by the NCBI Prokaryotic Genome Annotation Pipeline (PGAP), which annotates pseudogenes, for the 23 genomes for which these files were available [@doi:10.1093/nar/gkw569; @doi:10.1093/nar/gkaa1105]. On average, 12.4% (SD = 13.8%) of non-coding reads that were predicted to be coding fell within pseudogenes annotated by PGAP.
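For anyone checking the pseudogene numbers, here is a minimal sketch of how a per-genome percentage like this could be computed. It is not the code used in the paper. It assumes read and pseudogene coordinates are already available as 0-based, half-open `(start, end)` tuples on a single contig, that "fell within" means the read is fully contained in a pseudogene, and that SD is the sample standard deviation. All function names and the toy coordinates are hypothetical.

```python
from bisect import bisect_right
from statistics import mean, stdev


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals (0-based, half-open)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def read_in_pseudogene(read, merged, starts):
    """True if the read lies entirely inside one merged pseudogene interval."""
    start, end = read
    # Index of the last interval whose start is <= the read start.
    i = bisect_right(starts, start) - 1
    return i >= 0 and merged[i][1] >= end


def percent_in_pseudogenes(reads, pseudogenes):
    """Percent of reads fully contained in annotated pseudogenes (assumed definition)."""
    if not reads:
        return 0.0
    merged = merge_intervals(pseudogenes)
    starts = [s for s, _ in merged]
    hits = sum(read_in_pseudogene(r, merged, starts) for r in reads)
    return 100.0 * hits / len(reads)


# Toy example with made-up coordinates for one genome.
toy_reads = [(5, 15), (95, 105), (200, 250), (300, 310)]
toy_pseudogenes = [(0, 20), (10, 30), (190, 260)]
toy_percent = percent_in_pseudogenes(toy_reads, toy_pseudogenes)


def summarize(per_genome_percents):
    """Mean and sample SD across genomes (needs at least two values)."""
    return mean(per_genome_percents), stdev(per_genome_percents)
```

Running `percent_in_pseudogenes` once per genome and passing the resulting list to `summarize` would give a mean and SD in the same form as reported above. Counting partial overlaps instead of full containment would change the numbers, so the definition is worth stating in the methods.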

olga commented: Is there a figure for noncoding reads in pseudogenes?

dib-lab / 2021-paper-metapangenomes

should i add a figure for pseudogenes? #14