terminate called after throwing an instance of 'std::out_of_range' what(): vector::_M_range_check: __n (which is 0) >= this->size() (which is 0) Aborted
Put transcript ID in the gff file by taking it from the ID in the attribute column in R
Load libraries
library(tidyverse)
pdam.gff <- read.csv(file="~/Desktop/pdam_annotation.gff3", header=FALSE, sep="\t", skip=2)
#rename columns
colnames(pdam.gff) <- c("scaffold", "Gene.Predict", "id", "gene.start","gene.stop", "pos1", "pos2","pos3", "gene")
# seeing what kinds of components are in annotation
unique(pdam.gff$id)
# Making ref_gene_id, moving gene id into separate column
pdam.gff$gene_id <- sub(";.*", "", pdam.gff$gene)
pdam.gff$gene_id <- gsub("ID=", "transcript_id=", pdam.gff$gene_id) #remove ID= and replace with transcript_id=
# or could do: pdam.gff$gene_id <- gsub("ID=", "gene_id=", pdam.gff$gene_id) #remove ID= and replace with gene_id=
head(pdam.gff)
# paste gene id back into gene column
pdam.gff$gene <- paste(pdam.gff$gene, pdam.gff$gene_id)
head(pdam.gff$gene)
# remove white spaces
pdam.gff$gene <- gsub(" ", "", pdam.gff$gene, fixed = TRUE) #remove white space
head(pdam.gff)
# Remove last column
pdam.gff <- pdam.gff[,-10]
#save file
write.table(pdam.gff, file="~/Desktop/pdam_annotation_AddTranscript_id_fixed.gtf", sep="\t", col.names = FALSE, row.names=FALSE, quote=FALSE)
Attempting to build genome index so that genes can be mapped to reference genome for Pdamicornis. Reference genome and gff file from Reef Genomics
Did not work with this gff, gave me this error:
Put transcript ID in the gff file by taking it from the ID in the attribute column in R
This fixed my issue