Closed: ggstatgen closed this issue 5 years ago.
Hello,
You would simply need to adjust the coordinates in the GFF above so that they are relative to the inserted sequence.
For example, if the transcript defined above started at position 1 (the first nucleotide) of your inserted sequence, the first two GFF lines would look like this:
chr2 ENSEMBL transcript 1 21849 . + .
chr2 ENSEMBL exon 1 137 . + .
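The adjustment described above is a constant shift: every start and end coordinate (GFF columns 4 and 5, which are 1-based and inclusive) has the reference position of the insert's first nucleotide subtracted from it, and then 1 is added. Below is a minimal Python sketch of that arithmetic. It is not part of reform. The helper name shift_gff_line, the parameter insert_origin and the reference coordinates in the usage example are all hypothetical, chosen only so that the output reproduces the two lines shown above. It also assumes standard tab-separated GFF.

```python
def shift_gff_line(line: str, insert_origin: int) -> str:
    """Re-express one GFF feature line relative to an inserted sequence.

    Hypothetical helper. insert_origin is the 1-based reference coordinate
    of the first nucleotide of the inserted sequence, so that position
    becomes 1 in the output.
    """
    text = line.rstrip("\n")
    fields = text.split("\t")
    # Leave header/comment lines and malformed rows untouched.
    if text.startswith("#") or len(fields) < 8:
        return text
    start, end = int(fields[3]), int(fields[4])
    new_start = start - insert_origin + 1
    new_end = end - insert_origin + 1
    if new_start < 1:
        raise ValueError(f"feature starts before the inserted sequence: {text!r}")
    fields[3], fields[4] = str(new_start), str(new_end)
    return "\t".join(fields)


# Hypothetical reference coordinates: the transcript starts at position 5001,
# which is assumed to be the first nucleotide of the inserted sequence.
gff_lines = [
    "chr2\tENSEMBL\ttranscript\t5001\t26849\t.\t+\t.",
    "chr2\tENSEMBL\texon\t5001\t5137\t.\t+\t.",
]
for gff_line in gff_lines:
    print(shift_gff_line(gff_line, insert_origin=5001))
```

With these made-up inputs the transcript becomes 1 to 21849 and the exon becomes 1 to 137, which matches the two example lines. Any other columns (for example the attributes in column 9) pass through unchanged.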
Hi guys
Thanks for developing reform; it sounds like an awesome tool. I have just stumbled on it and am trying to figure out whether it could be what I need for an analysis I have to perform.
Essentially, I need to create a modified mouse chromosome in which an exon of a gene has been replaced by a stop cassette to knock out the gene. The insertion will contain additional sequence on the 5' and 3' ends of the stop cassette. I do have the full insertion sequence in FASTA format.
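To make the requested edit concrete, here is a toy Python sketch of replacing a region of a chromosome with an insertion. It also shows how annotation coordinates downstream of the edit shift by the difference in length. This is only a conceptual illustration under stated assumptions and is not how reform is implemented. The function names replace_region and shift_downstream and the short sequences are hypothetical. Real sequences would come from the reference and insertion FASTA files, and features that overlap the replaced region are deliberately not handled.

```python
def replace_region(chrom: str, start: int, end: int, insert: str) -> str:
    """Replace chrom[start..end] (1-based, inclusive) with insert (hypothetical helper)."""
    if not (1 <= start <= end <= len(chrom)):
        raise ValueError("region is out of bounds")
    return chrom[: start - 1] + insert + chrom[end:]


def shift_downstream(pos: int, region_end: int, delta: int) -> int:
    """Shift a 1-based coordinate that lies after the replaced region by delta.

    Coordinates before the region are unchanged. Features overlapping the
    region are not handled by this sketch.
    """
    return pos + delta if pos > region_end else pos


# Toy example: replace positions 6-10 ("CCCCC", standing in for the exon)
# with "TTT" (standing in for the stop cassette plus its flanks).
chrom = "AAAAACCCCCGGGGG"
start, end, insert = 6, 10, "TTT"
edited = replace_region(chrom, start, end, insert)
delta = len(insert) - (end - start + 1)  # length change caused by the edit
print(edited)
# A feature spanning 11-15 ("GGGGG") in the original now spans:
print(shift_downstream(11, end, delta), shift_downstream(15, end, delta))
```

The first print shows AAAAATTTGGGGG. Because the insert is two bases shorter than the region it replaces, delta is -2, so the downstream feature moves from 11-15 to 9-13.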
My goal is to obtain a 'custom' mouse chromosome that includes the deactivated gene sequence described above. Your tool seems ideal for this, but I'm a bit unclear on the meaning of one of the arguments the program requires, namely --in_gff. What should this file contain in my case? My understanding is that if my insertion sequence were, say, 3 Mb long and contained several genes, the GFF would contain the absolute coordinates of the genes/exons/transcripts/TSSs/TTSs within this 3 Mb FASTA sequence (where by absolute I mean that the first nucleotide of the inserted sequence is at position 0).
Here's a concrete example. Let's say the novel sequence to insert contains only one gene, for example Pax6, described in the GENCODE GFF3 catalogue as follows (note that I'm only showing the first 8 columns of the GFF here, for clarity).
Given the above, how would I go about creating a suitable GFF input file for reform? Would I need to annotate the exons/UTRs in my novel FASTA with a tool (e.g. MAKER) and pass the resulting GFF to reform? Or something else entirely? Apologies if I'm missing something obvious.