Closed schorlton-bugseq closed 4 months ago
This was added in #18 because Prodigal not reporting the gene names in the GFF output was more annoying than anything. You can just ignore whatever ID is in the FASTA output and just use the name of the FASTA record :+1:
Thanks for the quick response. There are downstream tools which rely on prodigal outputs, specifically by using the ID field to match records between the FASTA and GFF files, so I don't think this suggestion is optimal. Wouldn't a better solution be to also include sequence_id in the ID field of the translated seqs? If you're going to make that design decision for the GFF, you could imagine the pyrodigal FASTA output from above looking like:
>contig_1_1 # 2009 # 2440 # -1 # ID=contig_1_1;partial=00;start_type=TTG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp;gc_cont=0.488
(Note change in ID)
Thanks again for consideration!
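To illustrate why matching IDs matter for downstream tools, here is a minimal sketch of joining FASTA headers to GFF records through the ID attribute. The helper names (`parse_attributes`, `fasta_header_id`, `gff_records_by_id`) are hypothetical, and the GFF line is made up for illustration; only the FASTA header is taken from the example above.

```python
def parse_attributes(text):
    """Parse a 'key=value;key=value' attribute string into a dict."""
    fields = text.strip().strip(";").split(";")
    return dict(field.split("=", 1) for field in fields if "=" in field)


def fasta_header_id(header):
    """Extract the ID attribute from a Prodigal-style FASTA header.

    Assumed layout: >name # start # end # strand # attributes
    """
    parts = header.lstrip(">").split(" # ")
    return parse_attributes(parts[4])["ID"]


def gff_records_by_id(gff_text):
    """Index GFF feature lines by their ID attribute."""
    records = {}
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and GFF comments/pragmas
        cols = line.split("\t")
        attrs = parse_attributes(cols[8])
        # Keep (seqid, start, end, strand) for the matched record.
        records[attrs["ID"]] = (cols[0], int(cols[3]), int(cols[4]), cols[6])
    return records


header = (
    ">contig_1_1 # 2009 # 2440 # -1 # ID=contig_1_1;partial=00;"
    "start_type=TTG;rbs_motif=GGAG/GAGG;rbs_spacer=5-10bp;gc_cont=0.488"
)
# Illustrative GFF line (not real tool output), using the same ID.
gff_text = (
    "##gff-version  3\n"
    "contig_1\texample\tCDS\t2009\t2440\t.\t-\t0\tID=contig_1_1;partial=00;\n"
)

records = gff_records_by_id(gff_text)
match = records.get(fasta_header_id(header))  # None if the IDs disagree
```

If the FASTA used an index-based ID such as `1_1` while the GFF used `contig_1_1`, the lookup would return `None`, which is the mismatch described in this issue.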
I added support for this in v3.4.0. Pass full_id=True as an argument to write_genes, write_translations and write_gff to write the full sequence name, or full_id=False to write the sequence index. The defaults are True for write_gff and False for write_translations and write_genes, for backwards compatibility.
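As a rough illustration of the two ID schemes described above (not pyrodigal's actual implementation), the sketch below uses a hypothetical `gene_id` helper. It assumes the index-based form follows Prodigal's `<sequence number>_<gene number>` convention; the defaults table simply restates the ones listed in this comment.

```python
# Defaults as described in the comment above (v3.4.0).
DEFAULT_FULL_ID = {
    "write_gff": True,
    "write_translations": False,
    "write_genes": False,
}


def gene_id(sequence_id, sequence_index, gene_index, full_id):
    """Build a gene ID from either the sequence name or the sequence index.

    Hypothetical helper for illustration only.
    """
    prefix = sequence_id if full_id else str(sequence_index)
    return f"{prefix}_{gene_index}"


# Same gene, two ID styles:
full = gene_id("contig_1", 1, 1, full_id=True)
short = gene_id("contig_1", 1, 1, full_id=False)
```

Passing the same full_id value to all three writers is what keeps the IDs consistent across the GFF and FASTA outputs.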
Thanks for the great tool! I was trying to use the outputs of pyrodigal, and I think I ran into an issue where the ID values in the translated sequences and the GFF do not match. I wrote my FASTA and GFF with:
listed_predictions is a list of [(seq_id, pyrodigal.Genes), (seq_id2, pyrodigal.Genes2)]
In the FASTA I see:
However in the GFF I see:
Note the different IDs. When running prodigal, I see:
FASTA:
GFF:
I suspect this is because of the use of sequence_id here: https://github.com/althonos/pyrodigal/blob/9887bb29ff39428b205ed63205e755f5209b16e8/pyrodigal/lib.pyx#L3633 but num_seq here: https://github.com/althonos/pyrodigal/blob/9887bb29ff39428b205ed63205e755f5209b16e8/pyrodigal/lib.pyx#L3746
Should these be the same? Let me know if I could do anything differently, and thanks for your help!