Edinburgh-Genome-Foundry / DnaFeaturesViewer

:eye: Python library to plot DNA sequence features (e.g. from Genbank files)
https://edinburgh-genome-foundry.github.io/DnaFeaturesViewer/
MIT License
584 stars 90 forks source link

truncated scaffolds from gff files #70

Open simarilion opened 1 year ago

simarilion commented 1 year ago

I'm sure I'm missing something obvious here.....

I'd like to view a long (583,249 bp long) but feature-poor scaffold (7 features). These features are described in the attached gff file which specifies the full length of the scaffold on the second line as:

sequence-region scaffold59 1 583249

I can generate output files OK, but they don't display the entire length of the scaffold - they always stop at the end of the last feature (position 483,172), i.e. missing about 100,000 bp of straight black line (example output also attached).

Here's an example of one of my attempts to use dan_feature_viewer to get this working:


from dna_features_viewer import BiopythonTranslator from BCBio import GFF in_file = "scaffold59_original.gff"

in_handle = open(in_file) gff_records = [] for rec in GFF.parse(in_handle):

print(rec)

gff_records.append(rec)

in_handle.close()

fig, ax = plt.subplots()

start = 0 end = 583249 record = BiopythonTranslator().translate_record(gffrecords[0][start:end]) ax, = record.plot(figure_width=10, strand_in_label_threshold=7) ax.figure.savefig("dna_FV_scaffold59_original_test2.png", bbox_inches="tight")


scaffold59_original_gff.txt

dna_FV_scaffold59.pdf

simarilion commented 1 year ago

Kind of figured a workaround out - if one includes the fasta formatted sequence at the end of the gff file the entire length gets plotted, so after the last gff feature annotation include the following:

FASTA

scaffold59 CAGTAAATTTGACTGGGTCCAAACATAGCTGGTATCACATATATTTTGCGATACCTACACACGTGATGATGATTGATTTC ACACATCTTTAGCAAAATAAGGatttagtttgttatttaaatttgtttattttcattgtgactataatttatttttttcg ta.....

Would be nice if this wasn't necessary as this information is present in the second line of the off file (##sequence-region scaffold59 1 583249)....

Otherwise this is a great package - thanks for providing it to the community!

Zulko commented 1 year ago

Just in case: DnaFeatureViewer normally uses the length given by the biopython record. Maybe this is a case where the record imported by GFF doesn't have the right size (meaning the problem is not with DFV). What does len(records[0]) print?

In any case, another workaround to your fasta solution is to manually give the sequence length:

record = BiopythonTranslator().translate_record(gff_records[0])
record.sequence_length = 583249
simarilion commented 1 year ago

Thanks for your answer Zulko!

I think the length in the gff file is correct, but I like your "record.sequence_length = 583249” solution - much more efficient!

Cheers Dan


Professor Daniel J. Jackson Department of Geobiology Room 2.124 Georg-August University of Göttingen Goldschmidtstr.3 37077 Göttingen Germany

Tel: +49 (0) 551 39 14177 Fax: +49 (0) 551 39 7918

@.**@.> http://www.uni-goettingen.de/en/102705.html


On 10. Oct 2022, at 16:24, Zulko @.**@.>> wrote:

Just in case: DnaFeatureViewer normally uses the length given by the biopython record. Maybe this is a case where the record imported by GFF doesn't have the right size (meaning the problem is not with DFV). What does len(records[0]) print?

In any case, another workaround to your fasta solution is to manually give the sequence length:

record = BiopythonTranslator().translate_record(gff_records[0]) record.sequence_length = 583249

— Reply to this email directly, view it on GitHubhttps://github.com/Edinburgh-Genome-Foundry/DnaFeaturesViewer/issues/70#issuecomment-1273396510, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AENVHPM7QLVHJMS4KG3JHG3WCQRI3ANCNFSM6AAAAAARA3OANU. You are receiving this because you authored the thread.Message ID: @.***>

Zulko commented 1 year ago

Great! To be clear, I wasn't suggesting that the length in the file is incorrect, but that the record produced by BCBio.GFF has the incorrect size, which would mean the issue is with the BCBio library and DnaFeatureViewer isn't at fault.

Could you check the value returned by len(record[0])?