asl / BandageNG

a Bioinformatics Application for Navigating De novo Assembly Graphs Easily
GNU General Public License v3.0

Incorrect data loaded from CSV file #122

Closed rlorigro closed 1 year ago

rlorigro commented 1 year ago

The same GFA and CSV produce different results in OG vs NG Bandage:

Bandage NG image

Original Bandage image

CSV: https://rlorigro-public-files.s3.us-west-1.amazonaws.com/gfase/test_gfa/chainable_nodes.csv

GFA: https://rlorigro-public-files.s3.us-west-1.amazonaws.com/gfase/test_gfa/chain_test.gfa

asl commented 1 year ago

OK, this is a consequence of how we match path names in the CSV. Path names are matched by prefix, so in your particular case the node name t matched the paths tip.0 and tip.1. I do not recall why we match paths before nodes. The current workaround is to rename or remove the paths.

I will revise the logic here.
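To make the failure mode concrete, here is a minimal Python sketch of prefix-first lookup. It is not BandageNG's actual code (the project is written in C++); the function names `match_by_prefix` and `resolve`, and the node names other than t, are hypothetical and chosen only for illustration.

```python
def match_by_prefix(query, path_names):
    """Return every path whose name starts with the query string."""
    return [name for name in path_names if name.startswith(query)]


def resolve(query, path_names, node_names):
    """Look a CSV name up in paths first (by prefix), then in nodes."""
    hits = match_by_prefix(query, path_names)
    if hits:
        # A path matched, so the node with the same name is never consulted.
        return ("path", hits)
    if query in node_names:
        return ("node", [query])
    return ("none", [])


# Path names from the issue; node names "a" and "b" are made up.
paths = ["tip.0", "tip.1"]
nodes = {"t", "a", "b"}

result = resolve("t", paths, nodes)
```

Under these assumptions, the CSV row for node t is attributed to both tip paths instead of to the node itself.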

asl commented 1 year ago

Ah, yes, I remember why we check paths first. Unfortunately, when exporting to FASTA, Bandage writes node names in a format like NODE_6+_length_50434_cov_42.3615, which clashes with the standard naming of paths in, e.g., SPAdes assembly graphs.

When querying for a node name, it transforms NODE_6+_length_50434_cov_42.3615 into just 6+. The same logic could therefore transform a path name into some valid node name and select the wrong thing.
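The name transformation can be sketched roughly as follows. This is an illustrative Python approximation, not the code Bandage uses; the regular expression and the helper name `to_node_name` are assumptions, and the second input is a hypothetical SPAdes-style path name.

```python
import re

# Assumed shape of a FASTA-exported name: NODE_<name>_length_<int>_cov_<float>
FASTA_NAME = re.compile(r"^NODE_(.+)_length_\d+_cov_[\d.]+$")


def to_node_name(name):
    """Strip the FASTA decoration and return the bare node name.

    Names that do not follow the assumed pattern are returned unchanged.
    """
    match = FASTA_NAME.match(name)
    return match.group(1) if match else name


exported = to_node_name("NODE_6+_length_50434_cov_42.3615")
# A path named in the same style collapses to something that looks like a node.
clash = to_node_name("NODE_6_length_50434_cov_42.3615")
```

Here `exported` becomes 6+, while the path-style name collapses to 6, which illustrates how a path name could end up being treated as a node name.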

Probably the only sane solution is to require an exact path name match, not just a prefix check.
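A rough Python sketch of what exact matching might look like is given below. It only illustrates the idea and is not the fix that was actually committed; `resolve_exact` and the set-based lookups are hypothetical.

```python
def resolve_exact(query, path_names, node_names):
    """Resolve a CSV name by exact match: paths first, then nodes."""
    if query in path_names:
        return ("path", query)
    if query in node_names:
        return ("node", query)
    return ("none", None)


# Path and node names taken from the issue.
paths = {"tip.0", "tip.1"}
nodes = {"t"}

node_hit = resolve_exact("t", paths, nodes)
path_hit = resolve_exact("tip.0", paths, nodes)
```

With exact matching, t no longer collides with tip.0 or tip.1, while paths can still be checked before nodes to cope with FASTA-style names.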

rlorigro commented 1 year ago

Hey @asl could this potentially be added to a new release? Perhaps you could switch from monthly to a yearly release schedule :)

asl commented 1 year ago

@rlorigro We have rolling releases these days: https://github.com/asl/BandageNG/releases/tag/continuous

So you can always grab the latest snapshot.