combogenomics / medusa

A draft genome scaffolder that uses multiple reference genomes in a graph-based approach.
http://combo.dbe.unifi.it/medusa/
GNU General Public License v3.0
42 stars 15 forks source link

Restarting failed anchoring #31

Closed LZarri closed 3 years ago

LZarri commented 3 years ago

Hello,

My anchoring failed after the software detected non-reference genomes in the reference genome folder. How do I restart and keep using the xxx.delta and xxx.coords files?

Thanks, Liam

nbedelman commented 3 years ago

I have the same issue - mummer completed (after 8 days), and I'd like to continue from this point. Any suggestions about how to do this?

Thanks Nate

LZarri commented 3 years ago

I have found that, upon re-starting, it will overwrite the coords and delta file. What I decided to do was copy them to a different folder for later useage. If you are just looking to anchor reads, the coords file can be used to order the visualization of reads on Fst plots of Fst or Tajima's D. I am working on this now - happy to send code if it is helpful

EBosi commented 3 years ago

Hi, I will push an update to allow starting from mummer output, let me know how it goes when it's up

nbedelman commented 3 years ago

@LZarri that sounds really cool! Yes, I would totally be interested if you don't mind sharing. @EBosi thanks so much! Is this update available on conda?

LZarri commented 3 years ago

See below for R code to parse the coords file - the issue I had was that there are several regions of decreasing similarity where query read fits into the reference. But at least this is something to start with! For more info on the headers I used, see https://github.com/mummer4/mummer

bs <- read_delim("../genome/medusa_output/smb_50x_GCF_014851395.coords", skip = 5, delim= " ",col_names = c('S_Q', 'E_Q', NA, 'S_R', 'E_R', NA, 'LEN_1', 'LEN_2', NA, 'PERC_IDY', NA, 'LEN_Q', 'LEN_R', NA, 'COV_Q', 'COV_R', NA, 'TAGS')) %>% select(-c(X3, X6, X9, X11, X14, X17)) %>% mutate(PERC_IDY = round(as.numeric(PERC_IDY),0)) %>% filter(PERC_IDY > 80) %>% separate(TAGS, c('QUE_ID', 'REF_ID'), sep = '\t')

nbedelman commented 3 years ago

Very cool, thanks, I'll give it a try!

EBosi commented 3 years ago

Hi, I've pushed the commit by @gianlucacolotto in the main branch, let me know how it goes, I will close the issue for now.