bcgsc / goldrush

Linear-time de novo Long Read Assembler
GNU General Public License v3.0

Question about warning and output #124

Closed mylena-s closed 1 year ago

mylena-s commented 1 year ago

Hello! Thank you for developing this software, it runs smoothly!

I am writing to ask two questions. First, I get the following warning many times in the same run (61 times, to be exact), with varying gap lengths:

warning: scaffold gap is longer than 100 kbp: 162189

Is this expected? Should I change any parameters to deal with this? I am running the assembly with Nanopore reads (read N50 of 70 kb), and I have some ultra-long reads above 200 kb, up to 800 kb.

Second, the output folder is a little messy. I get many files with similar, very long names. I also get many symbolic links pointing to the same files, and symbolic links that point to other symbolic links. This is not really a problem, but it makes it hard to know for certain which file is the final assembly. For example, the log says that w16_x10_golden_path.goldrush-edit-polished.span2.dist500.tigmint.fa.k40.w250.ntLink-5rounds.fa is the final output. I moved it to another folder to tidy the files, and then realized it was a symbolic link, so I had to search the log to find the original file. I don't know whether this is expected, a bug, or something with my installation, but I hope it is useful information for development.

Thanks again! Mylena

lcoombe commented 1 year ago

Hi Mylena,

Thanks for reaching out! I'm glad to hear that GoldRush is running smoothly for you.

For the scaffold gap sizes - no, I wouldn't be concerned about those warnings. As you mention, they just mean that some gaps are large, and we'd expect gaps of that size because the ultra-long reads can scaffold over those distances.
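If it helps to see how large the reported gaps are compared with the longest reads, the warnings can be tallied from the run log. The following is a minimal sketch, not part of GoldRush: the function name `summarize_gap_warnings` is hypothetical, the pattern is copied from the warning text quoted above, and the second sample value is made up for illustration.

```python
import re

# Pattern copied from the warning text quoted in this thread.
GAP_WARNING = re.compile(r"warning: scaffold gap is longer than 100 kbp: (\d+)")


def summarize_gap_warnings(log_lines):
    """Return (count, smallest_gap, largest_gap) for the warnings found.

    Hypothetical helper: log_lines is any iterable of strings,
    for example an open log file.
    """
    gaps = []
    for line in log_lines:
        match = GAP_WARNING.search(line)
        if match:
            gaps.append(int(match.group(1)))
    if not gaps:
        return 0, None, None
    return len(gaps), min(gaps), max(gaps)


if __name__ == "__main__":
    sample = [
        "warning: scaffold gap is longer than 100 kbp: 162189",  # from this thread
        "some unrelated log line",
        "warning: scaffold gap is longer than 100 kbp: 250000",  # made-up value
    ]
    count, smallest, largest = summarize_gap_warnings(sample)
    print(f"{count} warnings, gaps from {smallest} to {largest} bp")
```

To run it on a real log, pass an open file handle instead of the sample list; the log file name depends on how the run was launched.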

For the output folder, thanks for the feedback! You are right that there are a lot of intermediate files that most users wouldn't need to look at, and they can make your directory a bit messy. We'll discuss that point at a future developers' meeting and get back to you!

Thank you for your interest in GoldRush! Lauren

lcoombe commented 1 year ago

Hi @mylena-s - Just an update that we made some tweaks to GoldRush in #127, where the intermediate files end up in a newly created subfolder, and the final assembly files are soft-linked to your working directory. Hopefully that helps with keeping the working directory where GoldRush is being run cleaner! That will be included in the next release.
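Since the final assembly files are soft links, moving one moves only the link, as described earlier in the thread. As a minimal sketch (not part of GoldRush; the helper name `resolve_link_chain` and the example file name are hypothetical), the chain of links can be followed to the real file using only the Python standard library:

```python
import os


def resolve_link_chain(path):
    """Return every hop from path to the file it finally points to.

    Hypothetical helper: follows links one at a time so that links
    pointing to other links are all listed.
    """
    chain = [path]
    seen = set()
    while os.path.islink(chain[-1]):
        current = chain[-1]
        if current in seen:
            raise OSError(f"symbolic link loop at {current}")
        seen.add(current)
        target = os.readlink(current)
        # A relative target is relative to the directory holding the link.
        target = os.path.normpath(os.path.join(os.path.dirname(current), target))
        chain.append(target)
    return chain


if __name__ == "__main__":
    # "final_assembly.fa" is a placeholder name, not a GoldRush output name.
    for hop in resolve_link_chain("final_assembly.fa"):
        print(hop)
```

When only the final target matters, `os.path.realpath(path)` returns it directly; copying that resolved path, rather than moving the link, keeps the original output folder intact.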

mylena-s commented 1 year ago

Hi Lauren!

It is nice to hear that! I am looking forward to running the next release.

Thanks Mylena