Closed desmodus1984 closed 3 weeks ago
Hi @desmodus1984,
Thank you for your interest in ntJoin! Lauren
Hi Lauren,
Thanks for the quick reply. It would be very nice to see a comparison between ntJoin and the new RagTag. You are right, that's my setup: 6 reference-grade genomes and one scaffolded draft.
Thanks.
Hi @desmodus1984,
Ok, thanks for clarifying!
There are a couple of considerations that you could make here:
The cut option: if it is set to True, ntJoin will use the structure of the reference assembly (or assemblies) and make cuts in your input assembly to fit its structure to the reference(s). If it is set to False, it will still scaffold, but will not break any of the scaffolds. You could do that if you think there are species-specific structures in your assembly that you want to retain.
Let me know if you have any other questions! Lauren
Hi, I wanted to ask you something. Do you know of a way to eliminate the zero-depth regions? They don't make any sense to me. I think they are false positives: with about 100X depth, there should be no region with zero depth.
Thanks
Hi @desmodus1984,
For the zero-depth regions, unfortunately you can't use ntJoin directly, because it doesn't use any read information during execution. You would expect full contigs with zero coverage to have no mappings to any reference and to end up in the 'unassigned' fasta (although legitimate contigs can also end up in this file).
I'm not familiar with pomoxis for assembly, but it is indeed strange to have zero coverage regions if you are re-aligning the same reads. If it were me, I would start with characterizing these regions - what proportion of the assembly do they comprise? Are they in individual contigs, or smaller regions within the contigs? Are they near any gaps? Are you filtering your alignments at all, or using more conservative parameters? This information could help you understand how substantial these regions are within your assembly. Depending on the results, you could consider removing these contigs, or masking those regions - but I would definitely ensure you fully understand those regions first!
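As a rough starting point for that characterization, here is a minimal Python sketch. It assumes you already have a per-base depth table with tab-separated columns for contig, 1-based position, and depth, sorted by contig and position (for example, what `samtools depth -a` writes for a single BAM file). The function names `zero_depth_intervals` and `summarize` and the toy data are hypothetical, made up for illustration. It is not part of ntJoin.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def zero_depth_intervals(depth_lines: Iterable[str]) -> Dict[str, List[Interval]]:
    """Collapse consecutive zero-depth positions into intervals per contig.

    Assumes tab-separated lines of contig, 1-based position, depth,
    sorted by contig and then by position.
    """
    intervals: Dict[str, List[Interval]] = defaultdict(list)
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        contig, pos_str, depth_str = line.split("\t")[:3]
        pos, depth = int(pos_str), int(depth_str)
        if depth != 0:
            continue
        runs = intervals[contig]
        if runs and runs[-1][1] == pos - 1:
            # Position is adjacent to the previous zero-depth run: extend it.
            runs[-1] = (runs[-1][0], pos)
        else:
            # Start a new zero-depth run.
            runs.append((pos, pos))
    return dict(intervals)


def summarize(
    intervals: Dict[str, List[Interval]], contig_lengths: Dict[str, int]
) -> Dict[str, Tuple[int, float]]:
    """Return, per contig, (zero-depth bases, fraction of the contig)."""
    summary = {}
    for contig, length in contig_lengths.items():
        zero = sum(end - start + 1 for start, end in intervals.get(contig, []))
        summary[contig] = (zero, zero / length)
    return summary


# Toy data for illustration only (not real results).
example_lines = [
    "ctg1\t1\t98",
    "ctg1\t2\t0",
    "ctg1\t3\t0",
    "ctg1\t4\t101",
    "ctg1\t5\t0",
    "ctg2\t1\t0",
    "ctg2\t2\t0",
]
example_intervals = zero_depth_intervals(example_lines)
example_summary = summarize(example_intervals, {"ctg1": 5, "ctg2": 2})
```

On real data you would pass an open file handle instead of `example_lines` and take the contig lengths from your assembly index. A contig whose fraction is 1.0 has no coverage at all, while small fractions point to local regions inside otherwise covered contigs, which you could then compare against gap positions.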
Hi, I used short and long reads to assemble a bat genome. I used pomoxis to assemble it using a reference, and I got 98 contigs. The BUSCO score was 97%. I then mapped the reads back to the assembly: many contigs had high coverage, but some had very low coverage (50-60%), which signaled a problem, as did several regions with no depth. Then I tried detecting misassemblies with CRAQ, and it found many, consistent with what the mapped reads suggested.
A group published 6 reference bat genomes, and I wanted to consult you about setting up the analysis. There are 6 main reference-grade assemblies, so I wanted to know whether I should give them all the same weight or give them different weights, since they are from different families and hence more distantly related to my species.
Also, I tried running RagTag because another group sequenced my same species, but didn't publish the assembly. My installation on a server didn't work when I tried using it, but later I found that https://usegalaxy.eu/ had RagTag, so I used it to correct my assembly. I have two questions,
Lastly, I am planning on running it on an HPC, and I would appreciate it if you could tell me how much memory ntJoin needs and how many cores I should assign to the job. Any hints and suggestions to improve my genome assembly are highly appreciated.
Thanks;