Closed RNieuwenhuis closed 11 months ago
Hi @RNieuwenhuis,
There are a lot of factors that could contribute to a join not being made, so it's honestly pretty hard for me to say for sure. Some possibilities:
It's really hard to say if any filtering would have led to the expected output. Perhaps if you had made the c parameter less stringent (lower), there would have been more barcodes that supported this, but it's hard to say. If you wanted to know for sure, you'd need to do a failure mode analysis, where you look into the read alignments and work out from there why barcode support wasn't found.
Hi @lcoombe
Thanks for your reply. Using -m 20-10000 -c 3 -e 100000, it now connects the scaffolds that I know should be connected, and a lot more shared_barcodes are found.
U V Best_orientation Shared_barcodes U_barcodes V_barcodes All_barcodes
16- 60+ F 33 9395 9563 807069
60- 16+ F 33 9563 9395 807069
16- 60- F 11 9395 3482 807069
60+ 16+ F 11 3482 9395 807069
16+ 60+ F 15 6865 9563 807069
60- 16- F 15 9563 6865 807069
16+ 60- T 75 6865 3482 807069
60+ 16- T 75 3482 6865 807069
I understand that these are considered loose settings and that default settings are usually the most sensible ones.
What I still don't understand is why there are now a lot more orientations listed compared to the previous run. Could you maybe elaborate a bit on the reasons for that, please? Is it arcs or LINKS that causes that? Is it related to the -a setting for links? Or -l, maybe?
why there are now a lot more orientations listed compared to the previous run
Do you mean more links or more different orientations? You can see that only one relative orientation is seen as the 'best' - which is 16+ -> 60- (60+ -> 16- is the equivalent due to it being the reverse complement).
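To make the reverse-complement equivalence concrete, here is a minimal Python sketch. The helper names `flip` and `mirror` are made up for illustration and are not part of ARCS; the rows are copied from the table posted above.

```python
def flip(end):
    """Flip the strand sign of an oriented scaffold label, e.g. '16+' -> '16-'."""
    name, strand = end[:-1], end[-1]
    return name + ("-" if strand == "+" else "+")


def mirror(u, v):
    """Return the same join as read from the opposite strand:
    swap the two scaffolds and flip both strand signs."""
    return flip(v), flip(u)


# (U, V, Best_orientation, Shared_barcodes) rows from the table above.
rows = [
    ("16-", "60+", "F", 33), ("60-", "16+", "F", 33),
    ("16-", "60-", "F", 11), ("60+", "16+", "F", 11),
    ("16+", "60+", "F", 15), ("60-", "16-", "F", 15),
    ("16+", "60-", "T", 75), ("60+", "16-", "T", 75),
]

# Collapse each row and its mirror image into a single canonical join.
joins = {}
for u, v, best, shared in rows:
    key = min((u, v), mirror(u, v))
    joins[key] = (best, shared)

for (u, v), (best, shared) in sorted(joins.items()):
    print(u, v, best, shared)
```

Collapsed this way, the eight rows describe only four distinct relative orientations of 16 and 60, and exactly one of them (16+ 60-) is flagged as best.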
Figure 1 of our ARCS manuscript (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6030987/) has a schematic of the algorithm which might help you to better understand what's going on.
For deciding on the orientation supported by a given barcode, the code tallies up the number of read pairs for that barcode which map to the head (5' end) or tail (3' end) of a contig. It does a significance test to determine whether the reads are found significantly more often at the head or the tail end, and assigns the orientation of that contig accordingly. Once all of these are tallied, there is another test to decide which of the various orientations found between a given pair of contigs is the most supported by the barcode data. Note that although you see all the combinations in the verbose file that you are looking at, only the best supported orientation will be found in the scaffold graph, which is traversed to output the final assembly scaffolds.
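As a rough illustration of the head/tail step, the following Python sketch assigns an end to a contig for one barcode. It is not the ARCS source code: it assumes the significance test is an exact two-sided binomial test against an even head/tail split, and the function names and the `max_p` threshold of 0.05 are hypothetical choices for the example.

```python
from math import comb


def binom_two_sided_p(k, n):
    """Exact two-sided binomial p-value for k successes in n trials with p = 0.5."""
    if n == 0:
        return 1.0
    hi = max(k, n - k)
    # Probability of a split at least as extreme as the observed one, on one side.
    tail = sum(comb(n, i) for i in range(hi, n + 1)) / 2 ** n
    # The distribution is symmetric at p = 0.5, so double it and cap at 1.
    return min(1.0, 2 * tail)


def assign_end(head_reads, tail_reads, max_p=0.05):
    """Return 'H' or 'T' if one end is significantly favoured, else None."""
    p = binom_two_sided_p(head_reads, head_reads + tail_reads)
    if p > max_p:
        return None  # ambiguous: this barcode does not support either end
    return "H" if head_reads > tail_reads else "T"


print(assign_end(10, 0))  # strongly skewed towards the head
print(assign_end(5, 5))   # evenly split
print(assign_end(1, 9))   # strongly skewed towards the tail
```

Under these assumptions, a barcode whose reads are spread evenly over a contig supports neither end, which is one way a contig pair can end up with fewer supporting barcodes than expected.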
The creation of the scaffold graph is done at the arcs stage, so prior to LINKS. The links found will be related to all of the parameters that are input to this arcs stage.
Hi @lcoombe,
I was just comparing some older work I did some years ago to a newly published result. Using PacBio, Bionano and 10X we managed to get our assembly mostly to chromosome level and were very happy with that. There were still a handful of scaffolds that were not yet complete chromosomes.
I aligned my result to a newly published result and got the feeling that these scaffolds that were not yet complete chromosomes should have been scaffolded back in the day when I used ARCS for the linked-read scaffolding.
Now I went back to my results and I am trying to find out why that did not happen, and which parameters may have caused the link to be filtered out. Just for the sake of learning.
So, based on the dot-plot I know which scaffolds should have been possible to connect and in what orientation that would be possible. I know my scaffold X and Y got number 16 and 60 in the renaming step.
Now in my_assembly_prefix_c5_m50-10000_s98_r0.05_e30000_z500_main.tsv I find the following:
Could you please explain why the combinations 16+ 60- and 60+ 16- are not listed, as those are now confirmed to be the correct relative orientations? I see more orientations listed for other combinations of scaffolds. Is it a filtering step that I could have tweaked back then that causes this?