Open sihellem opened 2 years ago
Hello. I think the problem is very likely to be the size of the file and the number of baits created. This number of loci seems pretty high - it would probably be best to try and reduce this number in some way before designing baits to capture them. Typically, I design baits targeting 1000-5000 loci.
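One simple way to bring the locus count down is sketched below, purely as an illustration. It assumes the candidate loci are available as BED-style lines (one locus per line) and picks a random subset of a chosen size. The function name `subsample_loci` and the BED input are assumptions for this example and are not part of phyluce. Random selection is only one option; filtering on other criteria may suit a given project better.

```python
import random


def subsample_loci(bed_lines, target, seed=0):
    """Return `target` randomly chosen loci, kept in their original order.

    Hypothetical helper: `bed_lines` is assumed to be an iterable of
    BED-style text lines, one locus per line. Blank lines and lines
    starting with '#' are ignored.
    """
    records = [ln for ln in bed_lines if ln.strip() and not ln.startswith("#")]
    if target >= len(records):
        # Nothing to reduce: fewer loci than requested.
        return records
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    keep = sorted(rng.sample(range(len(records)), target))
    return [records[i] for i in keep]


if __name__ == "__main__":
    # Toy input standing in for a real BED file of candidate loci.
    demo = [f"chr1\t{i * 1000}\t{i * 1000 + 160}\n" for i in range(20)]
    subset = subsample_loci(demo, target=5, seed=42)
    print(len(subset), "loci kept out of", len(demo))
```

With a real file, the lines would be read from disk and the subset written to a new file, which would then be used in the downstream bait design steps in place of the full set.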
Hi Brant,
First, thanks for this amazing resource and accompanying tutorials, they are really helpful.
I am now stuck at the duplicate-removal step using 'phyluce_probe_remove_duplicate_hits_from_probes_using_lastz'. After 1-2 days, the job still has not produced any output or logs.
This is probably because the file resulting from 'phyluce_probe_easy_lastz --identity 50 --coverage 50' is enormous (~88Gb).
Is there anything I can do besides wait?
To give a bit more context, I have been designing probes for our current project using 4 insect genomes (phylogenetically very distant), and I am finding a really huge number of loci (but maybe this is normal):
Loci shared by Base + 0 taxa: 1,884,627.0
Loci shared by Base + 1 taxa: 1,884,627.0
Loci shared by Base + 2 taxa: 424,868.0
Loci shared by Base + 3 taxa: 175,535.0
I opted for the 'Base + 3 taxa' set, with 180 bp of buffering and --tiling-density 2.
Thanks in advance for any input!