Maximum number of samples and Time to Process

syngnathid commented 2 years ago

Hello SRY team,

Thank you for creating a useful tool. I am currently trying to use it to assemble a better genome that includes sex chromosomes. I was curious about a few things that I couldn't find in the readme/documentation. Could you share some insights, please?

Could I use 100 samples (50 males and 50 females) when I run the SRY program?
What is the relationship between the number of samples, the size/quality of the genome, and the time taken to identify the Y chromosome reads?

Context: I have a genome of a non-model organism with ~450 contigs, of which 21 are chromosome-level scaffolds. This genome is from a male sample, and the species has a male-heterogametic (XY) system. Ideally, I would like to identify and stitch the sex chromosomes if possible. The genome was assembled using HiFi and Hi-C data. I have Illumina whole-genome resequencing data at ~30X coverage for about 50 males and 50 females.

caaswxb commented 2 years ago

Hi. As the number of samples increases, specificity goes up, but the improvement slows beyond ~5 samples in human (see the bioRxiv paper), and sensitivity goes down. So for good results and a shorter running time, I suggest 10 samples of each sex (10 M and 10 F). The SRY script sorts ONT or PacBio (PB) reads, but not HiFi reads. You can stop SRY once SRY_kmer.txt has been generated and then pass that file to the SRY_contig script. A HiFi sorting function will be added to SRY later.
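
For anyone starting from a larger cohort (such as the 50 males and 50 females above), one way to get down to the suggested 10 M and 10 F is a reproducible random draw. This is only a minimal sketch and not part of SRY: the sample IDs and the `pick_samples` helper are hypothetical, and the fixed seed is just there so that the same subset can be recovered later.

```python
import random


def pick_samples(males, females, n_per_sex=10, seed=0):
    """Randomly pick n_per_sex sample IDs from each sex (hypothetical helper)."""
    if n_per_sex > min(len(males), len(females)):
        raise ValueError("not enough samples in one of the groups")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    chosen_m = sorted(rng.sample(males, n_per_sex))
    chosen_f = sorted(rng.sample(females, n_per_sex))
    return chosen_m, chosen_f


# Hypothetical sample IDs standing in for the real resequencing libraries.
males = [f"M{i:02d}" for i in range(1, 51)]
females = [f"F{i:02d}" for i in range(1, 51)]

chosen_m, chosen_f = pick_samples(males, females)
print("males:  ", " ".join(chosen_m))
print("females:", " ".join(chosen_f))
```

In practice it may be better to choose the samples by coverage or library quality instead of at random; the draw above only shows the bookkeeping.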

Johnsonzcode commented 3 months ago

Sorry to bother you, but I ran it with 8 males and 7 females on 80 cores, and it has been running for 15 days. I would like to speed it up; are there any suggestions?

caaswxb commented 2 months ago

Sorry, I'm not very familiar with computers and can't provide assistance on speeding up your computation.

At 2024-06-04 09:43:40, "johnsonz" @.***> wrote:

Sorry to bother you, but I ran it with 8 males and 7 females on 80 cores, and it has been running for 5 days. I would like to speed it up; are there any suggestions?


caaswxb / SRY

Maximum number of samples and Time to Process #2