flass / genomescans

R module and script for the analysis of aligned biological sequences, mainly to perform genome-wide scans for the detection of hotspots of diversity, LD, etc.
GNU General Public License v3.0

Two errors #2

Open jamiethompson77 opened 4 years ago

jamiethompson77 commented 4 years ago

Hiya,

Me again. I've realigned my sequences to the reference genome so they are in order and re-ran, which appeared successful until I got these errors:

Error in log10(ldrollsub$compldfisub) : non-numeric argument to mathematical function
In addition: Warning message:
In mclapply(1:N, rollfun, mc.cores = multiproc, mc.preschedule = FALSE) :
  2785 function calls resulted in an error

I'm not sure why this is or how I can go about solving it.

This was my command:

./genome-wide_localLD_scan.r -a '/home/ubuntu/genomescans/aligned.fas' -o '/home/ubuntu/genomescans/output2' --LD.metric='Fisher' -T 4 -r 'sequence1','sequence2' -f '/home/ubuntu/genomescans/sequence1.gff'

The gff does not contain sequences, only the table of CDS positions (produced by Prokka).

Best wishes, Jamie

flass commented 4 years ago

Hi,

It's hard to tell what the error is, as it is suppressed by the parallel wrapper. Can you please repeat this with just one thread (i.e. -N 1)?

Also, I'm not sure the annotation you provide is in the correct format. This program definitely does not parse GFF files; it expects a GenBank feature table, as used as input for GenBank sequence submissions through Sequin. See here: https://www.ncbi.nlm.nih.gov/projects/Sequin/table.html

jamiethompson77 commented 4 years ago

Hiya,

Thank you, I've done that.

My bad. I was a bit unsure how exactly the feature table works, because my alignment is about 2% longer than the reference in GenBank, so the start/end positions of genes would be different. Should I alter them to match the aligned sequence?

Below is the error after using the (unedited) feature table from Genbank and one thread. Perhaps my genomes are not diverse enough to have enough observations for the Wilcoxon test?

Jamie

[1] compute 'lbial.ldr2' with metric 'Fisher', using 1 cores
100 %% 1.462035 hours
[1] expand data into a matrix
There were 50 or more warnings (use warnings() to see the first 50)
[1] get local LD intensity
[1] within windows of fixed physical size 3000bp to sample bialelic sites within windows at a fixed (maximum) density of 20/window
[1] get site range for computing empirical null distribution
100 %% 5.030646 secs
[1] 276474 sites
Error in wilcox.test.default(-log10(bial.ldr2[bialrange, bialrange]), : not enough (finite) 'x' observations
Calls: rollStats ... FUN -> measfun -> wilcox.test -> wilcox.test.default
Execution halted

flass commented 4 years ago

Hi Jamie,

No need to edit the coordinates; that's the point of providing the reference genome aligned with the test genomes.
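
To illustrate why the coordinates can stay as they are: when the reference is one of the aligned sequences, its gaps are enough to translate an ungapped reference position into an alignment column. The sketch below is a standalone Python illustration of that idea, not code from genomescans, and it assumes gaps are written as "-"; the function names and the toy sequence are hypothetical.

```python
def ref_to_alignment_columns(aligned_ref, gap_chars="-"):
    """Return a list whose entry i is the 0-based alignment column
    holding the (i+1)-th ungapped base of the reference."""
    return [col for col, base in enumerate(aligned_ref) if base not in gap_chars]


def map_feature(aligned_ref, start, end):
    """Map a 1-based inclusive (start, end) in ungapped reference
    coordinates to 1-based inclusive alignment columns."""
    cols = ref_to_alignment_columns(aligned_ref)
    if not (1 <= start <= end <= len(cols)):
        raise ValueError("feature lies outside the ungapped reference")
    return cols[start - 1] + 1, cols[end - 1] + 1


if __name__ == "__main__":
    # Toy aligned reference row: 7 bases spread over 10 alignment columns.
    aligned_ref = "AT--GCC-TA"
    # A feature at reference positions 3..5 (G, C, C).
    print(map_feature(aligned_ref, 3, 5))
```

With an alignment about 2% longer than the reference, the mapped positions drift further from the GenBank coordinates the more gap columns precede them, which is why hand-editing the feature table would be error-prone.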

I think you're right: your genomes don't show enough diversity, so at some point in the genome scan there are locally not enough sites in the scanned window. You could try increasing the window size. Some of the early output files on bi-allelic SNP density might help in that regard.
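
As a rough way to judge a window size before re-running the scan, one could count bi-allelic sites per window directly from the alignment. The following is a minimal Python sketch under stated assumptions, independent of genomescans and its output files: sequences are already loaded as equal-length strings, gaps and N are ignored, and the function names, toy data, and the `min_sites` threshold are hypothetical.

```python
def biallelic_positions(seqs, ignore="-N"):
    """Return 1-based alignment columns that carry exactly two
    distinct alleles, ignoring the characters listed in `ignore`."""
    positions = []
    for col, column in enumerate(zip(*seqs), start=1):
        alleles = {base.upper() for base in column} - set(ignore)
        if len(alleles) == 2:
            positions.append(col)
    return positions


def sites_per_window(positions, aln_length, window, step):
    """Count bi-allelic sites in each window [start, start + window - 1]."""
    counts = []
    for start in range(1, aln_length - window + 2, step):
        end = start + window - 1
        n = sum(start <= p <= end for p in positions)
        counts.append((start, end, n))
    return counts


if __name__ == "__main__":
    # Toy alignment of four sequences, 10 columns long.
    seqs = [
        "ACGTACGTAC",
        "ACGTACGTAC",
        "ATGTACGTAC",
        "ATGTACGAAC",
    ]
    positions = biallelic_positions(seqs)
    counts = sites_per_window(positions, len(seqs[0]), window=4, step=3)
    min_sites = 1  # hypothetical threshold; choose what the test needs
    sparse = [w for w in counts if w[2] < min_sites]
    print(positions, counts, sparse)
```

On real data the window would be on the scale of the 3000 bp used in the log above; windows that come out sparse are the ones likely to leave the test without enough observations.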