Closed SwiftSeal closed 5 months ago
Hey @SwiftSeal. If you'd like to run the divergence plotting scripts separately so we can bug fix them, the scripts we incorporated can be found here: https://github.com/jamesdgalbraith/EarlGreyDivergenceCalc.
The Python and R packages used are all in the current EarlGrey conda environment so if it's loaded it should be fine. If you'd rather use a fresh conda environment I've listed all the required packages there.
In terms of the error, would you be able to share the .water files currently in the ${OUTDIR}/${species}_RepeatLandscape/tmp/ folder? The error appears to be due to BioAlign trying to read alignments from EMBOSS water containing sequences of unequal length, which (normally) isn't possible for water to produce!
On the memory issue front, how many threads were you running EarlGrey with? If the issue is arising at this stage and you're using a lot of threads, this should be an easy fix on our end.
@jamesdgalbraith
Thanks for the quick response :)
I'm running that just now - will let you know how it goes! I was using 32 threads initially, I've scaled this back to 16 and gave it 2TB of mem for the hell of it, as it didn't seem to be impacting performance too much. For some reason it is now warning:
WARNING. chromosome (chr05) was not found in the FASTA file. Skipping.
WARNING. chromosome (chr12) was not found in the FASTA file. Skipping.
I've attached all the .water files under that directory for the previous run below:
@SwiftSeal Thanks for that.
There may have been a bug in the attempted bug fixes I uploaded to the other repo an hour ago. Fortunately the latest commit fixes it and should be able to ignore faulty water alignments. There don't appear to be any faulty water alignments in the files you sent through, so I'm quite confused as to what caused the previous error!
I believe the newest error is due to pybedtools being unable to find the scaffolds called chr05 and chr12 in the genome file.
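One way to confirm a name mismatch like this is to compare the chromosome names in the interval file against the sequence names in the genome FASTA. The following is a minimal sketch using only the Python standard library; it is not part of EarlGrey, and the function names and any file paths you pass in are hypothetical. It assumes the intervals are in a tab-separated BED-like file and that a sequence's name is the first whitespace-delimited word of its FASTA header.

```python
def fasta_names(fasta_path):
    """Return the set of sequence names (first word of each '>' header)."""
    names = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def bed_chromosomes(bed_path):
    """Return the set of names found in the first column of a BED-like file."""
    names = set()
    with open(bed_path) as handle:
        for line in handle:
            # Skip blank lines and common BED header/comment lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            names.add(line.split("\t")[0].strip())
    return names


def missing_chromosomes(bed_path, fasta_path):
    """List names used in the BED-like file that are absent from the FASTA."""
    return sorted(bed_chromosomes(bed_path) - fasta_names(fasta_path))
```

Calling `missing_chromosomes` on the interval file and the genome should list any names (such as chr05 or chr12) that appear in the intervals but have no exactly matching FASTA header, which would point to renamed or truncated headers rather than a problem in the alignments.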
Great, that fix worked, thank you! Aye, I had a look at the water alignments but couldn't see that either... The latest run finished and still failed at that step, so it seems to be consistent for this genome, but all good otherwise.
Hi @SwiftSeal ! Thanks for pointing this out. I let James know and seems like he has sorted it. I've added this patch for the next release which will probably go live sometime today!
Hi @TobyBaril
I'm experiencing issues with the repeat landscape figure not being generated; it looks like it's due to the divergence calculation script? SLURM is also reporting OOM issues (192G peak mem on 600mb genome), not sure if that could be related? I'm fairly certain this was running correctly on the same genome a few months ago, but I have updated since then. Is there a recommended method for rerunning this step of the pipeline specifically? Otherwise the pipeline is finishing correctly and outputs look sensible.
Thanks in advance!