Closed johannabeam closed 1 month ago
Hi,
There are some strange parsing issues sometimes happening depending on python versions at the moment (there have been lots of changes in pandas and numpy recently that have changed things). In this case, you should be able to simply rerun the last step of the pipeline by resubmitting exactly the same command as you did the first time round, and Earl Grey will skip all the steps where the outputs were successfully created, so it your case it should just regenerate plots. @jamesdgalbraith, I guess there might be an issue with dividing by 0 but this could be related to the input not being parsed properly??
Regarding runtime, on large genomes more threads should help if they are available. The runtime is limited by the repeat content of a given genome (which increases as genome size does), and is a result of the RepeatModeler run generally. On genomes >1GB, these can take upwards of a week on less than 16 threads.
I hope this helps!
Hi, I was able to re-run the last step but it gave me the same errors as before. Interestingly, with the other genome I was running, the plots were made just fine. Would it be worth trying to update earl grey and rerunning it again?
I had another question regarding the figure output of the repeat landscape plot. This is with the genome that worked (Icteria virens, a songbird). Mine doesn't appear to have the same axis values as the example plot on the homepage (percentage of genome on the y and CpG adjusted on the x). It also seems odd that there is relatively little recent TE activity on the right hand side of the plot, if I am interpreting this correctly. Is there a good way to interpret relative time with the Kimura 2-parameter distance? But, I am very new to TE analysis so I may be misunderstanding something here as well!
Hi Johanna,
Good spot! I haven't updated the README plot in a while. We have moved away from the calculations done by RepeatMasker as these apply a CpG adjustment informed by the human genome, which wasn't particularly transferable or representative for other species. The latest version uses classic Kimura 2-parameter distance which should be more informative in a wider range of species.
You are correct in your interpretation, a Kimura distance of 0 shows that the TE is identical to the consensus sequence, and thus has likely inserted more recently, which ancient activity more to the left. There are of course some errors around divergence, because the consensus sequences are our best guess of the ancestral element sequence, but it has been made from copies in the genome, so could already look diverged. However, we can still get a good idea of relative TE activity within the species, and among species, but it is harder to date TE activity without some error.
It looks like there has been some recent LTR activity (green peak ~0.09), with a second more ancient burst of LINEs and LTRs ~0.20ish. This looks fairly typical, with bursts followed by suppression of TE acitivty. It would be cool to see what you find in your comparison species!
Regarding the other genome with the same errors, this is likely an issue with the calculation step for one or two entries - and an edge case that we hadn't yet tested for. @jamesdgalbraith is on holiday at the moment, but is going to look into this when he is back, we will keep you updated
This should be fixed in v4.4.0
. Let us know if it is still not working for you with the update!
Hi all,
I've been playing around with this lovely pipeline with some bird genomes. These have been taking quite a long time to run (over 300 hours), and only generated one plot before finishing up. I installed via mamba - version 4.2.4. My command was:
My summary files:
End of my log file:
This issue appears similar to #121 but I don't think mine is related to a FASTA index. Is there a way to re-run just the plots without rerunning the whole pipeline? For a larger bird genome (1.3GB) it is taking a very long time (<<< Done in 304:03:20.00 >>>) to complete the program. I currently have one other genome running that should be finishing up shortly so we will see if it has a similar issue or not! Thanks in advance!