HajkD / LTRpred

De novo annotation of young retrotransposons
https://hajkd.github.io/LTRpred/
GNU General Public License v2.0

"Error: Tibble columns must have compatible sizes" at Step 6 #24

Closed: a7032018 closed this issue 3 years ago

a7032018 commented 3 years ago

Hi, I am getting an error that causes the pipeline to terminate prematurely.

Here is my R script:


library(LTRpred)
LTRpred(genome.file = "felv.fasta")


And the LTRpred log:


$ Rscript test_simple.r
Warning message:
package ‘LTRpred’ was built under R version 4.0.3
vsearch v2.17.0_linux_x86_64, 1006.5GB RAM, 112 cores
https://github.com/torognes/vsearch

Running LTRpred on genome 'felv.fasta' with 1 core(s) and searching for retrotransposons using the overlaps option (overlaps = 'no') ...

No hmm files were specified, thus the internal HMM library will be used! See '/home/khanlab/anaconda3/envs/RVDBAnnotation/lib/R/library/LTRpred/HMMs/hmm_*' for details.
No tRNA files were specified, thus the internal tRNA library will be used! See '/home/khanlab/anaconda3/envs/RVDBAnnotation/lib/R/library/LTRpred/tRNAs/tRNA_library.fa' for details.
The output folder '/home/khanlab/users/pei-ju.chin/projects/ltr_pred/FeLv_test/felv_ltrpred' does not seem to exist yet and will be created ...

LTRpred - Step 1: Run LTRharvest...
LTRharvest: Generating index file felv_ltrharvest/felv_index.fsa with gt suffixerator...
Running LTRharvest and writing results to felv_ltrharvest...
LTRharvest analysis finished!

LTRpred - Step 2: Run LTRdigest...
Generating index file felv_ltrdigest/felv_index_ltrdigest.fsa with suffixerator...
LTRdigest: Sort index file...
Running LTRdigest and writing results to felv_ltrdigest...
LTRdigest analysis finished!

LTRpred - Step 3: Import LTRdigest Predictions...

Input: felv_ltrdigest/felv_LTRdigestPrediction.gff -> Row Number: 247
Remove 'NA' -> New Row Number: 247
(1/8) Filtering for repeat regions has been finished.
(2/8) Filtering for LTR retrotransposons has been finished.
(3/8) Filtering for inverted repeats has been finished.
(4/8) Filtering for LTRs has been finished.
(5/8) Filtering for target site duplication has been finished.
(6/8) Filtering for primer binding site has been finished.
(7/8) Filtering for protein match has been finished.
(8/8) Filtering for RR tract has been finished.

LTRpred - Step 4: Perform ORF Prediction using 'usearch -fastx_findorfs' ...
usearch v11.0.667_i86linux32, 4.0Gb RAM (1055Gb total), 112 cores
(C) Copyright 2013-18 Robert C. Edgar, all rights reserved.
https://drive5.com/usearch

License: personal use only

00:00 37Mb 100.0% Working

WARNING: Input has lower-case masked sequences

Join ORF Prediction table:
nrow(df) = 52 candidates.
unique(ID) = 22 candidates.
unique(orf.id) = 42 candidates.

LTRpred - Step 5: Perform methylation context quantification..
Join methylation context (CG, CHG, CHH, CCG) count table:
nrow(df) = 52 candidates.
unique(ID) = 22 candidates.
unique(orf.id) = 42 candidates.
Copy files to result folder '/home/khanlab/users/pei-ju.chin/projects/ltr_pred/FeLv_test/felv_ltrpred'.

LTRpred - Step 6: Starting retrotransposon evolutionary age estimation by comparing the 3' and 5' LTRs using the molecular evolution model 'K80' and the mutation rate '1.3e-07' (please make sure the mutation rate can be assumed for your species of interest!) for 52 predicted elements ...

Please be aware that evolutionary age estimation based on 3' and 5' LTR comparisons are only very rough time estimates and don't take reverse-transcription mediated retrotransposon recombination between family members of retroelements into account!
Please consult Sanchez et al., 2017 Nature Communications and Drost & Sanchez, 2019 Genome Biology and Evolution for more details on retrotransposon recombination.
Error: Tibble columns must have compatible sizes.


I am not interested in evolutionary age estimation in Step 6. Is it possible to bypass this step?

Thanks!

HajkD commented 3 years ago

Hi,

Unfortunately, I cannot reproduce this error. Could you check whether it is related to your output containing different numbers of unique elements (22 vs. 42)? Do you have duplicates in the output table?

unique(ID) = 22 candidates.
unique(orf.id) = 42 candidates.
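
One way to look for such duplicates is sketched below in base R. The column names ID and orf.id come from the log above; the data frame itself is a made-up stand-in, since the real joined table is not shown in this thread, so treat this as an illustration rather than a diagnosis.

```r
# Toy stand-in for the joined ORF prediction table (hypothetical values;
# only the column names ID and orf.id are taken from the log).
df <- data.frame(
  ID     = c("ltr_1", "ltr_1", "ltr_2", "ltr_3", "ltr_3"),
  orf.id = c("orf_a", "orf_b", "orf_c", "orf_d", "orf_d"),
  stringsAsFactors = FALSE
)

# The same three counts that the log reports.
n_rows       <- nrow(df)
n_unique_id  <- length(unique(df$ID))
n_unique_orf <- length(unique(df$orf.id))

# Rows whose ID has already appeared in an earlier row.
dup_id_rows <- df[duplicated(df$ID), , drop = FALSE]

# Rows that repeat an earlier row in every column.
dup_full_rows <- df[duplicated(df), , drop = FALSE]

# IDs that occur more than once.
id_counts    <- table(df$ID)
repeated_ids <- names(id_counts[id_counts > 1])

print(dup_id_rows)
print(dup_full_rows)
print(repeated_ids)
```

If nrow(df) is larger than the number of unique IDs, as in the log (52 vs. 22), the rows in dup_id_rows show which elements appear more than once.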

In either case, I introduced a new argument, ltr_age_estimation, to LTRpred, which you can now set to ltr_age_estimation = FALSE to skip this step. However, this feature won't be available on DockerHub for a while, since I want to make some additional maintenance changes to LTRpred before I resubmit to DockerHub.
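
A minimal sketch of such a call, reusing genome.file = "felv.fasta" from the script above. It assumes the installed LTRpred version already includes the ltr_age_estimation argument. The guard around the call is only there so the snippet does nothing when the package or the FASTA file is missing.

```r
# Arguments for LTRpred(): genome.file is taken from the script in this
# thread; ltr_age_estimation is the new argument described above.
ltrpred_args <- list(
  genome.file        = "felv.fasta",
  ltr_age_estimation = FALSE  # skip Step 6 (evolutionary age estimation)
)

# Assumption: the installed LTRpred version supports ltr_age_estimation.
# The call is skipped if LTRpred is not installed or the file is absent.
if (requireNamespace("LTRpred", quietly = TRUE) &&
    file.exists(ltrpred_args$genome.file)) {
  do.call(LTRpred::LTRpred, ltrpred_args)
}
```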

I hope this helps.

Cheers, Hajk

HajkD commented 3 years ago

I assume this issue is resolved now.