HajkD / LTRpred

De novo annotation of young retrotransposons
https://hajkd.github.io/LTRpred/
GNU General Public License v2.0

Error in Join solo LTR Copy Number Estimation table #10

Open zyqzyqzyq opened 4 years ago

zyqzyqzyq commented 4 years ago

Hello @HajkD, I keep getting the following error at the "Join solo LTR Copy Number Estimation table" step, right after "Finished LTR CNV estimation!":

Filter hit results...
Estimate CNV for each LTR sequence...
Finished LTR CNV estimation!
Join solo LTR Copy Number Estimation table: nrow(df) = 8387 candidates.
unique(ID) = 8387 candidates.
unique(orf.id) = 8387 candidates.
Error: Column `cn_3ltr` must be length 8387 (the number of rows) or one, not 0
Stop executing 

I then checked the intermediate files and found that G_soloLTRs_3ltr.bed and G_solo_LTRs_5ltr.bed show a slight difference, which also appears in your source code here:

# write estimated solo LTR loci to LTRpred output folder
cn2bed(
    solo.ltr.cn$pred_3ltr,
    type = "solo",
    filename = paste0(chopped.foldername, "_soloLTRs_3ltr"),
    output = output.path
)

cn2bed(
    solo.ltr.cn$pred_5ltr,
    type = "solo",
    filename = paste0(chopped.foldername, "_solo_LTRs_5ltr"),
    output = output.path
)

So, could this mistake in the code be causing the error? Thank you!

Best, zyq

HajkD commented 4 years ago

Hi @zyqzyqzyq

Thank you for contacting me.

The code you refer to isn't an error. It simply returns the solo-LTR copies found with BLAST: one file using the 3' LTR as BLAST query against the reference genome and one file using the 5' LTR as BLAST query against the reference genome (see Figure 1 for a scheme of 3' and 5' LTRs).

# write estimated solo LTR loci to LTRpred output folder
# this file output includes BLAST hits found with annotated 3' LTRs as BLAST query
cn2bed(
    solo.ltr.cn$pred_3ltr,
    type = "solo",
    filename = paste0(chopped.foldername, "_soloLTRs_3ltr"),
    output = output.path
)

# this file output includes BLAST hits found with annotated 5' LTRs as BLAST query
cn2bed(
    solo.ltr.cn$pred_5ltr,
    type = "solo",
    filename = paste0(chopped.foldername, "_solo_LTRs_5ltr"),
    output = output.path
)

The sequences in the two files differ because 3' LTRs and 5' LTRs are not always 100% identical. This is quantified in the LTRpred output column ltr_similarity (see here).
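
To see how much the two outputs differ in practice, the two BED files can be inspected directly. The following is only a minimal sketch: `count_bed_records` is a hypothetical helper and not an LTRpred function, the demo records are made up, and it assumes the files are plain-text BED with one record per line. The file names are the ones mentioned in this thread.

```r
# Hypothetical helper (not part of LTRpred): count the records in a BED file.
# Returns 0 when the file is missing or contains no records.
count_bed_records <- function(path) {
  if (!file.exists(path)) {
    return(0L)
  }
  lines <- readLines(path, warn = FALSE)
  # Ignore blank lines and the optional BED header/comment lines.
  is_record <- nzchar(trimws(lines)) & !grepl("^(#|track|browser)", lines)
  sum(is_record)
}

# Demo with made-up data in a temporary folder (assumption: real files would
# be read from the LTRpred output folder instead).
demo_dir <- tempfile("ltrpred_demo_")
dir.create(demo_dir)
bed_3ltr <- file.path(demo_dir, "G_soloLTRs_3ltr.bed")
bed_5ltr <- file.path(demo_dir, "G_solo_LTRs_5ltr.bed")

writeLines(character(0), bed_3ltr)                            # empty file
writeLines(c("chr1\t100\t350", "chr2\t900\t1200"), bed_5ltr)  # two records

counts <- c(
  ltr3 = count_bed_records(bed_3ltr),
  ltr5 = count_bed_records(bed_5ltr)
)
print(counts)
```

A count of zero for one of the files would be worth reporting, since it would mean that no solo-LTR hits were written for that query type.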

So the error "Error: Column `cn_3ltr` must be length 8387 (the number of rows) or one, not 0" must have another cause.
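
The wording of the message suggests that a zero-length vector was being attached as a column to a table with 8387 rows. The sketch below illustrates that general situation only. It is not taken from the LTRpred source: `add_column_checked` and `empty_estimate` are hypothetical names, and whether this is what happens inside LTRpred is an open question.

```r
# Hypothetical helper (not an LTRpred function): attach a column only if its
# length is compatible with the table, using the wording of the reported error.
add_column_checked <- function(df, name, values) {
  n <- nrow(df)
  if (!(length(values) %in% c(1L, n))) {
    stop(
      sprintf(
        "Column `%s` must be length %d (the number of rows) or one, not %d",
        name, n, length(values)
      ),
      call. = FALSE
    )
  }
  df[[name]] <- values
  df
}

# Three made-up candidates stand in for the 8387 reported in the log.
candidates <- data.frame(ID = paste0("cand_", 1:3), stringsAsFactors = FALSE)

# Stand-in for an estimation step that produced nothing: extracting a
# missing element from a list yields NULL, which has length 0.
empty_estimate <- list()
cn_3ltr <- empty_estimate$cn_3ltr

msg <- tryCatch(
  {
    add_column_checked(candidates, "cn_3ltr", cn_3ltr)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If something like this is the cause, the interesting question is why the 3' LTR copy number values came back empty, not the join itself.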

Could you please provide me with an example LTRpred command call in which this error occurs? This will allow me to further troubleshoot.

Many thanks!

zyqzyqzyq commented 4 years ago

Thank you for your answer. This is my LTRpred command call:

library(LTRpred)
setwd = ("/CAT2/zhengyongqiang/escl/default-wenjianjia")
LTRpred(
    genome.file = "/CAT2/zhengyongqiang/escl/default-wenjianjia/aaa/G.arboreum_CRI-A2_assembly_v1.0.fasta",
    copy.number.est = TRUE,
    cores = 20
)

What causes the error "Error: Column `cn_3ltr` must be length 8387 (the number of rows) or one, not 0"? Thank you very much!
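
One detail of the call above may be worth ruling out, although it is not established that it relates to the error: in R, `setwd = ("path")` assigns the string to a variable named `setwd` and does not change the working directory, whereas `setwd("path")` does. A minimal sketch of the difference, using `tempdir()` as a stand-in for the project folder:

```r
dir_before <- getwd()
target <- tempdir()   # stand-in for the project directory (assumption)

# Same form as in the call above: this creates a character variable called
# `setwd`; the working directory is left untouched.
setwd = (target)
dir_after_assignment <- getwd()
print(identical(dir_before, dir_after_assignment))

# Remove the variable, then actually change the directory with a function call.
rm(setwd)
previous <- setwd(target)

# Restore the original working directory so the demo has no side effects.
setwd(previous)
dir_restored <- getwd()
```

With the assignment form, LTRpred would run in whatever directory the R session was started from.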

HajkD commented 4 years ago

Ok, I will have a look at this and see if I can reproduce this error.