Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

0 clean candidates remained after cleaning up candidates with LTR_retriever #237

Status: Closed (closed by Zoe133 3 months ago)

Zoe133 commented 3 months ago

Hello everyone, I ran into a problem when running RepeatModeler. My commands were as follows:

BuildDatabase -name Arabidopsis_chr1 Arabidopsis_thaliana.TAIR10.dna.chromosome.1.fa
nohup RepeatModeler -threads 36 -database Arabidopsis_chr1 -LTRStruct > out.log &

I then checked LTR_retriever.log:

Parameters: -repeatmasker ~/program/RepeatMasker -blastplus ~/program/rmblast-2.14.0/bin -cdhit_path ~/program/cd-hit-v4.8.1-2019-0228 -trf_path ~/program/trf -genome seq.fa -inharvest ~/test/chr1/RM_3227528.MonMar110944112024/LTR_3329428.MonMar111057382024/raw-struct-results.txt -noanno -threads 36

Mon Mar 11 10:58:20 CST 2024 Dependency checking: All passed!
Mon Mar 11 10:58:33 CST 2024 LTR_retriever is starting from the Init step.
Mon Mar 11 10:58:33 CST 2024 Start to convert inputs...
Total candidates: 439
Total uniq candidates: 439

Mon Mar 11 10:58:34 CST 2024 Module 1: Start to clean up candidates...
Sequences with 10 missing bp or 0.8 missing data rate will be discarded.
Sequences containing tandem repeats will be discarded.

    Usage: perl cleanup.pl -f sample.fa [options] > sample.cln.fa 
Options:
    -misschar   n   Define the letter representing unknown sequences; case insensitive; default: n
    -Nscreen    [0|1]   Enable (1) or disable (0) the -nc parameter; default: 1
    -nc     [int]   Ambuguous sequence len cutoff; discard the entire sequence if > this number; default: 0
    -nr     [0-1]   Ambuguous sequence percentage cutoff; discard the entire sequence if > this number; default: 1
    -minlen     [int]   Minimum sequence length filter after clean up; default: 100 (bp)
    -cleanN     [0|1]   Retain (0) or remove (1) the -misschar taget in output sequence; default: 0
    -trf        [0|1]   Enable (1) or disable (0) tandem repeat finder (trf); default: 1
    -trf_path   path    Path to the trf program

Mon Mar 11 10:58:34 CST 2024 0 clean candidates remained

cp: cannot stat 'seq.fa.retriever.scn.adj': No such file or directory
Mon Mar 11 10:58:34 CST 2024 No LTR-RT was found in your data.

Mon Mar 11 10:58:34 CST 2024 All analyses were finished!

I don't know what "seq.fa" in the Parameters line refers to, or why 0 clean candidates remained after the cleanup step. If anyone could share the cause and a solution, I would appreciate it very much!
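For anyone checking whether they are hitting the same failure, here is a minimal sketch that pulls the candidate counts out of an LTR_retriever log. The function name `summarize_ltr_retriever_log` is hypothetical, and the patterns are based only on the log lines quoted in this issue, so other LTR_retriever versions may word them differently.

```python
import re


def summarize_ltr_retriever_log(log_text):
    """Pull the candidate counts out of LTR_retriever log text.

    Assumption: the wording of the matched lines is taken from the log
    quoted in this issue; it is not a documented, stable format.
    """
    patterns = {
        "total": r"Total candidates:\s*(\d+)",
        "unique": r"Total uniq candidates:\s*(\d+)",
        "clean": r"(\d+) clean candidates remained",
    }
    counts = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, log_text)
        # None means the line was not found in the log at all.
        counts[key] = int(match.group(1)) if match else None
    # Flag the symptom from this issue: candidates went in, none survived cleanup.
    counts["all_discarded"] = bool(counts["total"]) and counts["clean"] == 0
    return counts
```

Reading the log with `open("LTR_retriever.log").read()` and passing the text to this function would report `all_discarded` as `True` for the log shown above (439 candidates in, 0 clean candidates out).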

YingChen94 commented 3 months ago

I have encountered the same error. Have you figured it out yet?

Zoe133 commented 3 months ago

Not yet! It has troubled me for a few days.

YingChen94 commented 3 months ago

Have you tried manually running LTR_retriever yet? See the post: https://github.com/Dfam-consortium/RepeatModeler/issues/170 . If manually running still doesn't work, we can raise the question to LTR_retriever GitHub issue page (https://github.com/oushujun/LTR_retriever/issues). I am currently trying to address a different issue and thus haven't tried myself.

Zoe133 commented 3 months ago

Thank you for your advice. I had already tried running it manually, and the result was the same. I will report this problem on the LTR_retriever GitHub issue page.

davidlougheed commented 3 months ago

I see there was already an issue, but they closed it... https://github.com/oushujun/LTR_retriever/issues/104

Zoe133 commented 3 months ago

Hey, everyone! I reported this problem to Robert Hubley by email, and he told me that LTR_retriever v2.9.5 should work. I tried it, and it does! If you encounter the same problem, you can give it a try.

davidlougheed commented 3 months ago

@Zoe133 thanks for the information!

yzjyzjyzj commented 3 months ago

Hi, I think the problem is that the latest version of LTR_retriever cannot handle the output from ltrharvest. LTR_retriever expects a "seqid" column in its input, while the ltrharvest output contains only 11 columns, with no "seqid" column.
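To check this on your own data, a rough sketch like the following counts the fields per data line in the file passed via -inharvest (raw-struct-results.txt in the log above). The helper name `column_counts` is hypothetical, and it assumes that fields are whitespace-separated and that lines starting with "#" are comments; neither assumption is confirmed in this thread.

```python
from collections import Counter


def column_counts(lines):
    """Count whitespace-separated fields per data line.

    Assumptions (not confirmed in this thread): lines starting with '#'
    are comments, and blank lines carry no data.
    """
    counts = Counter()
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip comments and blank lines
        counts[len(stripped.split())] += 1
    return counts
```

For example, `with open("raw-struct-results.txt") as fh: print(column_counts(fh))` would show how many lines have each field count. If every data line has 11 fields, that would be consistent with the missing "seqid" column described above.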

davidlougheed commented 3 months ago

good to know, odd that they'd change the behaviour in a patch version increment.

yzjyzjyzj commented 3 months ago

the bug probably comes from "bin/get_range.pl" in LTR_retriever