Dfam-consortium / RepeatModeler

De-Novo Repeat Discovery Tool

LTR Structural problem #129

Closed. Koruomi closed this issue 1 year ago.

Koruomi commented 3 years ago

Hi, I am not very good at programming, and I just ran into some trouble. I couldn't find an answer on Google. What does the following output mean, and how can I fix it? Thank you very much.

LTR Structural Analysis

Running LtrHarvest... : 00:00:21 (hh:mm:ss) Elapsed Time
Running Ltr_retriever... : 00:00:49 (hh:mm:ss) Elapsed Time
Aligning instances.../opt/mafft/bin/mafft: line 2718: 21152 Killed "$prefix/disttbfast" -q $npickup -E $cycledisttbfast -V "-"$gopdist -s $unalignlevel $legacygapopt $mergearg -W $tuplesize $termgapopt $outnum $addarg $add2ndhalfarg -C $numthreads-$numthreadstb $memopt $weightopt $treeinopt $treeoutopt $distoutopt $seqtype $model -g $gexp -f "-"$gop -Q $spfactor -h $aof $param_fft $algopt $treealg $scoreoutarg $anchoropt -x $maxanchorseparation $oneiterationopt < infile > pre 2>> "$progressfile"
LTRPipeline: Error - could not produce a multiple alignment from the denovo LTR results.
LTRPipeline Time: 00:01:18 (hh:mm:ss) Elapsed Time

Koruomi commented 3 years ago

I also found that this error seems to occur only when running a large sequence.

As you can see, when running a 3M FASTA file, everything works:

LTR Structural Analysis

Running LtrHarvest... : 00:00:01 (hh:mm:ss) Elapsed Time
Running Ltr_retriever... : 00:00:06 (hh:mm:ss) Elapsed Time
Aligning instances... : 00:00:00 (hh:mm:ss) Elapsed Time
Clustering... : 00:00:00 (hh:mm:ss) Elapsed Time
Refining families... : 00:00:01 (hh:mm:ss) Elapsed Time
Program Time: 00:00:08 (hh:mm:ss) Elapsed Time
-- Clustering results with previous rounds...

jebrosen commented 3 years ago

/opt/mafft/bin/mafft: line 2718: 21152 Killed "$prefix/disttbfast"

And I found that this kind of error seems to only occur when running a large sequence

We do try to use the --large option with MAFFT which is supposed to reduce RAM usage, so this might be a problem in MAFFT itself or in some of the other options we have used.

Or maybe this machine does not have enough RAM to run mafft on such a large input at once, even with --large. How much RAM do you have available on this machine?
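To answer the RAM question on a Linux host, running `free -h` is usually enough. As a scripted alternative, here is a minimal sketch that reads `/proc/meminfo`; the function names `parse_meminfo` and `report_memory` are made up for this illustration and are not part of RepeatModeler or MAFFT. A bare "Killed" message like the one above is often what the shell prints when the kernel's out-of-memory killer stops a process, so comparing the available memory with the input size is a reasonable first check.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field -> size in GiB.

    Only lines reported in kB are kept; unitless counters are skipped.
    """
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            info[key.strip()] = int(parts[0]) / (1024 * 1024)
    return info


def report_memory(path="/proc/meminfo"):
    """Print total and available memory (Linux only; path is an assumption)."""
    with open(path) as handle:
        info = parse_meminfo(handle.read())
    for field in ("MemTotal", "MemAvailable", "SwapTotal"):
        if field in info:
            print(f"{field}: {info[field]:.1f} GiB")
```

Calling `report_memory()` on the machine that runs RepeatModeler prints the three values, which can then be posted in the thread.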

Koruomi commented 3 years ago

For various reasons, I did not use our school’s supercomputer cluster but rented an Alibaba Cloud server myself. The server has only 2 GB of RAM and costs 320 USD per year. I think the RAM is probably too small. Frustrating, but thank you for your answer.

Koruomi commented 3 years ago

I repeated the experiment on my own computer (6 CPUs, 16 GB RAM, RTX 2060), and it ran successfully, but the final masking result differed by less than 1%. Maybe LTR Structural Analysis does not matter?

jebrosen commented 3 years ago

but the final masking result differed by less than 1%. Maybe LTR Structural Analysis does not matter?

If you are only using RepeatModeler output for masking, the overall difference may be small.

One thing you can check is the families.fa files to see how many LTRs were found by each method. The ones found by -LTRStruct will be named starting with ltr-, and all results are classified by RepeatClassifier (in the format rnd-family...#Type/SubType). It may be the case that in this particular genome all of the ltr- families could also be found with the RECON and RepeatScout methods.
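The check described above can be sketched in a few lines of Python. This is only an illustration: `count_families` is a hypothetical helper, and it assumes each FASTA header looks like `>name#Type/SubType` followed by optional free text, with `ltr-` as the name prefix for families from -LTRStruct, as described in this comment.

```python
from collections import Counter


def count_families(fasta_lines):
    """Count families by name prefix and by (prefix, classification).

    Assumes headers of the form ">name#Type/SubType optional text".
    """
    by_method = Counter()
    by_class = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        name, _, rest = line[1:].strip().partition("#")
        fields = rest.split()
        classification = fields[0] if fields else "Unclassified"
        method = "ltr-" if name.startswith("ltr-") else "other"
        by_method[method] += 1
        by_class[(method, classification)] += 1
    return by_method, by_class


# Usage sketch (the file name is a placeholder for your own families file):
# with open("mygenome-families.fa") as handle:
#     by_method, by_class = count_families(handle)
# print(by_method)
# print(by_class)
```

Comparing the `LTR/...` classifications under `ltr-` with those under `other` gives a rough idea of whether the structural pipeline contributed families that the other methods did not find.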

flaviabmedeiros commented 1 year ago

Hello! Thanks for the input, @jebrosen! I had the same errors as reported here, but the program was still able to identify LTR elements in the genome (after RepeatModeler, I ran RepeatMasker against the families file). I just checked my files and I do have elements found by -ltr. But how can I be sure that my outputs are trustworthy, even with this LTR Structural Analysis error?