CSU-KangHu / HiTE

High-precision TE Annotator
GNU General Public License v3.0

Test on demo data #14

Closed. pxxiao-hz closed this issue 1 month ago.

pxxiao-hz commented 1 month ago
2024-08-14 20:37:50,275 - main.py[line:406] - INFO: Start step0: Structural Based LTR Searching
2024-08-14 20:37:50,275 - main.py[line:419] - INFO: cd /home/pxxiao/tools/HiTE/HiTE/./module && python3 /home/pxxiao/tools/HiTE/HiTE/./module/judge_LTR_transposons.py  -g /home/pxxiao/tools/HiTE/HiTE/test/genome.fa --ltrharvest_home /home/pxxiao/tools/HiTE/HiTE/./bin/LTR_HARVEST_parallel --ltrfinder_home /home/pxxiao/tools/HiTE/HiTE/./bin/LTR_FINDER_parallel-master -t 40 --tmp_output_dir /home/pxxiao/tools/HiTE/HiTE/test --recover 0 --miu 1.3e-08 --use_NeuralTE 1 --is_wicker 0 --NeuralTE_home /home/pxxiao/tools/HiTE/HiTE/./bin/NeuralTE --TEClass_home /home/pxxiao/tools/HiTE/HiTE/./classification
2024-08-14 20:37:50,606 - /home/pxxiao/tools/HiTE/HiTE/./module/judge_LTR_transposons.py[line:153] - INFO: Start step0.1: Running LTR_harvest_parallel and LTR_finder_parallel
2024-08-14 20:37:50,606 - /home/pxxiao/tools/HiTE/HiTE/module/Util.py[line:561] - DEBUG: cd /home/pxxiao/tools/HiTE/HiTE/test && perl /home/pxxiao/tools/HiTE/HiTE/./bin/LTR_HARVEST_parallel/LTR_HARVEST_parallel -seq /home/pxxiao/tools/HiTE/HiTE/test/genome.rename.fa -threads 40 > /dev/null 2>&1
2024-08-14 20:38:03,075 - /home/pxxiao/tools/HiTE/HiTE/module/Util.py[line:569] - DEBUG: cd /home/pxxiao/tools/HiTE/HiTE/test && perl /home/pxxiao/tools/HiTE/HiTE/./bin/LTR_FINDER_parallel-master/LTR_FINDER_parallel -harvest_out -seq /home/pxxiao/tools/HiTE/HiTE/test/genome.rename.fa -threads 40 > /dev/null 2>&1
2024-08-14 20:38:27,830 - /home/pxxiao/tools/HiTE/HiTE/./module/judge_LTR_transposons.py[line:158] - INFO: Running time of step0.1: 37.22380 s
2024-08-14 20:38:27,835 - /home/pxxiao/tools/HiTE/HiTE/./module/judge_LTR_transposons.py[line:169] - INFO: Start step0.2: run LTR_retriever to get confident LTR
2024-08-14 20:38:27,835 - /home/pxxiao/tools/HiTE/HiTE/module/Util.py[line:682] - DEBUG: start LTR_retriever detection...
2024-08-14 20:38:27,835 - /home/pxxiao/tools/HiTE/HiTE/module/Util.py[line:685] - DEBUG: cd /home/pxxiao/tools/HiTE/HiTE/test && LTR_retriever -genome /home/pxxiao/tools/HiTE/HiTE/test/genome.rename.fa -inharvest /home/pxxiao/tools/HiTE/HiTE/test/genome_all.fa.rawLTR.scn -noanno -threads 40 -u 1.3e-08
2024-08-14 20:38:44,531 - /home/pxxiao/tools/HiTE/HiTE/module/Util.py[line:689] - DEBUG: LTR_retriever running time: 16.69532 s
2024-08-14 20:38:44,531 - /home/pxxiao/tools/HiTE/HiTE/./module/judge_LTR_transposons.py[line:173] - INFO: Running time of step0.2: 16.69624 s
2024-08-14 20:38:44,597 - /home/pxxiao/tools/HiTE/HiTE/./module/judge_LTR_transposons.py[line:224] - DEBUG: python /home/pxxiao/tools/HiTE/HiTE/./bin/NeuralTE/src/Classifier.py --data /home/pxxiao/tools/HiTE/HiTE/test/intact_LTR.fa --use_TSD 0 --model_path /home/pxxiao/tools/HiTE/HiTE/./bin/NeuralTE/models/NeuralTE_model.h5 --outdir /home/pxxiao/tools/HiTE/HiTE/test/NeuralTE_LTR --thread 40 --is_wicker 0
Traceback (most recent call last):
  File "/home/pxxiao/tools/HiTE/HiTE/./module/judge_LTR_transposons.py", line 261, in <module>
    label = intact_LTR_labels[intact_ltr_name]
KeyError: 'chr_1:321982..333313'
2024-08-14 20:38:47,205 - main.py[line:423] - INFO: Running time of step0: 56.93035 s
... ...

Hi, I encountered an issue while running the test data with the following command:

python main.py --genome /home/pxxiao/tools/HiTE/HiTE/demo/genome.fa --outdir /home/pxxiao/tools/HiTE/HiTE/test/ --thread 40

Could you please help me understand what might be causing this problem? Thank you in advance!
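
The traceback ends at a plain dictionary lookup, intact_LTR_labels[intact_ltr_name]. The minimal sketch below, using hypothetical data and a hypothetical helper (lookup_labels, with an assumed "Unknown" fallback), illustrates that failure mode: if the classifier produced no labels, indexing the dictionary with an LTR name raises KeyError, whereas a guarded lookup can collect every missing name at once. It is an illustration only, not HiTE's code.

```python
from typing import Dict, List, Tuple


def lookup_labels(
    labels: Dict[str, str], names: List[str], default: str = "Unknown"
) -> Tuple[Dict[str, str], List[str]]:
    """Return a label for every name plus the list of names that had none.

    Hypothetical helper: unlike labels[name], it never raises KeyError.
    """
    result: Dict[str, str] = {}
    missing: List[str] = []
    for name in names:
        if name in labels:
            result[name] = labels[name]
        else:
            result[name] = default  # assumed fallback label
            missing.append(name)
    return result, missing


# An empty label dict mimics a classifier run that wrote no output.
labels: Dict[str, str] = {}
names = ["chr_1:321982..333313"]

try:
    labels[names[0]]  # the direct lookup fails just like in the traceback
except KeyError as err:
    print("KeyError:", err)

result, missing = lookup_labels(labels, names)
print(missing)  # ['chr_1:321982..333313']
```

A silent fallback would hide the real problem (here, an empty classifier output), so in practice the missing names should still be logged or turned into a clear error.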

CSU-KangHu commented 1 month ago

Hi @pxxiao-hz,

It seems that some LTRs might be missing after classifying full-length LTRs with NeuralTE.

Could you please check if the file /home/pxxiao/tools/HiTE/HiTE/test/NeuralTE_LTR/classified_TE.fa exists and whether chr_1:321982..333313 is included in the file?
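
This check can be scripted with a minimal sketch that reports whether the file exists and whether a given LTR name appears in its FASTA headers. The helper names (fasta_names, check_classified) are hypothetical, and the parser assumes the sequence name is the header text before any "#" (where a classification label may be appended) and before any whitespace; adjust it if NeuralTE writes its headers differently.

```python
import os
from typing import Set


def fasta_names(path: str) -> Set[str]:
    """Collect sequence names from FASTA headers.

    Assumption: the name is the header text before the first '#'
    (if present) and before the first whitespace.
    """
    names: Set[str] = set()
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                continue
            header = line[1:].strip()
            parts = header.split("#", 1)[0].split()
            if parts:
                names.add(parts[0])
    return names


def check_classified(path: str, name: str) -> str:
    """Return 'missing file', 'name not found', or 'present'."""
    if not os.path.isfile(path):
        return "missing file"
    if name in fasta_names(path):
        return "present"
    return "name not found"
```

For example, check_classified("/home/pxxiao/tools/HiTE/HiTE/test/NeuralTE_LTR/classified_TE.fa", "chr_1:321982..333313") distinguishes a classifier that never wrote its output from one that wrote output but dropped this particular LTR.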

Best regards,
Kang

pxxiao-hz commented 1 month ago

No, this file does not exist: /home/pxxiao/tools/HiTE/HiTE/test/NeuralTE_LTR/classified_TE.fa.

The current output files are:

genome.rename.fa.retriever.all.scn
genome.rename.fa.harvest.combine.scn
genome.rename.fa.finder.combine.scn
genome_all.fa.rawLTR.scn
genome.rename.fa.LTRlib.fa
genome.rename.fa.pass.list
confident_ltr_cut.fa
confident_other.fa
longest_repeats_0.fa
longest_repeats_0.flanked.fa
confident_tir_0.fa
confident_helitron_0.fa
confident_non_ltr_0.fa
confident_helitron.fa
confident_non_ltr.fa
confident_tir.fa
confident_TE.cons.fa
genome.rename.fa
chr_name.map

CSU-KangHu commented 1 month ago

Could you please try running the command with the --debug 1 parameter and then check if the file /home/pxxiao/tools/HiTE/HiTE/test/NeuralTE_LTR/classified_TE.fa exists? Also, please verify if chr_1:321982..333313 is included in the file.

pxxiao-hz commented 1 month ago

Both NeuralTE_LTR and NeuralTE_all are empty directories.

CSU-KangHu commented 1 month ago

It looks like the NeuralTE classification step is not running properly. Please try running that command on its own:

python /home/pxxiao/tools/HiTE/HiTE/./bin/NeuralTE/src/Classifier.py \
    --data /home/pxxiao/tools/HiTE/HiTE/test/intact_LTR.fa \
    --use_TSD 0 \
    --model_path /home/pxxiao/tools/HiTE/HiTE/./bin/NeuralTE/models/NeuralTE_model.h5 \
    --outdir /home/pxxiao/tools/HiTE/HiTE/test/NeuralTE_LTR \
    --thread 40 \
    --is_wicker 0
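
Running the step standalone is what surfaces the underlying error message. If you prefer to do the same from Python, here is a minimal sketch: run_step and build_classifier_cmd are hypothetical helpers, the Classifier.py flags are copied from the command above, and sys.executable is assumed to be the Python of the activated HiTE environment.

```python
import subprocess
import sys
from typing import List


def build_classifier_cmd(home: str) -> List[str]:
    """Build the NeuralTE Classifier.py command shown above for a HiTE checkout."""
    return [
        sys.executable, f"{home}/bin/NeuralTE/src/Classifier.py",
        "--data", f"{home}/test/intact_LTR.fa",
        "--use_TSD", "0",
        "--model_path", f"{home}/bin/NeuralTE/models/NeuralTE_model.h5",
        "--outdir", f"{home}/test/NeuralTE_LTR",
        "--thread", "40",
        "--is_wicker", "0",
    ]


def run_step(cmd: List[str], tail: int = 20) -> int:
    """Run one command; on failure, print the last `tail` lines of its stderr."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        print(f"exit status {proc.returncode}; stderr tail:", file=sys.stderr)
        for line in proc.stderr.splitlines()[-tail:]:
            print(line, file=sys.stderr)
    return proc.returncode
```

Usage: run_step(build_classifier_cmd("/home/pxxiao/tools/HiTE/HiTE")) returns the classifier's exit status and echoes the end of its traceback when it fails.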

pxxiao-hz commented 1 month ago

Running it on its own reveals a Python package version mismatch:

nohup: ignoring input
Traceback (most recent call last):
  File "/home/pxxiao/tools/HiTE/HiTE/./bin/NeuralTE/src/Classifier.py", line 17, in <module>
    from DataProcessor import DataProcessor
  File "/home/pxxiao/tools/HiTE/HiTE/bin/NeuralTE/src/DataProcessor.py", line 14, in <module>
    from utils.data_util import read_fasta_v1, generate_TSD_info, generate_domain_info, generate_terminal_info, \
  File "/home/pxxiao/tools/HiTE/HiTE/bin/NeuralTE/src/../utils/data_util.py", line 7, in <module>
    import pandas as pd
  File "/home/pxxiao/tools/Anaconda3/envs/HiTE/lib/python3.8/site-packages/pandas/__init__.py", line 22, in <module>
    from pandas.compat import is_numpy_dev as _is_numpy_dev  # pyright: ignore # noqa:F401
  File "/home/pxxiao/tools/Anaconda3/envs/HiTE/lib/python3.8/site-packages/pandas/compat/__init__.py", line 18, in <module>
    from pandas.compat.numpy import (
  File "/home/pxxiao/tools/Anaconda3/envs/HiTE/lib/python3.8/site-packages/pandas/compat/numpy/__init__.py", line 22, in <module>
    raise ImportError(
ImportError: this version of pandas is incompatible with numpy < 1.20.3
your numpy version is 1.19.5.
Please upgrade numpy to >= 1.20.3 to use this pandas version
Command exited with non-zero status 1
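
The ImportError states the constraint directly: this pandas build needs numpy >= 1.20.3, but the environment has 1.19.5. A minimal sketch for checking such a constraint before launching the pipeline follows; the helper names are hypothetical, the threshold is copied from the error message, and the simple parser ignores pre-release suffixes such as "rc1" (packaging.version is the more robust choice if it is installed).

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def release_tuple(ver: str) -> Tuple[int, ...]:
    """Turn '1.19.5' into (1, 19, 5), ignoring any trailing suffix like 'rc1'."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unparseable version: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(installed: str, minimum: str) -> bool:
    """Compare release numbers, padding with zeros so '1.20' == '1.20.0'."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None
```

For the environment above, at_least("1.19.5", "1.20.3") is False, which matches the ImportError; in a live environment you would pass installed_version("numpy") as the first argument after checking that it is not None.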

CSU-KangHu commented 1 month ago

I have updated the yml file to pin pandas to version 1.4.4. In an existing environment, you can downgrade pandas with conda:

conda install pandas=1.4.4

pxxiao-hz commented 1 month ago

Thank you, Kang. This bug has been fixed.