WGLab / RepeatHMM

a hidden Markov model to infer simple repeats from genome sequences

'No pa file' problem & 'qsub: script file 'smp' cannot be loaded - No such file or directory' #43

Open LiShuhang-gif opened 3 years ago

LiShuhang-gif commented 3 years ago

Hi, I'm using RepeatHMM-scan to scan a genome for potential repeat regions. I used the example you provided in another issue, and this is my script:

python /public/home/fan_lab/shali/repeatHMM/RepeatHMM/bin/repeatHMM.py Scan \
  --SplitAndReAlign 1 \
  --MinSup 3 \
  --UserDefinedUniqID WGSscan \
  --SeqTech "Nanopore" "--Patternfile" hg38.trf.bed --cluster 1 --envset repeathmmenv \
  --Onebamfile /public/home/fan_lab/shali/tandem_repeats/LCL6_minimap2_sorted_rmdup_p20.bam \
  --hgfile /public/home/fan_lab/shali/reference/hg38_22_XYM.fa --thread 50

However, I'm still having some problems. The first one is the 'No pa file' problem:

('No pa file ', '/public/home/fan_lab/shali/yes/envs/repeathmmenv/lib/python2.7/site-packages/RepeatHMM/reference_sts//hg38/hg38.predefined.pa')
The following options are used (included default):
       BWAMEMOptions    ( -k8 -W8 -r7 );
             CompRep    (0);
           MatchInfo    ([3, -2, -2, -15, -1]);
              MaxRep    (4000);
              MinSup    (3);
         Patternfile    (['hg38.trf.bed']);
          RepeatTime    (5);
             SeqTech    (Nanopore);
     SplitAndReAlign    (1);
          TRFOptions    (2_7_4_80_10_100);
   Tolerate_mismatch    (None);
   UserDefinedUniqID    (WGSscan);
               align    (align/);
           emissionm    (None);
                  hg    (hg38);
              hgfile    (/public/home/fan_lab/shali/reference/hg38_22_XYM.fa);
        hmm_del_rate    (0.05);
     hmm_insert_rate    (0.1);
        hmm_sub_rate    (0.05);
     isGapCorrection    (1);
       minRepBWTSize    (70);
         minTailSize    (70);
              outlog    (2);
   repeatFlankLength    (30);
          repeatName    (None);
 specifiedRepeatInfo    (///////);
      stsBasedFolder    (/public/home/fan_lab/shali/yes/envs/repeathmmenv/lib/python2.7/site-packages/RepeatHMM/reference_sts/);
         transitionm    (None);

          Onebamfile    (/public/home/fan_lab/shali/tandem_repeats/LCL6_minimap2_sorted_rmdup_p20.bam);
      SepbamfileTemp    (None);
            StopFail    (0);
               align    (align/);
    analysis_file_id    (gmm_GapCorrection1_FlankLength30_SplitAndReAlign1_2_7_4_80_10_100_hg38_comp_WGSscan_Nanopore_I0.120_D0.020_S0.020_sub);
            avergnum    (100);
             bamfile    (/public/home/fan_lab/shali/tandem_repeats/LCL6_minimap2_sorted_rmdup_p20.bam);
             cluster    (1);
       clusterOption    (qsub -V -cwd -pe smp 1 -l h_vmem=8G -e %s -o %s -N %s);
              envset    (repeathmmenv);
             max_len    (1000);
           outFolder    (align/);
       repeathmmPath    (RepeatHMM/bin/);
         scan_region    (None);
      unique_file_id    (gmm_GapCorrection1_FlankLength30_SplitAndReAlign1_2_7_4_80_10_100_hg38_comp_WGSscan_Nanopore_I0.120_D0.020_S0.020_sub);

However, I did follow your advice in another issue: I downloaded a TRF bed file from the UCSC Genome Browser and passed it to RepeatHMM Scan with the parameter "--Patternfile" hg38.trf.bed. Does this message make any difference to the results? By the way, I don't see any "Error" in the log file.

The second problem is shown below:

('submit job=', 'chr10_000000000', 'echo "source activate repeathmmenv && python RepeatHMM/bin/repeatHMM.py Scan --MinSup 3 --hgfile /public/home/fan_lab/shali/reference/hg38_22_XYM.fa --MaxRep 4000 --SeqTech Nanopore --SplitAndReAlign 1 --Onebamfile /public/home/fan_lab/shali/tandem_repeats/LCL6_minimap2_sorted_rmdup_p20.bam --UserDefinedUniqID WGSscanchr10_000000000 --Patternfile align//chr10_000000000.bed "| qsub -V -cwd -pe smp 1 -l h_vmem=8G -e align//chr10_000000000.e -o align//chr10_000000000.e -N chr10_000000000', 'at', '2021/08/30 10:27:59', 'Current=0/3976', 'times:', 0, -1)
qsub: script file 'smp' cannot be loaded - No such file or directory
('submit job=', 'chr10_000000001', 'echo "source activate repeathmmenv && python RepeatHMM/bin/repeatHMM.py Scan --MinSup 3 --hgfile /public/home/fan_lab/shali/reference/hg38_22_XYM.fa --MaxRep 4000 --SeqTech Nanopore --SplitAndReAlign 1 --Onebamfile /public/home/fan_lab/shali/tandem_repeats/LCL6_minimap2_sorted_rmdup_p20.bam --UserDefinedUniqID WGSscanchr10_000000001 --Patternfile align//chr10_000000001.bed "| qsub -V -cwd -pe smp 1 -l h_vmem=8G -e align//chr10_000000001.e -o align//chr10_000000001.e -N chr10_000000001', 'at', '2021/08/30 10:28:03', 'Current=1/3976', 'times:', 0, -1)
qsub: script file 'smp' cannot be loaded - No such file or directory

It looks like the program tried to find a script file and couldn't, and I don't know whether that affects the results. Can you tell me why these two problems came up and whether they affect the results? Thanks a lot!

liuqianhn commented 3 years ago

@LiShuhang-gif

  1. If you see the job-submission output, the 'No pa file' message is harmless. I will try to suppress this message when bed files are provided.
  2. There is no smp file. The program is trying to run qsub commands, and 'smp' is part of the qsub options it uses. I will also revise this option in case you have a different job submission system. You can search the code for "qsub -V -cwd -pe smp 1 -l h_vmem=8G" and revise it to match your job submission system. You can also set --cluster 0 to run without a job submission system.
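
For anyone following the "search and revise" suggestion above, here is a minimal sketch of a helper that walks a directory and reports every `.py` line containing the qsub template string from the log. The function name `find_template` and the idea of pointing it at the installed `RepeatHMM` package directory are assumptions for illustration; they are not part of RepeatHMM itself. As a side note, `-pe smp` is Grid Engine syntax, so the error may mean the cluster runs a different scheduler, but that is a guess from the message alone.

```python
import os

# Template string copied from the clusterOption line in the log above.
QSUB_TEMPLATE = "qsub -V -cwd -pe smp 1 -l h_vmem=8G"


def find_template(root, needle=QSUB_TEMPLATE):
    """Return (path, line_number, line) for each .py line under root that contains needle.

    Hypothetical helper, not part of RepeatHMM.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            # Read as bytes and decode leniently so odd encodings do not abort the scan.
            with open(path, "rb") as handle:
                for number, raw in enumerate(handle, start=1):
                    line = raw.decode("utf-8", errors="replace")
                    if needle in line:
                        hits.append((path, number, line.strip()))
    return hits
```

Calling `find_template(...)` on the directory that holds your RepeatHMM source (the cloned repository or the installed package folder, whichever you actually run) should list the file and line to edit. What to replace the template with depends on your scheduler, so check your cluster's own `qsub` documentation before changing it.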