WGLab / RepeatHMM

a hidden Markov model to infer simple repeats from genome sequences

Error no information #17

Closed liutiming closed 5 years ago

liutiming commented 5 years ago

After I keyed in the following command:

python2.7 /mnt/c/np/software/RepeatHMM/bin/repeatHMM.py BAMinput --Onebamfile amp_combined_long.sorted.bam --repeatName FMR1 --hgfile /mnt/c/np/reference/grch38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna

I got the following error:

No pa file reference_sts//hg38/hg38.predefined.pa

The following options are used (included default): BWAMEMOptions ( -k8 -W8 -r7 ); CompRep (0); MatchInfo ([3, -2, -2, -15, -1]); MaxRep (4000); MinSup (5); Patternfile (None); RepeatTime (5); SeqTech (None); SplitAndReAlign (1); TRFOptions (2_7_4_80_10_100); Tolerate_mismatch (None); UserDefinedUniqID (None); align (align/); emissionm (None); hg (hg38); hgfile (/mnt/c/np/reference/grch38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna); hmm_del_rate (0.02); hmm_insert_rate (0.12); hmm_sub_rate (0.02); isGapCorrection (1); minRepBWTSize (70); minTailSize (70); outlog (2); repeatFlankLength (30); repeatName (FMR1); specifiedRepeatInfo (///////); stsBasedFolder (reference_sts/); transitionm (None);

      Onebamfile    (amp_combined_long.sorted.bam);
  SepbamfileTemp    (None);
           align    (align/);
analysis_file_id    (_GapCorrection1_FlankLength30_SplitAndReAlign1_2_7_4_80_10_100_hg38_comp_I0.120_D0.020_S0.020);
         bamfile    (amp_combined_long.sorted.bam);
  unique_file_id    (.gmm_GapCorrection1_FlankLength30_SplitAndReAlign1_2_7_4_80_10_100_hg38_comp_I0.120_D0.020_S0.020);

('Error no information for 0\nError no information for 1\nError no information for 2\nError no information for 3\nError no information for 4\n', ['', '', '', '', '', '', ''], 'fmr1')

Can I ask how I can resolve the error, please?

liutiming commented 5 years ago

Just FYI, I am not sure if the error was due to the edits I have made to the MAKEFILE (shown in the pull request)...

liutiming commented 5 years ago

I think it might be a version issue. I will first try to create a virtual environment for python2.7 to re-install all the dependencies and update here.

Thanks!

liutiming commented 5 years ago

Updates:

I have created the py27 virtual environment, installed the dependencies, built with the Makefile, and run the software with the following command:

python /mnt/c/np/software/RepeatHMM/bin/repeatHMM.py BAMinput --Onebamfile amp_combined_long.sorted.bam --repeatName FMR1 --hgfile /mnt/c/np/reference/grch38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --hg hg38

Still the same error appeared: ('Error no information for 0\nError no information for 1\nError no information for 2\nError no information for 3\nError no information for 4\n', ['', '', '', '', '', '', ''], 'fmr1')

liuqianhn commented 5 years ago

Hi @6timings, thank you for your interest in our tool.

It seems that you did not run the command from the RepeatHMM-bin folder. Thus, you might need to provide the full path of the predefined repeat patterns: "--Patternfile $RepeatHMM-bin-folder$/reference_sts/hg38/hg38.predefined.pa". Please replace $RepeatHMM-bin-folder$ with "/mnt/c/np/software/RepeatHMM/bin/" or any parent directory of where "repeatHMM.py" is.

Feel free to let me know if you still have any issue.
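
The suggestion above can be sketched as a small helper that derives the absolute pattern-file path from the location of repeatHMM.py, so the command works from any working directory. This is only an illustration: the function name `build_repeathmm_args` is hypothetical, the directory layout (reference_sts/<hg>/<hg>.predefined.pa next to repeatHMM.py) is assumed from the reply above, and only flags that appear in this thread are used.

```python
import os


def build_repeathmm_args(script_path, bam, repeat_name, hgfile, hg="hg38"):
    """Build a RepeatHMM BAMinput command with an absolute --Patternfile.

    Hypothetical helper; the pattern-file layout is assumed from the thread.
    """
    bin_dir = os.path.dirname(os.path.abspath(script_path))
    # Assumed layout: <bin>/reference_sts/<hg>/<hg>.predefined.pa
    pattern_file = os.path.join(bin_dir, "reference_sts", hg, hg + ".predefined.pa")
    return [
        "python2.7", script_path, "BAMinput",
        "--Onebamfile", bam,
        "--repeatName", repeat_name,
        "--hg", hg,
        "--hgfile", hgfile,
        "--Patternfile", pattern_file,
    ]


args = build_repeathmm_args(
    "/mnt/c/np/software/RepeatHMM/bin/repeatHMM.py",
    "amp_combined_long.sorted.bam",
    "FMR1",
    "/mnt/c/np/reference/grch38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna",
)
# Print the command so it can be copied into a shell (or passed to subprocess.run).
print(" ".join(args))
```

Before running the printed command, it is worth confirming that the pattern file actually exists at the derived path.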

liutiming commented 5 years ago

Great, thanks! I tried it and the programme is running now. However, many lines of

Warning unknow CIGAR element

are printed to the terminal. Is this normal, or is something else wrong? I am running the programme in the RepeatHMM-bin folder this time.

liuqianhn commented 5 years ago

Hi @6timings , great to know the program is running.

But the warning would significantly affect your results. I have updated the scripts to fix the issues. Please download the updated version of RepeatHMM. Thank you.

liutiming commented 5 years ago

Thanks for the updates! The tool is running. However, when I run with --repeatName all, no FMR1 repeats are identified, even though I can identify them with --repeatName FMR1.

Also, there are many errors of the following type when I run with --repeatName all, even though I have indexed the BAM and the genome file (with both samtools and bwa):

[main_samview] region "X:148499606-148501692" specifies an unknown reference name. Continue anyway.

FYI, my command is: python repeatHMM.py BAMinput --Onebamfile /mnt/c/np/amp/results/last/amp_combined_long.sorted.bam --repeatName all --hg hg38 --hgfile /mnt/c/np/reference/grch38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna

Please let me know if any more info is needed.

liuqianhn commented 5 years ago

Hi @6timings, it seems that the chromosome names are different: the BAM is using "1", "2", ... "X" etc. for chromosomes, while the default might be "chr1", "chr2" and so on. If this is not the cause, I might need more information to find out what the issue is.
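
A naming mismatch like this can be checked directly against the BAM header. The sketch below assumes the header text was obtained separately (for example from `samtools view -H`) and parses its @SQ lines; the function names `reference_names` and `match_region` are hypothetical, and the two-contig header is made-up sample input rather than output from the files in this thread.

```python
def reference_names(sam_header_text):
    """Collect the SN: values from the @SQ lines of a SAM/BAM header."""
    names = []
    for line in sam_header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def match_region(region, names):
    """Return the region with a contig name found in `names`, or None.

    Tries the contig as given, then with the "chr" prefix added or removed.
    """
    contig, _, coords = region.partition(":")
    alt = contig[3:] if contig.startswith("chr") else "chr" + contig
    for candidate in (contig, alt):
        if candidate in names:
            return candidate + ":" + coords if coords else candidate
    return None


# Made-up header for illustration; a real one comes from `samtools view -H file.bam`.
header = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@SQ\tSN:chr1\tLN:248956422\n"
    "@SQ\tSN:chrX\tLN:156040895\n"
)
names = reference_names(header)
print(match_region("X:148499606-148501692", names))
```

If `match_region` returns a different contig name than the one in the warning, the BAM and the region definitions use different naming schemes; if it returns None, the contig is absent from the BAM altogether.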

liutiming commented 5 years ago

Hi @liuqianhn, thanks for the reply. I used samtools view | head to inspect the BAM file and found that the third column is actually chr1. Can I ask what info may be helpful for you, please?

liuqianhn commented 5 years ago

Hi @6timings , could you please send me your log file to liuqianhn@gmail.com? Thank you.

amrita1983 commented 5 years ago

Hi, I am facing the same issue. The error looks like this:

sh: 1: trf: not found
[E::bwa_idx_load_from_disk] fail to locate the index files
[main_samview] region "chrX:146992569-146994628" specifies an unknown reference name. Continue anyway.
[main_samview] region "X:146992569-146994628" specifies an unknown reference name. Continue anyway.

The following options are used (included default): BWAMEMOptions ( -k8 -W8 -r7 ); CompRep (0); MatchInfo ([3, -2, -2, -15, -1]); MaxRep (4000); MinSup (5); Patternfile (None); RepeatTime (5); SeqTech (None); SplitAndReAlign (1); TRFOptions (2_7_4_80_10_100); Tolerate_mismatch (None); UserDefinedUniqID (None); align (align/); emissionm (None); hg (hg19); hgfile (/mnt/NGS/Human_Exome_hg19/hg19.fa); hmm_del_rate (0.02); hmm_insert_rate (0.12); hmm_sub_rate (0.02); isGapCorrection (1); minRepBWTSize (70); minTailSize (70); outlog (2); repeatFlankLength (30); repeatName (FMR1); specifiedRepeatInfo (///////); stsBasedFolder (reference_sts/); transitionm (None);

      Onebamfile    (Prajwal_Wagh_aligned.sorted.bam);
  SepbamfileTemp    (None);
           align    (align/);
analysis_file_id    (_GapCorrection1_FlankLength30_SplitAndReAlign1_2_7_4_80_10_100_hg19_comp_I0.120_D0.020_S0.020);
         bamfile    (Prajwal_Wagh_aligned.sorted.bam);
  unique_file_id    (.gmm_GapCorrection1_FlankLength30_SplitAndReAlign1_2_7_4_80_10_100_hg19_comp_I0.120_D0.020_S0.020);

p2sp end---running time0 mem74

  p2sp ['fmr1', 20.0, [0, 0], 'allocr:', 0, 15]

    p2sp

FMR1 0 0;

I am using the following command:

python2 repeatHMM.py BAMinput --Onebamfile Prajwal_Wagh_aligned.sorted.bam --hg hg19 --hgfile /mnt/NGS/Human_Exome_hg19/hg19.fa --repeatName FMR1

I do not understand what the problem is, and I need to run the program urgently for a project.

liuqianhn commented 5 years ago

Hi @amrita1983, there might be two issues in your case: (1) trf might not be available, and you might need to install the TRF tool (Tandem Repeats Finder; see https://bioconda.github.io/recipes/trf/README.html), and (2) the reference genome "hg19.fa" and the "bam" file might not be indexed properly with bwa and samtools. Please correct me if I am wrong.
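
Both points can be checked before launching a run. The following is a minimal sketch, not part of RepeatHMM: the helper name `missing_prerequisites` is hypothetical, and it assumes the usual index file names written by `bwa index` (.amb, .ann, .bwt, .pac, .sa), `samtools faidx` (.fai) and `samtools index` (.bai or .csi). Whether RepeatHMM needs every one of these files is an assumption.

```python
import os
import shutil

# Suffixes written by `bwa index` next to the reference FASTA.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_prerequisites(hgfile, bamfile):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # The log shows "sh: 1: trf: not found", so the executable is looked up as `trf`.
    if shutil.which("trf") is None:
        problems.append("trf not found on PATH")
    for suffix in BWA_INDEX_SUFFIXES:
        if not os.path.exists(hgfile + suffix):
            problems.append("missing bwa index file: " + hgfile + suffix)
    if not os.path.exists(hgfile + ".fai"):
        problems.append("missing samtools faidx index: " + hgfile + ".fai")
    # samtools index writes <bam>.bai by default; <name>.bai and .csi are also common.
    bam_indexes = (
        bamfile + ".bai",
        os.path.splitext(bamfile)[0] + ".bai",
        bamfile + ".csi",
    )
    if not any(os.path.exists(path) for path in bam_indexes):
        problems.append("missing BAM index for: " + bamfile)
    return problems


for problem in missing_prerequisites(
    "/mnt/NGS/Human_Exome_hg19/hg19.fa", "Prajwal_Wagh_aligned.sorted.bam"
):
    print(problem)
```

An empty result only shows that the files are present; it does not verify that the indexes were built from the same reference the BAM was aligned to.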

amrita1983 commented 5 years ago

Hello, that issue is resolved, thanks. Can you please tell me whether this tool is able to detect FMR1 repeats, which are actually long repeats? I have WGS data for a patient with FMR1 repeats; with RepeatHMM the count comes out as 30, which is quite unlikely for this patient.

liuqianhn commented 5 years ago

Hi @amrita1983, RepeatHMM can detect FMR1 repeats but might rely on long reads that fully cover the FMR1 repeat. If no reads cover the flanking regions of the repeat, long repeats might be missed (we will improve this part, but the improvement is not available yet). Sorry for missing your question.
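
To make the coverage requirement concrete, the sketch below counts how many aligned reads span a repeat plus a flank on each side. It is an illustration of the idea only, not RepeatHMM's internal logic: the function names and the toy coordinates are hypothetical, and the default flank of 30 simply mirrors the repeatFlankLength value shown in the option dumps above.

```python
def spans_repeat(read_start, read_end, repeat_start, repeat_end, flank=30):
    """True if the alignment covers the repeat plus `flank` bases on both sides."""
    return read_start <= repeat_start - flank and read_end >= repeat_end + flank


def count_spanning(reads, repeat_start, repeat_end, flank=30):
    """Count (start, end) alignments that fully span the flanked repeat."""
    return sum(
        1 for start, end in reads
        if spans_repeat(start, end, repeat_start, repeat_end, flank)
    )


# Toy alignments (start, end) around a repeat at 1000-2000; coordinates are made up.
reads = [(900, 2200), (1000, 1500), (1400, 2300)]
print(count_spanning(reads, 1000, 2000))
```

With short-read WGS data, few or no reads are likely to pass such a test for a long expansion, which is consistent with the explanation above.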

huangyuanf commented 3 years ago

Hello, when using the following command:

/lustre/huangyf/software/Miniconda3/envs/repeathmmenv/bin/python "/lustre/huangyf/software/RepeatHMM/bin/repeatHMM.py" FASTQinput --fastq "/lustre/huangyf/ont.fastq.data/20190329-BNP0832-P4-A1.pass.fastq" --hgfile "/lustre/huangyf/genome/hg38/hg38.fa" --repeatName all --Patternfile "/lustre/huangyf/software/RepeatHMM/bin/reference_sts/hg38/hg38.predefined.pa"

I get the following error: ('Error no information for 0\nError no information for 1\nError no information for 2\nError no information for 3\nError no information for 4\n', ['', '', '', '', '', '', ''], 'all')

Could you help me?

liuqianhn commented 3 years ago

@huangyuanf Since this is the same issue, I will reply in issue #40.

huangyuanf commented 3 years ago

I have seen that issue, but my case is different. I installed RepeatHMM under "/lustre/huangyf/software/RepeatHMM-2.0.3/", the fastq file is at "/lustre/huangyf/ont.fastq.data/20190329-BNP0832-P4-A1.pass.fastq", the reference is at "/lustre/huangyf/genome/hg38/hg38.fa", and the Patternfile is located at "/lustre/huangyf/software/RepeatHMM-2.0.3/bin/reference_sts/hg38/hg38.predefined.pa". These are all full paths.

But when I entered this command in the terminal:

python "/lustre/huangyf/software/RepeatHMM-2.0.3/bin/repeatHMM.py" FASTQinput --fastq "/lustre/huangyf/ont.fastq.data/20190329-BNP0832-P4-A1.pass.fastq" --hg hg38 --hgfile "/lustre/huangyf/genome/hg38/hg38.fa" --repeatName all --Patternfile "/lustre/huangyf/software/RepeatHMM-2.0.3/bin/reference_sts/hg38/hg38.predefined.pa"

I got this error: ('Error no information for 0\nError no information for 1\nError no information for 2\nError no information for 3\nError no information for 4\n', ['', '', '', '', '', '', ''], 'all')

liuqianhn commented 3 years ago

@huangyuanf It seems that "all" is not supported for FASTQinput. Could you please replace "all" after --repeatName with a specific repeat name from /lustre/huangyf/software/RepeatHMM-2.0.3/bin/reference_sts/hg38/hg38.predefined.pa and see whether you get the same error?

huangyuanf commented 3 years ago

I don't know why. How do I do that, please? Thanks!

huangyuanf commented 3 years ago

I tried a BAM file and got the same error.

liuqianhn commented 3 years ago

@huangyuanf Sorry for the late reply. I tried to reproduce the error, but I could not. Could you please show what is in the file /lustre/huangyf/software/RepeatHMM-2.0.3/bin/reference_sts/hg38/hg38.predefined.pa? Thanks.