WGLab / RepeatHMM

a hidden Markov model to infer simple repeats from genome sequences

align error #8

Closed rrazaghi closed 6 years ago

rrazaghi commented 6 years ago

Hi, I am trying to test-run RepeatHMM on the PRJNA379845 dataset and I am getting the error below. Could you please advise me on how to fix it? Thank you in advance.

(HMM) ro:~/RepeatHMM-master/bin$ python repeatHMM.py FASTQinput --fastq SRR5363334.fastq --repeatName HTT
The following options are used (included default):
BWAMEMOptions ( -k8 -W8 -r7 ); CompRep (0); MatchInfo ([3, -2, -2, -15, -1]); MaxRep (4000); MinSup (5); Patternfile (None); RepeatTime (5); SeqTech (None); SplitAndReAlign (1); TRFOptions (2_7_4_80_10_100); Tolerate_mismatch (None); UserDefinedUniqID (None); emissionm (None); hg (hg38); hgfile (hg38.fa); hmm_del_rate (0.02); hmm_insert_rate (0.12); hmm_sub_rate (0.02); isGapCorrection (1); minRepBWTSize (70); minTailSize (70); outlog (2); repeatFlankLength (30); repeatName (HTT); specifiedRepeatInfo (///////); stsBasedFolder (reference_sts/); transitionm (None);

analysis_file_id  (_GapCorrection1_FlankLength30_SplitAndReAlign1_2_7_4_80_10_100_hg38_comp_I0.120_D0.020_S0.020);
       fastafile  (SRR5363334.fastq);
  unique_file_id  (.gmm_GapCorrection1_FlankLength30_SplitAndReAlign1_2_7_4_80_10_100_hg38_comp_I0.120_D0.020_S0.020);

Traceback (most recent call last):
  File "repeatHMM.py", line 565, in <module>
    args.func(args);
  File "repeatHMM.py", line 365, in FASTQinput
    summary[commonOptions['repeatName']] = myFASTQhandler.getSCA3forKnownGeneWithPartialRev(commonOptions, specifiedOptions)
  File "/home/ro/RepeatHMM-master/bin/scripts/myFASTQhandler.py", line 135, in getSCA3forKnownGeneWithPartialRev
    res = getSCA3ForGivenGene(commonOptions, specifiedOptions, moreOptions);
  File "/home/ro/RepeatHMM-master/bin/scripts/myFASTQhandler.py", line 55, in getSCA3ForGivenGene
    upstreamstr, repregion, downstreamstr = get3part(mgloc, gene_start_end, repeat_start_end, repeatName , unique_file_id, analysis_file_id, hgfile, specifiedOptions)
  File "/home/ro/RepeatHMM-master/bin/scripts/myFASTQhandler.py", line 24, in get3part
    predata, mfasta, sufdata = myBAMhandler.getGene(repeatName, mgloc[0], gene_start_end, unique_file_id, analysis_file_id, hgfn, 10, specifiedOptions)
  File "/home/ro/RepeatHMM-master/bin/scripts/myBAMhandler.py", line 234, in getGene
    alignfolder = specifiedOptions['align']; #'align/'
KeyError: 'align'

liuqianhn commented 6 years ago

Hi @rrazaghi ,

The parameter issue has been fixed. Could you please update the tool? Feel free to let me know if you have any questions about using the tool.
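
For readers hitting the same traceback: it ends in a plain dictionary lookup on a key that was never set. The snippet below is a minimal sketch of that failure mode and of one defensive way to read such an option. The `specified_options` dict and the `get_align_folder` helper are hypothetical stand-ins, not RepeatHMM's code, and the actual fix applied in the repository may differ. The `align/` default is taken from the comment on the failing line in the traceback.

```python
# Hypothetical stand-in for the options dict; this is NOT RepeatHMM's real code.
specified_options = {"fastafile": "SRR5363334.fastq"}


def get_align_folder(options, default="align/"):
    """Return the alignment folder, falling back to a default when it is unset."""
    return options.get("align", default)


try:
    # A direct lookup on a missing key raises KeyError, as in the traceback.
    specified_options["align"]
except KeyError as err:
    print("KeyError:", err)

# dict.get with a default avoids the exception when the option is absent.
print(get_align_folder(specified_options))
```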

rrazaghi commented 6 years ago

Hi again, thank you for the update and the quick response. Now I have a problem that I believe comes from BWA MEM. Do you happen to know where it originates?

(HMM) ro:~/RepeatHMM-master/bin$ python repeatHMM.py FASTQinput --fastq SRR5363334.fastq --repeatName atn1
The following options are used (included default):
BWAMEMOptions ( -k8 -W8 -r7 ); CompRep (0); MatchInfo ([3, -2, -2, -15, -1]); MaxRep (4000); MinSup (5); Patternfile (None); RepeatTime (5); SeqTech (None); SplitAndReAlign (1); TRFOptions (2_7_4_80_10_100); Tolerate_mismatch (None); UserDefinedUniqID (None); align (align/); emissionm (None); hg (hg38); hgfile (mhgversion//hg38.fa); hmm_del_rate (0.02); hmm_insert_rate (0.12); hmm_sub_rate (0.02); isGapCorrection (1); minRepBWTSize (70); minTailSize (70); outlog (2); repeatFlankLength (30); repeatName (atn1); specifiedRepeatInfo (///////); stsBasedFolder (reference_sts/); transitionm (None);

           align    (align/);
analysis_file_id    (_GapCorrection1_FlankLength30_SplitAndReAlign1_2_7_4_80_10_100_hg38_comp_I0.120_D0.020_S0.020);
       fastafile    (SRR5363334.fastq);
  unique_file_id    (.gmm_GapCorrection1_FlankLength30_SplitAndReAlign1_2_7_4_80_10_100_hg38_comp_I0.120_D0.020_S0.020);

[E::bwa_idx_load_from_disk] fail to locate the index files
[main_samview] region "chr12:6935717-6937773" specifies an unknown reference name. Continue anyway.
[main_samview] region "12:6935717-6937773" specifies an unknown reference name. Continue anyway.
ERROR None detection (sp) atn1 ['chr12', 6936717, 6936773, 'CAG', '+10', '6-35:49-88', '']
p2sp end---running time0 mem61

for output

atn1

The result is in logfq/RepFQ_atn1.gmm_GapCorrection1_FlankLength30_SplitAndReAlign1_2_7_4_80_10_100_hg38_comp_I0.120_D0.020_S0.020.log

rrazaghi commented 6 years ago

Another question: is the option --repeatName all not functional at the moment?

Thank you

liuqianhn commented 6 years ago

Hi @rrazaghi

  1. May I know whether you have indexed the reference genome, for example with "bwa index your-genome" or "samtools faidx your-genome"? I suggest you do both to avoid potential errors.

  2. "--repeatName": only the repeats listed in "bin/reference_sts/hg*/hg*.predefined.pa" can be used for now; here "*" can be 19 or 38. However, you can append other repeat information, under any name you wish, to the ".pa" file, or simply use '--UserDefinedRepeat' to specify the repeat you are interested in.
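
As a quick way to check point 1 before re-running, the sketch below looks for the files that `bwa index` (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`) and `samtools faidx` (`.fai`) normally write next to the FASTA file. The `missing_index_files` helper is a hypothetical name introduced here for illustration, and the example path is only a guess based on the `hgfile` value in the option dump above; adjust it to your own layout.

```python
import os

# Suffixes that `bwa index` and `samtools faidx` append to the FASTA file name.
BWA_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")
FAIDX_SUFFIX = ".fai"


def missing_index_files(fasta_path):
    """Return the expected index files for fasta_path that do not exist."""
    expected = [fasta_path + suffix for suffix in BWA_SUFFIXES + (FAIDX_SUFFIX,)]
    return [path for path in expected if not os.path.exists(path)]


if __name__ == "__main__":
    # Assumed location of the reference; change this to match your setup.
    for path in missing_index_files("mhgversion/hg38.fa"):
        print("missing:", path)
```

If anything is reported as missing, running both indexing commands on the reference, as suggested above, should create the files.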

liuqianhn commented 6 years ago

Hi @rrazaghi, I am going to close this issue. Please re-open it or create a new issue if you have problems running RepeatHMM.