carden24 / Bioinformatics_scripts

Scripts UBC

Precompile regex's (HMM.search.and.parse.and.extract.py) #1

Closed LeeBergstrand closed 10 years ago

LeeBergstrand commented 10 years ago

Since you are using the same regexes over and over, you should precompile them at the top of the script using re.compile. This way the computer doesn't have to recompile the same regex over and over. Example regexes from your code:

hmmshortname = re.sub('[.](hmm)','',model, re.I) 
shortname = re.sub('[.](fasta$|fas$|faa$|fsa$|fa$)','',query, re.I)

Instead you should use:

hmmSuffixRegex = re.compile('[.](hmm)', re.I)
querySuffixREgex = re.compile('[.](fasta$|fas$|faa$|fsa$|fa$)', re.I)

at the top of the script and then inside each method:

hmmshortname = hmmSuffixRegex.sub('', model)
shortname = querySuffixREgex.sub('', query)
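Put together, the suggestion could look roughly like the sketch below. The names HMM_SUFFIX, FASTA_SUFFIX and short_names are made up for illustration and are not taken from the script; the two patterns are the ones quoted from it.

```python
import re

# Compiled once, when the module is loaded. re.I makes the match
# case-insensitive, so ".HMM" and ".hmm" are both stripped.
HMM_SUFFIX = re.compile(r'[.](hmm)', re.I)
FASTA_SUFFIX = re.compile(r'[.](fasta$|fas$|faa$|fsa$|fa$)', re.I)


def short_names(model, query):
    """Hypothetical helper: strip the file extension from both filenames."""
    hmmshortname = HMM_SUFFIX.sub('', model)
    shortname = FASTA_SUFFIX.sub('', query)
    return hmmshortname, shortname


print(short_names('pfam.HMM', 'sample.fasta'))  # ('pfam', 'sample')
```

Every function that needs a short name can then reuse the same two compiled objects.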

Reference:

https://docs.python.org/2/library/re.html#re.compile
https://docs.python.org/2/library/re.html#re.RegexObject.sub
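A side note on the re.sub calls quoted from the script: per the documented signature, re.sub(pattern, repl, string, count=0, flags=0), the fourth positional argument is count, so passing re.I there limits the number of replacements instead of turning on case-insensitive matching. A small sketch of the difference, using a made-up filename:

```python
import re

# re.I has the integer value 2, so in fourth position it is read as count=2.
# No flag is set, and the upper-case suffix is left in place.
as_count = re.sub('[.](hmm)', '', 'model.HMM', re.I)

# Passing it by keyword applies it as a flag.
as_flag = re.sub('[.](hmm)', '', 'model.HMM', flags=re.I)

print(as_count)  # model.HMM
print(as_flag)   # model
```

Giving the flag to re.compile avoids the ambiguity, since the compiled pattern's sub method takes no flags argument.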

carden24 commented 10 years ago

They are part of an optional output. E

From: Lee Bergstrand [mailto:notifications@github.com] Sent: Friday, April 11, 2014 7:12 PM To: carden24/Bioinformatic_scripts Subject: [Bioinformatic_scripts] Precompile regex's (HMM.search.and.parse.and.extract.py) (#1)

carden24 commented 10 years ago

The precompiled regex patterns are now part of the script. Thanks, Mr. Lee.