envmetagen / bastools

A bunch of scripts for processing read and primer data
GNU General Public License v3.0

parsing gb files - help with python #2

Closed: bastianegeter closed this issue 4 years ago

bastianegeter commented 5 years ago

Hi @nunofonseca. I am not familiar enough with Python and was wondering if you could help with something.

I would like to pass a variable from R to a Python script and use an OR-type condition in that script. See the notes in the script below.

I have a testing script for this here, which should run if you clone the repository. Currently, though, I have a separate Python script for each variation of the call I want to make, which is far from ideal.

Might be easier to chat in person...

Thanks!

import sys
import re
from Bio import SeqIO

positions = []  # [start, end] of every matching rRNA feature
n = 0           # number of sequences printed

# Iterate over every record so that multi-record GenBank files also work
for record in SeqIO.parse(sys.argv[1], 'genbank'):
    name = record.name
    taxid = ''
    org = ''
    # Take the taxon ID and organism name from the 'source' feature
    for feature in record.features:
        if feature.type == 'source':
            taxid = ''.join(feature.qualifiers['db_xref'])
            taxid = re.sub(r'.*taxon:', '', taxid)
            org = ''.join(feature.qualifiers['organism'])
            org = re.sub(r'.*organism=', '', org)
    for feat in record.features:
        if feat.type == 'rRNA':
            if '18S' in feat.qualifiers['product'][0]:

                # here is where I need an OR statement: "18S" OR "small subunit ribosomal RNA"
                # ideally these should be a character vector in R and fed to this script.
                # There can be up to 6 variations to search for
                # furthermore, I would like to have an OR statement for feat.qualifiers: ['product'] OR ['note']

                start = int(feat.location.start)
                end = int(feat.location.end)
                positions.append([start, end])
                # FASTA header, followed by the extracted rRNA sequence
                print('>' + name + ' organism=' + org + '; taxid=' + taxid + ';')
                print(feat.extract(record.seq))
                n = n + 1

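The OR-type match asked for in the comments above could be sketched roughly as follows. This is only a minimal illustration, not the solution that was eventually adopted: the helper names `parse_terms` and `feature_matches` are hypothetical, and it assumes the search terms arrive as extra command-line arguments after the GenBank file name. A plain dict stands in for `feat.qualifiers` so the sketch runs without Biopython or an input file.

```python
def parse_terms(argv):
    """Return the search terms given as extra command-line arguments.

    Assumed call: script.py <genbank_file> <term> [<term> ...]
    """
    return argv[2:]


def feature_matches(qualifiers, terms, keys=("product", "note")):
    """Return True if any term occurs in any value under any of the keys.

    `qualifiers` maps qualifier names to lists of strings, which is the
    shape Biopython uses for feat.qualifiers.
    """
    for key in keys:
        # .get() avoids a KeyError when a feature has no 'product' or 'note'
        for value in qualifiers.get(key, []):
            if any(term in value for term in terms):
                return True
    return False


# Stand-in for feat.qualifiers on an rRNA feature that only has a 'note'
example = {"note": ["small subunit ribosomal RNA"]}
terms = parse_terms(["script.py", "input.gb", "18S", "small subunit ribosomal RNA"])
print(feature_matches(example, terms))  # prints True
```

In the script above, the `'18S' in feat.qualifiers['product'][0]` test would then become `feature_matches(feat.qualifiers, parse_terms(sys.argv))`. On the R side, a character vector of terms could presumably be passed as additional arguments, for example through `system2()` with each term wrapped in `shQuote()` so that terms containing spaces stay intact.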
bastianegeter commented 4 years ago

This was moved to another repository and has been solved:

https://github.com/envmetagen/building_custom_refererence_dbs