VDBWRAIR / bio_bits

Various bioinformatics scripts
GNU General Public License v2.0

Identifying Genes with degenerate bases #39

Closed averagehat closed 9 years ago

averagehat commented 9 years ago

As per discussion with @InaMBerry Given a FASTA file and either:

Output a full list of the genes in which degenerate bases occur. I think we concluded that we would like the option either to provide an annotation file or to provide the reference ID and fetch the annotation from GenBank.

Example Output:
Gene1: R at position 300
Gene1: Y at position 500
Gene2: ....

or:
Gene1 had 2 degenerate bases
Gene2 had 1 degenerate bases
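A minimal sketch of the request, not the code that ended up in the repository. It assumes gene coordinates are already available as (name, start, end) tuples with 1-based inclusive positions, and it treats N as a degenerate base alongside the other IUPAC ambiguity codes. The names `DEGENERATE`, `degenerate_hits` and `per_gene_counts` are made up for illustration.

```python
from collections import Counter

# IUPAC ambiguity codes; counting N as degenerate is an assumption.
DEGENERATE = set("RYSWKMBDHVN")


def degenerate_hits(seq, genes):
    """Return (gene, position, base) for each degenerate base inside a gene.

    genes is a list of (name, start, end) with 1-based inclusive coordinates.
    """
    hits = []
    for pos, base in enumerate(seq.upper(), start=1):
        if base not in DEGENERATE:
            continue
        for name, start, end in genes:
            if start <= pos <= end:
                hits.append((name, pos, base))
    return hits


def per_gene_counts(hits):
    """Count degenerate bases per gene name."""
    return Counter(name for name, _pos, _base in hits)


# Toy data, invented for the example.
genes = [("Gene1", 1, 10), ("Gene2", 11, 20)]
hits = degenerate_hits("ACGTRACGTAYCGTACGTAC", genes)
for name, pos, base in hits:
    print(f"{name}: {base} at position {pos}")
for name, count in per_gene_counts(hits).items():
    print(f"{name} had {count} degenerate bases")
```

The first loop prints the per-position form and the second prints the per-gene summary form from the example output.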

InaMBerry commented 9 years ago

Classification: UNCLASSIFIED Caveats: NONE

http://www.ncbi.nlm.nih.gov/nuccore/KJ189367.1

I think the output should be in table format, as Dereje will make it for the tool. So we have a table that shows the position number and its amino acids, and the gene name can be in a column of its own for that position. Thanks! ina

Dr. Irina Maljkovic Berry Lead Scientist, Bioinformatics CNTS Viral Diseases Branch Walter Reed Army Institute of Research Silver Spring, MD 20910 +1(301)319-2032
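One way the table layout could look, sketched under assumptions: the column names `gene`, `position` and `base` are invented here, the output is tab-delimited, and the amino-acid column is left out because it would come from the other tool. `write_table` is a hypothetical helper, not part of bio_pieces.

```python
import csv
import io


def write_table(hits):
    """Render (gene, position, base) tuples as tab-delimited text with a header."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene", "position", "base"])  # assumed column names
    writer.writerows(hits)
    return out.getvalue()


# Values taken from the example output earlier in the thread.
table = write_table([("Gene1", 300, "R"), ("Gene1", 500, "Y")])
print(table, end="")
```

Keeping the gene name in its own column means each row still describes a single position, which matches the layout described above.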

-----Original Message----- From: Mike Panciera [mailto:notifications@github.com] Sent: Wednesday, July 22, 2015 11:33 AM To: VDBWRAIR/bio_pieces Cc: Maljkovic Berry, Irina CTR USARMY MEDCOM WRAIR (US) Subject: [bio_pieces] Identifying Genes with degenerate bases (#39)


averagehat commented 9 years ago

I have this working for GenBank, but to accept a local file I would need an example file. Or would it be in the same format as GenBank?

InaMBerry commented 9 years ago


I am not sure what format that is, but I think the simplest way to do it would be a tab- or comma-delimited text file.


-----Original Message----- From: Mike Panciera [mailto:notifications@github.com] Sent: Wednesday, July 22, 2015 1:55 PM To: VDBWRAIR/bio_pieces Cc: Maljkovic Berry, Irina CTR USARMY MEDCOM WRAIR (US) Subject: Re: [bio_pieces] Identifying Genes with degenerate bases (#39)


averagehat commented 9 years ago

From the discussion I would suggest accepting three types of arguments for the gene information:

necrolyte2 commented 9 years ago

I would suggest making the local gene coordinates file flexible, so that it accepts either tab- or comma-separated values. I think the delimiter is fairly easy to detect.
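A rough sketch of one way to detect the delimiter, assuming the file is small enough to read as text and that gene names contain neither tabs nor commas. `detect_delimiter` and `read_rows` are hypothetical names; the heuristic just compares tab and comma counts on the first non-blank line.

```python
import csv
import io


def detect_delimiter(text):
    """Guess tab vs comma from the first non-blank line (a simple heuristic)."""
    for line in text.splitlines():
        if line.strip():
            return "\t" if line.count("\t") > line.count(",") else ","
    raise ValueError("no data lines found")


def read_rows(text):
    """Split tab- or comma-separated text into lists of fields, skipping blank lines."""
    delimiter = detect_delimiter(text)
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [row for row in reader if row]


tab_rows = read_rows("Gene1\t1\t300\nGene2\t301\t900\n")
comma_rows = read_rows("Gene1,1,300\nGene2,301,900\n")
```

The standard library's `csv.Sniffer` could do the same job; the explicit count is used here only because its behaviour is easy to reason about with two candidate delimiters.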

InaMBerry commented 9 years ago


If it is not too complicated to have all three options, it is a good idea. Also, as Tyghe suggested, the local file should be flexible and accept comma or tab.

Thanks!


-----Original Message----- From: Mike Panciera [mailto:notifications@github.com] Sent: Thursday, July 23, 2015 9:33 AM To: VDBWRAIR/bio_pieces Cc: Maljkovic Berry, Irina CTR USARMY MEDCOM WRAIR (US) Subject: Re: [bio_pieces] Identifying Genes with degenerate bases (#39)


demis001 commented 9 years ago

We may only need the columns "Gene_Name Gene_start Gene_end" for the purpose we discussed.

Dereje
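A sketch of how rows in that three-column layout might be turned into typed records. It assumes the rows have already been split into fields, that coordinates are 1-based and inclusive, and that a row whose start field is not a number is a header. `Gene` and `parse_genes` are illustrative names only.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive


def parse_genes(rows):
    """Convert [name, start, end] string rows into Gene records.

    Rows whose start field is not numeric (e.g. a header line) are skipped.
    """
    genes = []
    for row in rows:
        name, start, end = (field.strip() for field in row[:3])
        if not start.isdigit():
            continue  # assumed to be a header row
        genes.append(Gene(name, int(start), int(end)))
    return genes


# Toy rows, invented for the example.
parsed = parse_genes([
    ["Gene_Name", "Gene_start", "Gene_end"],
    ["Gene1", "1", "300"],
    ["Gene2", "301", "900"],
])
```

Any extra columns after the third are ignored, so a richer annotation file would still parse.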

averagehat commented 9 years ago

@demis001 The interface (function) for use in #24 is here

averagehat commented 9 years ago

This functionality is in https://github.com/VDBWRAIR/bio_pieces/blob/64c555f506a925bfb83259111136165a30001e9f/bio_pieces/degen.py