dib-lab / elvers

(formerly eelpond) an automated RNA-Seq workflow system
https://dib-lab.github.io/elvers/

use dammit annotations for DE (collapse "genes" based on annots) #131

Open bluegenes opened 5 years ago

bluegenes commented 5 years ago

Add an option to choose which annotation database to collapse by, and provide the output as an additional or alternate gene-trans-map for DESeq2.

johnsolk commented 5 years ago

choosing one dammit annotation for each contig: filter by a custom database, then keep the longest annotation:

from dammit.fileio.gff3 import GFF3Parser

gff_file = "yourfile.fasta.dammit.gff3"
annotations = GFF3Parser(filename=gff_file).read()
# keep track of how long each annotation is
annotations["length"] = annotations["end"].subtract(annotations["start"], fill_value=0)
# keep only the annotations that come from the custom database
annotations = annotations.loc[annotations["database"] == "Edit_Gadus_morhua.gadMor1.pep.all.fa"]
# keep the longest annotation per contig, with columns seqid, Name, start, end, length
annotations = annotations.sort_values(by=["seqid", "length"], ascending=False).drop_duplicates(subset="seqid")[["seqid", "Name", "start", "end", "length"]]
annotations = annotations.rename(columns={"Name": "Ensembl"})
print("ensembl annotations", annotations.shape)
# drop rows where every column is missing
new_file = annotations.dropna(axis=0, how="all")
new_file.head()

johnsolk commented 5 years ago

or (if no custom database), sorting by e-value and taking top entry:

from dammit.fileio.gff3 import GFF3Parser

gff_file = "yourfile.fasta.dammit.gff3"
annotations = GFF3Parser(filename=gff_file).read()
# sort so the lowest e-value comes first per contig, keep hits below 1e-05, then keep the top hit
names = annotations.sort_values(by=["seqid", "score"], ascending=True).query("score < 1e-05").drop_duplicates(subset="seqid")[["seqid", "Name"]]
new_file = names.dropna(axis=0, how="all")
new_file.head()

johnsolk commented 5 years ago

Hi Tessa, any update with incorporating this into a new gene_trans_map so DE genes output can be dammit names, not contigs?
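
For reference, a minimal sketch of what such a gene-trans-map could look like once a contig-to-name table exists. It is not elvers code. It assumes the two-column, tab-separated layout used by Trinity (gene first, transcript second) and assumes that contigs without an annotation should keep their own ID as the gene name. The helper names `build_gene_trans_map` and `write_gene_trans_map`, and the contig and gene IDs, are hypothetical.

```python
import csv
import io


def build_gene_trans_map(contigs, annotation_names):
    """Return (gene, transcript) rows; unannotated contigs map to themselves."""
    rows = []
    for contig in contigs:
        # Assumption: fall back to the contig ID when no annotation name exists.
        gene = annotation_names.get(contig) or contig
        rows.append((gene, contig))
    return rows


def write_gene_trans_map(rows, handle):
    """Write rows as tab-separated gene<TAB>transcript lines."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


# Hypothetical contig IDs and annotation names, for illustration only.
contigs = ["contig_1", "contig_2", "contig_3"]
annotation_names = {"contig_1": "geneA", "contig_2": "geneA"}

rows = build_gene_trans_map(contigs, annotation_names)
buffer = io.StringIO()
write_gene_trans_map(rows, buffer)
gene_trans_map_text = buffer.getvalue()
```

With the tables from the comments above, `annotation_names` could be built as `dict(zip(names["seqid"], names["Name"]))` after dropping rows with a missing `Name`, since a missing value would otherwise be written out as the gene name. Contigs that share an annotation name then collapse to one "gene" in the map.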

prvasquez commented 4 years ago

While running this line annotations = GFF3Parser(filename=gff_file).read()

I get this error.

/Users/prvasquez/miniconda3/envs/dammit-env/lib/python3.7/site-packages/dammit/fileio/base.py:79: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  return pd.concat(self, ignore_index=True)

I was unable to find where in the line sort=True was supposed to go.

ctb commented 4 years ago

Hi Picasso,

this is in a package that elvers uses, and is just a warning - nothing needs to be done by you, tho. thanks!
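
If the warning is too noisy, it can be silenced around the call with Python's standard `warnings` module, with no change to dammit or pandas. The sketch below is only an illustration: `read_quietly` is a hypothetical helper, and `noisy_read` is a stand-in for `GFF3Parser(filename=gff_file).read()` so that the example runs without dammit installed.

```python
import warnings


def read_quietly(read_func, *args, **kwargs):
    """Call read_func while ignoring any FutureWarning raised inside it."""
    with warnings.catch_warnings():
        # The filter only applies inside this block; it is restored on exit.
        warnings.simplefilter("ignore", category=FutureWarning)
        return read_func(*args, **kwargs)


def noisy_read():
    # Hypothetical stand-in for GFF3Parser(filename=gff_file).read().
    warnings.warn(
        "Sorting because non-concatenation axis is not aligned.",
        FutureWarning,
    )
    return "parsed"


# Record any warnings that escape, to check that none do.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = read_quietly(noisy_read)
```

In real use the call would be something like `annotations = read_quietly(GFF3Parser(filename=gff_file).read)`. The parsed result is the same either way; only the message is hidden.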